The State of SSL Stacks

A paper on this topic was prepared for internal use within HAProxy last year, and this version is now being shared publicly. Given the critical role of SSL in securing internet communication and the challenges presented by evolving SSL technologies, reverse proxies like HAProxy must continuously adapt their SSL strategies to maintain performance and compatibility, ensuring a secure and efficient experience for users. We are committed to providing ongoing updates on these developments.

The SSL landscape has shifted dramatically in the past few years, introducing performance bottlenecks and compatibility challenges for developers. Once a reliable foundation, OpenSSL's evolution has prompted a critical reassessment of SSL strategies across the industry.

For years, OpenSSL maintained its position as the de facto standard SSL library, offering long-term stability and consistent performance. The arrival of version 3.0 in September 2021 changed everything. While designed to enhance security and modularity, the new architecture introduced significant performance regressions in multi-threaded environments, and deprecated essential APIs that many external projects relied upon. The absence of the anticipated QUIC API further complicated matters for developers who had invested in its implementation.

This transition posed a challenge for the entire ecosystem. OpenSSL 3.0 was designated as the Long-Term Support (LTS) version, while maintenance for the widely used 1.1.1 branch was discontinued. As a result, many Linux distributions had no practical choice but to adopt the new version despite its limitations. Users with performance-critical applications found themselves at a crossroads: continue with increasingly unsupported earlier versions or accept substantial penalties in performance and functionality.

Performance testing reveals the stark reality: in some multi-threaded configurations, OpenSSL 3.0 performs significantly worse than alternative SSL libraries, forcing organizations to provision more hardware just to maintain existing throughput. This raises important questions about performance, energy efficiency, and operational costs.

Examining alternatives—BoringSSL, LibreSSL, WolfSSL, and AWS-LC—reveals a landscape of trade-offs. Each offers different approaches to API compatibility, performance optimization, and QUIC support. For developers navigating the modern SSL ecosystem, understanding these trade-offs is crucial for optimizing performance, maintaining compatibility, and future-proofing their infrastructure.

# Functional requirements

The functional aspects of SSL libraries determine their versatility and applicability across different software products. HAProxy’s SSL feature set was designed around the OpenSSL API, so compatibility or functionality parity is a key requirement. 

  • Modern implementations must support a range of TLS protocol versions (from legacy TLS 1.0 to current TLS 1.3) to accommodate diverse client requirements while encouraging migration to more secure protocols. 

  • Support for innovative, emerging protocols like QUIC plays a vital role in driving widespread adoption and technological breakthroughs. 

  • Certificate management functionality, including chain validation, revocation checking via OCSP and CRLs, and SNI (Server Name Indication) support, is essential for proper deployment. 

  • SSL libraries must offer comprehensive cipher suite options to meet varying security policies and compliance requirements such as PCI-DSS, HIPAA, and FIPS. 

  • Standard features like ALPN (Application-Layer Protocol Negotiation) for HTTP/2 support, certificate transparency validation, and stapling capabilities further expand functional requirements. 

Software products relying on these libraries must carefully evaluate which functional components are critical for their specific use cases while considering the overhead these features may introduce.
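
To make the notion of API parity concrete, below is a minimal sketch (not HAProxy's actual code) of one such OpenSSL-level feature: the ALPN selection callback that a server registers to negotiate HTTP/2 versus HTTP/1.1. Any alternative library needs to expose an equivalent entry point, or a compatibility layer for it, for this kind of feature to keep working.

```c
#include <openssl/ssl.h>

/* Wire-format ALPN list: each protocol name is prefixed by its length. */
static const unsigned char alpn_protos[] = "\x02h2\x08http/1.1";

static int alpn_select_cb(SSL *ssl, const unsigned char **out,
                          unsigned char *outlen, const unsigned char *in,
                          unsigned int inlen, void *arg)
{
    (void)ssl; (void)arg;
    /* Pick the first protocol from our list that the client also offers. */
    if (SSL_select_next_proto((unsigned char **)out, outlen,
                              alpn_protos, sizeof(alpn_protos) - 1,
                              in, inlen) != OPENSSL_NPN_NEGOTIATED)
        return SSL_TLSEXT_ERR_NOACK;
    return SSL_TLSEXT_ERR_OK;
}

void setup_alpn(SSL_CTX *ctx)
{
    SSL_CTX_set_alpn_select_cb(ctx, alpn_select_cb, NULL);
}
```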

# Performance considerations

SSL/TLS operations are computationally intensive, creating significant performance challenges for software products that rely on these libraries. Handshake operations, which establish secure connections, require asymmetric cryptography that can consume substantial CPU resources, especially in high-volume environments. Beyond their computational demands, these operations also present environmental and logistical challenges.

The energy consumption of cryptographic operations directly impacts the carbon footprint of digital infrastructure relying on these security protocols. High-volume SSL handshakes and encryption workloads increase power requirements in data centers, contributing to greater electricity consumption and associated carbon emissions. 

Performance of SSL libraries has become increasingly important as organizations pursue sustainability goals and green computing initiatives. Modern software products implement sophisticated core-awareness strategies that maximize single-node efficiency by distributing cryptographic workloads across all available CPU cores. This approach to processor saturation enables organizations to fully utilize existing hardware before scaling horizontally, significantly reducing both capital expenditure and energy consumption that would otherwise be required for additional servers. 

By efficiently leveraging all available cores for SSL/TLS operations, a single properly configured node can often handle the same encrypted traffic volume as multiple poorly optimized servers, dramatically reducing datacenter footprint, cooling requirements, and power consumption. 

These architectural improvements, when properly leveraged by SSL libraries, can deliver substantial performance improvements with minimal environmental impact—a critical consideration as encrypted traffic continues to grow exponentially across global networks.

# Maintenance requirements

The maintenance burden of SSL implementations presents significant challenges for software products. Security vulnerabilities in SSL libraries require immediate attention, forcing development teams to establish robust patching processes. 

Software products must balance the stability of established SSL libraries against the security improvements of newer versions; this process becomes more manageable when operating system vendors provide consistent and timely updates. Documentation and expertise requirements add further complexity, as configuring SSL properly demands specialized knowledge that may be scarce within development teams. Backward compatibility concerns often complicate maintenance, as updates must protect existing functionality while implementing necessary security improvements or fixes. 

The complexity and risks associated with migrating to a new SSL library version often encourage product vendors to try to stick as long as possible to the same maintenance branch, preferably an LTS version provided by the operating system’s vendor. 

# Current SSL library ecosystem

# OpenSSL

OpenSSL has served as the industry-standard SSL library included in most operating systems for many years. A key benefit has been its simultaneous support for multiple versions over extended periods, enabling users to carefully schedule upgrades, adapt their code to accommodate new versions, and thoroughly test them before implementation.

The introduction of OpenSSL 3.0 in September 2021 posed significant challenges to the stability of the SSL ecosystem, threatening its continued reliability and sustainability.

  1. This version was released nearly a year behind schedule, thus shortening the available timeframe for migrating applications to the new version. 

  2. The migration process was challenging due to OpenSSL's API changes, such as the deprecation of many commonly used functions and the ENGINE API that external projects relied on. This affected solutions like the pkcs11 engine used for Hardware Security Modules (HSM) and Intel’s QAT engine for hardware crypto acceleration, forcing engines to be rewritten with the new providers API. 

  3. Performance was also measurably lower in multi-threaded environments, making OpenSSL 3.0 unusable in many performance-dependent use cases. 

  4. OpenSSL also decided that the long-awaited QUIC API would ultimately not be merged, dealing a significant blow to innovators and early adopters of this technology. Developers and organizations were left without the key QUIC capabilities they had been counting on for their projects.

  5. OpenSSL labeled version 3.0 as an LTS branch and shortly thereafter discontinued maintenance of the previous 1.1.1 LTS branch. This decision left many Linux distributions with no viable alternatives, compelling them to adopt the new version.

Users with performance-critical requirements faced limited options: either remain on older distributions that still maintained their own version 1.1.1 implementations, deploy more servers to compensate for the performance loss, or purchase expensive extended premium support contracts and maintain their own packages.

# BoringSSL

BoringSSL is a fork of OpenSSL that was announced in 2014, after the Heartbleed CVE. The library was primarily intended for Google's own use; projects that adopt it must follow the "live at HEAD" model. This can lead to maintenance challenges, since the API breaks frequently and no maintenance branches are provided.

However, it stands out in the SSL ecosystem for its willingness to implement bleeding-edge features. For example, it was the first OpenSSL-based library to implement the QUIC API, which other such libraries later adopted.

This library has been supported in the HAProxy community for some time now and has provided the opportunity to progress on the QUIC subject. While it was later abandoned because of its incompatibility with the HAProxy LTS model, we continue to keep an eye on it because it often produces valuable innovations.

# LibreSSL

LibreSSL is a fork of OpenSSL 1.0.1 that also emerged after the Heartbleed vulnerability, with the aim of being a more secure alternative to OpenSSL. It started with a massive cleanup of the OpenSSL code, removing a lot of legacy and infrequently used code from the OpenSSL API.

LibreSSL later provided the libtls API, a completely new API designed as a simpler and more secure alternative to the libssl API. However, since it's an entirely different API, applications require significant modifications to adopt it.

LibreSSL prioritizes security over performance and tends to be slower than other libraries. Features considered potentially insecure, such as 0-RTT, are not implemented. Nowadays, the project focuses on evolving its libssl API with some inspiration from BoringSSL; for example, the EVP_AEAD and QUIC APIs.

LibreSSL was ported to other operating systems in the form of the libressl-portable project. Unfortunately, it is rarely packaged in Linux distributions, and is typically used in BSD environments.

HAProxy does support LibreSSL—it is currently built and tested by our continuous integration (CI) pipeline—however, not all features are supported. LibreSSL implemented the BoringSSL QUIC API in 2022, and the HAProxy team successfully ported HAProxy to it with LibreSSL 3.6.0. Unfortunately, LibreSSL does not implement all the API features needed to use HAProxy to its full potential.

# WolfSSL

WolfSSL is a TLS library which initially targeted the embedded world. This stack is not a fork of OpenSSL but offers a compatibility layer, making it simpler to port applications.

Back in 2012, we tested its predecessor, CyaSSL. It had relatively good performance but lacked too many features to be considered for use. Since then, the library has evolved with the addition of many significant features (TLS 1.3, QUIC, etc.) while keeping its lightweight approach, and it even provides a FIPS-certified cryptographic module.

In 2022, we started a port of HAProxy to WolfSSL with the help of the WolfSSL team. There were bugs and missing features in the OpenSSL compatibility layer, but as of WolfSSL 5.6.6, it became a viable option for simple setups or embedded systems. It was successfully ported to the HAProxy CI and, as such, is regularly built and tested with up-to-date WolfSSL versions.

Since WolfSSL is not OpenSSL-based at all, some behavior could change, and not all features are supported. HAProxy SSL features were designed around the OpenSSL API; this was the first port of HAProxy to an SSL library not based on the OpenSSL API, which makes it difficult to perfectly map existing features. As a result, some features occasionally require minor configuration adaptations.

We've been working with the WolfSSL team to ensure their library can be seamlessly integrated with HAProxy in mainstream Linux distributions, though this integration is still under development (https://github.com/wolfSSL/wolfssl/issues/6834).

WolfSSL is available in Ubuntu and Debian, but unfortunately, specific build options that are needed for HAProxy and CPU optimization are not activated by default. As a result, it needs to be installed and maintained manually, which can be bothersome.

# AWS-LC

AWS-LC is a BoringSSL (and by extension OpenSSL) fork that started in 2019. It is intended for AWS and its customers. AWS-LC targets security and performance (particularly on AWS hardware). Unlike BoringSSL, it aims for a backward-compatible API, making it easy to maintain.

We were recently approached by the AWS team, who provided us with patches to make HAProxy compatible with AWS-LC, enabling us to test them together regularly via CI. Since HAProxy was ported to BoringSSL in the past, we inherited a lot of features that were already working with it.

AWS-LC supports modern TLS features and QUIC. In HAProxy, it supports the same features as OpenSSL 1.1.1, but it lacks some older cipher suites and key exchanges that are rarely used anymore (CCM, DHE). It also lacks engine support, which had already been removed in BoringSSL.

It does provide a FIPS-certified cryptographic module, which is periodically submitted for FIPS validation.

# Other libraries

Mbed TLS, GnuTLS, and other libraries have also been considered; however, they would require extensive rewriting of the HAProxy SSL code. We didn't port HAProxy to these libraries because the available feature sets did not justify the amount of up-front work and maintenance effort required.

We also tested Rustls and its rustls-openssl-compat layer. Rustls could be an interesting library in the future, but the OpenSSL compatibility application binary interface (ABI) was not complete enough to make it work correctly with HAProxy in its current state. Using the native Rustls API would again require extensive rewriting of HAProxy code.

We also routinely used QuicTLS (openssl+quic) during our QUIC development. However, it does not diverge enough from OpenSSL to be considered a different library, as it is really distributed as a patchset applied on top of OpenSSL.

# An introduction to QUIC and how it relates to SSL libraries

QUIC is an encrypted, multiplexed transport protocol that is mainly used to transport HTTP/3. It combines some of the benefits of TCP, TLS, and HTTP/2, without many of their drawbacks. It started as research work at Google in 2012 and was deployed at scale in combination with the Chrome browser in 2014. In 2015, the IETF QUIC working group was created to standardize the protocol, and published the first draft (draft-ietf-quic-transport-00) on Nov 28th, 2016. In 2020, the new IETF QUIC protocol differed quite a bit from the original one and started to be widely adopted by browsers and some large hosting providers. Finally, the protocol was published as RFC9000 in 2021.

One of the key goals of the protocol is to move the congestion control to userland so that application developers can experiment with new algorithms, without having to wait for operating systems to implement and deploy them. It integrates cryptography at its heart, contrary to classical TLS, which is only an additional layer on top of TCP.

A full-stack web application relies on these key components:

  • HTTP/1, HTTP/2, HTTP/3 implementations (in-house or libraries)

  • A QUIC implementation (in-house or library)

  • A TLS library shared between these 3 protocol implementations

  • Below these sit the regular kernel UDP/TCP sockets

Overall, this integrates pretty well, and various QUIC implementations started very early, in order to validate some of the new protocol’s concepts and provide feedback to help them evolve. Some implementations are specific to a single project, such as HAProxy’s QUIC implementation, while others, such as ngtcp2, are made to be portable and easy to adopt by common applications.

During all this work, the need for new TLS APIs was identified in order to permit a QUIC implementation to access some essential elements conveyed in TLS records, and the required changes were introduced in BoringSSL (Google’s fork of OpenSSL). This has been the only TLS library usable by QUIC implementations for both clients and servers for a long time. One of the difficulties with working with BoringSSL is that it evolves quickly and is not necessarily suitable for products maintained for a long period of time, because new versions regularly break the build, due to changes in BoringSSL's public API.
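
For illustration, the sketch below shows roughly the shape of this QUIC API as found in BoringSSL; the exact structure layout varies slightly between forks (some expose a single combined set_encryption_secrets callback instead of separate read/write ones), and the qc_* callback names and stub bodies here are purely illustrative.

```c
#include <stdint.h>
#include <openssl/ssl.h>

/* Callbacks through which the TLS stack hands key material and handshake
 * messages to the QUIC layer; real implementations derive packet protection
 * keys and buffer handshake data into CRYPTO frames. */
static int qc_set_read_secret(SSL *ssl, enum ssl_encryption_level_t level,
                              const SSL_CIPHER *cipher,
                              const uint8_t *secret, size_t secret_len)
{ return 1; /* install decryption keys for this encryption level */ }

static int qc_set_write_secret(SSL *ssl, enum ssl_encryption_level_t level,
                               const SSL_CIPHER *cipher,
                               const uint8_t *secret, size_t secret_len)
{ return 1; /* install encryption keys for this encryption level */ }

static int qc_add_handshake_data(SSL *ssl, enum ssl_encryption_level_t level,
                                 const uint8_t *data, size_t len)
{ return 1; /* queue TLS handshake messages into outgoing CRYPTO frames */ }

static int qc_flush_flight(SSL *ssl)
{ return 1; /* the current flight of handshake data may be sent */ }

static int qc_send_alert(SSL *ssl, enum ssl_encryption_level_t level,
                         uint8_t alert)
{ return 1; /* map the TLS alert to a QUIC CONNECTION_CLOSE */ }

static const SSL_QUIC_METHOD quic_method = {
    qc_set_read_secret, qc_set_write_secret,
    qc_add_handshake_data, qc_flush_flight, qc_send_alert,
};

/* Attached to each connection; incoming CRYPTO frame data is then fed to
 * the TLS stack with SSL_provide_quic_data() instead of a socket BIO. */
int quic_conn_init_tls(SSL *ssl)
{
    return SSL_set_quic_method(ssl, &quic_method);
}
```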

In February 2020, Todd Short opened a pull request (PR) on OpenSSL’s GitHub repository to propose a BoringSSL-compatible implementation of the QUIC API in OpenSSL. The additional code adds a few callbacks at some key points, allowing existing QUIC implementations such as MsQuic, ngtcp2, HAProxy, and others to support OpenSSL in addition to BoringSSL. It was extremely well-received by the community. However, the OpenSSL team preferred to keep that work on hold until OpenSSL 3.0 was released; they did not reconsider this choice later, even though the schedule was drifting. During this time, developers from Akamai and Microsoft created QuicTLS. This new project essentially took the latest stable versions of OpenSSL and applied the patchset on top of it. QuicTLS soon became the de facto standard TLS library for QUIC implementations that were patiently waiting for OpenSSL 3.0 to be released and for this PR to get merged.

Finally, three years later, the OpenSSL team announced that they were not going to integrate that work and instead would create a whole new QUIC implementation from scratch. This was not what users needed or asked for and threw away years of proven work from the QUIC community. This shocking move provoked a strong reaction from the community, who had invested a lot of effort in OpenSSL via QuicTLS, but were left to find another solution: either the fast-moving BoringSSL or a more officially maintained variant of QuicTLS. 

In parallel, other libs including WolfSSL, LibreSSL, and AWS-LC adopted the de facto standard BoringSSL QUIC API. 

Meanwhile, OpenSSL continues to mention QUIC in its plans, though the current focus seems to be on delivering a single-stream-capable minimum viable product (MVP) that should be sufficient for the command-line "s_client" tool. However, this approach still doesn't offer the API that QUIC implementations have been waiting for over the last four years, forcing them to turn to QuicTLS.

The development of a transport layer like QUIC requires a totally different skill set than cryptographic library development, and such work must be done with full transparency. Instead, the development team has degraded their project's quality, failed to address ongoing issues, and consistently dismissed widespread community requests for even minor improvements. Validating these concerns, Curl contributor Stefan Eissing recently tried to make use of OpenSSL's QUIC implementation with Curl and published his findings. They are clearly not appealing, as most developers following this topic would have expected.

In despair at this situation, we at HAProxy tried to figure out from the QUIC patch set whether there could be a way to hack around OpenSSL without patching it, and we were clearly not alone. Roman Arutyunyan from the NGINX core team was the first to propose a solution, with a clever method that abuses the keylog callback to extract or inject the required elements, finally making minimal server-mode QUIC support possible. We adopted it as well, so that users could start to familiarize themselves with QUIC and its impact on their infrastructure, even though it does have some technical limitations (e.g., 0-RTT is not supported). This approach only works on the server side and may not work for clients, which is acceptable for HAProxy, since QUIC is currently only implemented on the frontend.
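
As a rough illustration of the idea (this is neither NGINX's nor HAProxy's actual code), the sketch below shows how a QUIC stack can recover traffic secrets from a stock OpenSSL 3.x by abusing the keylog callback, which OpenSSL normally exposes for debugging. The quic_conn structure and quic_install_secret() function are hypothetical; a real compatibility layer must also shuttle handshake messages between CRYPTO frames and the TLS stack (for example via memory BIOs), which is where most of the complexity and the limitations come from.

```c
#include <string.h>
#include <openssl/ssl.h>

struct quic_conn;                                  /* hypothetical per-connection QUIC state */
void quic_install_secret(struct quic_conn *qc, const char *label,
                         const char *hex_secret);  /* hypothetical key-derivation entry point */

/* Keylog lines look like: "SERVER_HANDSHAKE_TRAFFIC_SECRET <random> <secret>" */
static void quic_keylog_cb(const SSL *ssl, const char *line)
{
    static const char *labels[] = {
        "CLIENT_HANDSHAKE_TRAFFIC_SECRET", "SERVER_HANDSHAKE_TRAFFIC_SECRET",
        "CLIENT_TRAFFIC_SECRET_0", "SERVER_TRAFFIC_SECRET_0",
    };
    struct quic_conn *qc = SSL_get_app_data(ssl);
    size_t i;

    for (i = 0; i < sizeof(labels) / sizeof(labels[0]); i++) {
        if (strncmp(line, labels[i], strlen(labels[i])) == 0) {
            const char *secret = strrchr(line, ' '); /* last field is the hex-encoded secret */
            if (secret)
                quic_install_secret(qc, labels[i], secret + 1);
            return;
        }
    }
}

/* Registered once on the SSL_CTX used by QUIC listeners. */
void quic_setup_ctx(SSL_CTX *ctx)
{
    SSL_CTX_set_keylog_callback(ctx, quic_keylog_cb);
}
```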

With all that in mind, the possible choices of TLS libraries for QUIC implementations in projects designed around OpenSSL are currently quite limited:

  • QuicTLS: closest to OpenSSL, the most likely to work well as a replacement for OpenSSL, but now suffers from OpenSSL 3+ unsolved technical problems (more on that below), since QuicTLS is rebased on top of OpenSSL

  • AWS-LC: fairly complete, maintained, frequent releases, pretty fast, but no dedicated LTS branch for now

  • WolfSSL: less complete, more adaptable, very fast, also offers support contracts, so LTS is probably negotiable

  • LibreSSL: comes with OpenBSD by default, lacks some features and optimisations compared to OpenSSL, but works out of the box for small sites

  • NGINX’s hack: servers only, works out of the box with OpenSSL (no TLS rebuild needed), but has a few limitations, and will also suffer from OpenSSL 3+ unsolved technical problems

  • BoringSSL: where it all comes from, but moves too fast for many projects

This unfortunate situation considerably hurts QUIC protocol adoption. It even makes it difficult to develop or build test tools to monitor a QUIC server. From an industry perspective, it looks like either WolfSSL or AWS-LC needs to offer LTS versions of their products to potentially move into a market-leading position. This would potentially obsolete OpenSSL and eliminate the need for the QuicTLS effort.

# Performance issues

In SSL, performance is the most critical aspect. Very expensive operations are performed at the beginning of a connection, before any communication can happen. If connections are recycled quickly (service reloads, scale up/down, switch-overs, peak connection hours, attacks, etc.), it is very easy for a server to become overwhelmed and stop responding, which in turn can make visitors retry and add even more traffic. This explains why SSL frontend gateways tend to be very powerful systems with lots of CPU cores, able to handle traffic surges without degrading service quality.

During performance testing performed in collaboration with Intel, which led to optimizations reflected in this document, we encountered an unexpected bottleneck. We found ourselves stuck with the "h1load" generator unable to produce more than 400 connections per second on a 48-core machine. After extensive troubleshooting, traces showed that threads were waiting for each other inside the libcrypto component (part of the OpenSSL library). The load generators were set up on Ubuntu 22.04, which ships OpenSSL 3.0.2. Rebuilding OpenSSL 1.1.1 and linking against it instantly solved the problem, unlocking 140,000 connections per second. Several team members involved in the tests were similarly caught out by tools linked against OpenSSL 3.0, eventually realizing that this version is fundamentally unsuitable for client-side performance testing.

The performance problems we encountered were part of a much broader pattern. Numerous users reported performance degradation with OpenSSL 3; there is even a meta-issue created to try to centralize information about this massive performance regression that affects many areas of the library (https://github.com/OpenSSL/OpenSSL/issues/17627). Among them, there were reports about nodejs’ performance being divided by seven when used as a client, other tools showing a 20x processing time increase, a 30x CPU increase on threaded applications that was similar to the load generator problem, and numerous others.

Despite the huge frustration caused by the QUIC API rejection, we were still eager to help OpenSSL spot and address the massive performance regression. We participated with others in trying to explain to the OpenSSL team the root cause of the problem, providing detailed measurements, graphs, and lock counts, such as here. OpenSSL responded by saying "we're not going to reimplement locking callbacks because embedded systems are no longer the target" (when speaking about an Intel Xeon with 32 GB of RAM), and even suggested that pull requests fixing the problems were welcome, as if it were trivial for a third party to fix the issues that had caused the performance degradation.

The disconnect between user experience and developer perspective was highlighted in recent discussions, and further exemplified by the complete absence of a culture of performance testing. This lack was glaringly evident when a developer, after asking users to test their patches, admitted to not conducting testing themselves due to a lack of hardware. It was then suggested that the project simply make a public call for hardware access (which was apparently resolved within a week or two); by that time, performance testing of proposed patches was finally being conducted by participants outside of the project, namely from Akamai, HAProxy, and Microsoft.

When some of the project members considered a 32% performance regression “pretty near” the original performance, it signaled to our development team that any meaningful improvement was unlikely. The lack of hardware for testing indicates that the project is unwilling or unable to direct sufficient resources to address the problems, and the only meaningful metric probably is the number of open issues. Nowadays, projects using OpenSSL are starting to lose faith and are adding options to link against alternative libraries, since the situation has stagnated over the last three years – a trend that aligns with our own experience and observations.

# Deep dive into the exact problem

Prior to OpenSSL 1.1.0, OpenSSL relied on a simple and efficient locking API: applications using threads would simply initialize the OpenSSL library and pass it a few pointers to the functions to be used for locking and unlocking. This had the merit of being compatible with whatever threading model an application used. Since OpenSSL 1.1.0, these callbacks are ignored, and OpenSSL exclusively relies on the locks offered by the standard Pthread library, which can already be significantly heavier than what an application used to rely on.
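
For reference, the sketch below shows approximately what that pre-1.1.0 registration looked like, here backed by plain pthread mutexes for simplicity (applications were free to plug in lighter primitives). These CRYPTO_* calls only have an effect on OpenSSL 1.0.2 and older; from 1.1.0 onward they are no-ops.

```c
#include <stdlib.h>
#include <pthread.h>
#include <openssl/crypto.h>

static pthread_mutex_t *ssl_locks;

/* OpenSSL tells us which of its N static locks to take or release. */
static void ssl_locking_cb(int mode, int n, const char *file, int line)
{
    (void)file; (void)line;
    if (mode & CRYPTO_LOCK)
        pthread_mutex_lock(&ssl_locks[n]);
    else
        pthread_mutex_unlock(&ssl_locks[n]);
}

/* Thread identifier callback; pthread_t fits an unsigned long on Linux. */
static unsigned long ssl_id_cb(void)
{
    return (unsigned long)pthread_self();
}

void ssl_threading_init(void)
{
    int i, n = CRYPTO_num_locks();

    ssl_locks = calloc(n, sizeof(*ssl_locks));
    for (i = 0; i < n; i++)
        pthread_mutex_init(&ssl_locks[i], NULL);

    CRYPTO_set_id_callback(ssl_id_cb);
    CRYPTO_set_locking_callback(ssl_locking_cb);
}
```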

At that time, while locks were implemented in many places, they were rarely used in exclusive mode, and not on the most common code paths. For example, we noticed heavy usage when using crypto engines, to the point of being the main bottleneck; quite a bit on session resume and cache access, but less on the rest of the code paths.

During our tests of the Intel QAT engine two years ago, we had already noticed that OpenSSL 1.1.1 could make immoderate use of locking in the engine API, causing extreme contention past 16 threads. This was tolerable, considering that engines were an edge case, probably harder to test and optimize than the rest of the code. Seeing that these were just pthread_rwlocks, and that we already had a lighter implementation of read-write locks, we had the idea to provide our own pthread_rwlock functions relying on our low-overhead locks ("lorw"), so that the OpenSSL library would use those instead of the legacy pthread_rwlocks. This proved extremely effective at pushing the contention point much higher. Thanks to this improvement, the code was eventually merged, and a build-time option was added to enable this alternate locking mechanism: USE_PTHREAD_EMULATION. We'll see further on that this option is exploited again in order to measure what can be attributed to locking alone.
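
The sketch below illustrates the general interposition idea behind USE_PTHREAD_EMULATION, not HAProxy's actual implementation: by defining the pthread_rwlock_* symbols in the executable itself, the dynamic linker resolves libcrypto's calls to these lighter, purely spinning implementations instead of glibc's. The storage of the opaque pthread_rwlock_t is simply reused as an atomic counter.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Lock word reused inside pthread_rwlock_t: 0 = free, -1 = write-locked,
 * N > 0 = N readers. A real implementation would also pause/yield on
 * contention instead of spinning tightly. */

int pthread_rwlock_init(pthread_rwlock_t *rwlock, const pthread_rwlockattr_t *attr)
{
    (void)attr;
    atomic_store((_Atomic int *)rwlock, 0);
    return 0;
}

int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock)
{
    _Atomic int *lock = (_Atomic int *)rwlock;

    for (;;) {
        int cur = atomic_load_explicit(lock, memory_order_relaxed);
        if (cur >= 0 && atomic_compare_exchange_weak(lock, &cur, cur + 1))
            return 0;                           /* one more reader */
    }
}

int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock)
{
    _Atomic int *lock = (_Atomic int *)rwlock;
    int expected = 0;

    while (!atomic_compare_exchange_weak(lock, &expected, -1))
        expected = 0;                           /* retry until the lock is free */
    return 0;
}

int pthread_rwlock_unlock(pthread_rwlock_t *rwlock)
{
    _Atomic int *lock = (_Atomic int *)rwlock;

    if (atomic_load_explicit(lock, memory_order_relaxed) < 0)
        atomic_store(lock, 0);                  /* release the writer */
    else
        atomic_fetch_sub(lock, 1);              /* drop one reader */
    return 0;
}
```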

With OpenSSL 3.0, an important goal was apparently to make the library much more dynamic, with a lot of previously constant elements (algorithm identifiers, for example) becoming dynamic and having to be looked up in a list instead of being fixed at compile time. Since the new design allows anyone to update that list at runtime, locks were placed everywhere the list is accessed in order to ensure consistency. These lists are apparently scanned to find very basic configuration elements, so this operation is performed a lot. In one of the measurements provided to the team and linked to above, it was shown that the number of read locks (non-exclusive) jumped 5x compared with OpenSSL 1.1.1 for server mode alone, which is the least affected one. The measurement couldn't be done in client mode since it simply didn't work at all: timeouts and the watchdog were triggering every few seconds.

As you’ll see below, just changing the locking mechanism reveals pretty visible performance gains, proving that locking abuse is the main cause of the performance degradation that affects OpenSSL 3.0.

OpenSSL 3.1 tried to partially address the problem by replacing locks with atomic operations where it appeared possible. The problem remains that the architecture was probably designed to be far more dynamic than necessary, making it unfit for performance-critical workloads, and this was clearly visible in the performance reports of the issues above.

There are two remaining issues at the moment:

  • Even after everything imaginable was done, the performance of OpenSSL 3.x remains far below that of OpenSSL 1.1.1. The ratio is hard to predict, as it depends heavily on the workload, but losses from 10% to 99% were reported.

  • In a rush to get rid of OpenSSL 1.1.1, the OpenSSL team declared its end of life before 3.0 was released, then postponed the release of 3.0 by more than a year without adjusting 1.1.1's end-of-life date. When 3.0 was finally released, 1.1.1 had little remaining time to live, so they had to declare 3.0 "long-term supported". This means that this shiny new version, with a completely new architecture that had not yet been sufficiently field-tested, would become the one provided by various operating systems for several years, since they all need multiple years of support. This version turned out to be dramatically worse in terms of performance and reliability than any version ever released before.

End users are facing a dead end:

  • Operating systems now ship with 3.0, which is literally unusable for certain users.

  • Distributions that were shipping 1.1.1 are progressively reaching end of support (except those providing extended support, but few people use these distributions, and they’re often paid).

  • OpenSSL 1.1.1 is no longer supported for free by the OpenSSL team, so many users cannot safely use it.

These issues sparked significant concern within the HAProxy community, fundamentally shifting their priorities. While they had initially been focused on forward-looking questions such as, "which library should we use to implement QUIC?", they were now forced to grapple with a more basic survival concern: "which SSL library will allow our websites to simply stay operational?" The performance problems were so severe that basic functionality, rather than new feature support, had become the primary consideration. 

# Performance testing results

HAProxy already supported alternative libraries, but the support was mostly incomplete due to API differences. The new performance problem described above forced us to speed up the full adoption of alternatives. At the moment, HAProxy supports multiple SSL libraries in addition to OpenSSL: QuicTLS, LibreSSL, WolfSSL, and AWS-LC. QuicTLS is not included in the testing since it is simply OpenSSL plus the QUIC patches, which do not impact performance. LibreSSL is not included in the tests because its focus is primarily on code correctness and auditability, and we already noticed some significant performance losses there - probably related to the removal of certain assembler implementations of algorithms and simplifications of certain features.

We included various versions of OpenSSL from 1.1.1 to the latest 3.4-dev (at the time), in order to measure the performance loss of 3.x compared with 1.1.1 and identify any progress made by the OpenSSL team to fix the regression. OpenSSL version 3.0.2 was specifically mentioned because it is shipped in Ubuntu 22.04, where most users face the problem after upgrading from Ubuntu 20.04, which ships the venerable OpenSSL 1.1.1. The HAProxy version used for testing was: HAProxy version 3.1-dev1-ad946a-33 2024/06/26

Testing scenarios:

  • Server-only mode with full TLS handshake: This is the most critical and common use for internet-facing web equipment (servers and load balancers), because it requires extremely expensive asymmetric cryptographic operations. The performance impact is especially concerning because it is the absolute worst case, and a new handshake can be imposed by the client at any time. For this reason, it is also often an easy target for denial of service attacks.

  • End-to-end encryption with TLS resumption: The resumption approach is the most common on the backend to reach the origin servers. Security is especially important in today’s virtualized environments, where network paths are unclear. Since we don’t want to inflict a high load on the server, TLS sessions are resumed on new TCP connections. We’re just doing the same on the frontend to match the common case for most sites.

Testing variants:

  • Two locking options (standard Pthread locking and HAProxy’s low-overhead locks)

  • Multiple SSL libraries and versions

Testing environment:

  • All tests were run on an AWS r8g.16xlarge instance with 64 Graviton4 cores (ARM Neoverse V2)

# Server-only mode with full TLS handshake

In this test, clients will:

  1. Connect to the server (HAProxy in this case)

  2. Perform a single HTTP request

  3. Close the connection

In this simplified scenario, to simulate the most ideal conditions, backend servers are not involved because they have a negligible impact, and HAProxy directly responds to client requests. When clients reconnect, they never try to resume an existing session, and instead always perform a full handshake on a new connection. With RSA, this use case is very inexpensive for the clients and very expensive for the server. It represents a surge of new visitors (each requiring a key exchange); for example, a site that suddenly becomes popular after an event (e.g., news sites). In such tests, a ratio of 1:10 to 1:15 in terms of performance between the client and the server is usually sufficient to saturate the server. Here, the server has 64 cores, but we'll keep a 32-core client, which will be largely enough.

The performance of the machine running the different libraries is measured in number of new connections per second. It was always verified that the machine saturates its CPU. The first test is with the regular build of HAProxy against the libraries (i.e., HAProxy doesn’t emulate the pthread locks, but lets the libraries use them):

Two libraries stand out at the top and the bottom. At the top, above 63000 connections per second, in light blue, we’re seeing the latest version of AWS-LC (30 commits after v1.32.0), which includes important CPU-level optimizations for RSA calculations. Previous versions did not yield such results due to a mistake in the code that failed to properly detect the processor and enable the appropriate optimizations. The second fastest library, in orange, was WolfSSL 5.7.0. For a long time, we’ve known this library for being heavily optimized to run fast on modest hardware, so we’re not surprised and even pleased to see it in the top on such a powerful machine.

In the middle, around 48000 connections per second, or 25% lower, are OpenSSL 1.1.1 and the previous version of AWS-LC (~45k), version 1.29.0. Below those two, around 42500 connections per second, are the latest versions of OpenSSL (3.1, 3.2, 3.3 and 3.4-dev). At the bottom, around 21000 connections per second, are both OpenSSL 3.0.2 and 3.0.14, the latest 3.0 version at the time of testing.

What is particularly visible on this graph is that aside from the two versions that specifically optimize for this processor, all other libraries remained grouped until around 12-16 threads. After that point, the libraries start to diverge, with the two flavors of OpenSSL 3.0 staying at the bottom and reaching their maximum performance and plateau around 32 threads. Thus, this is not a cryptography optimization issue; it's a scalability issue.

When comparing the profiling output of OpenSSL 1.1.1 and 3.0.14 for this test, the difference is obvious.

OpenSSL 1.1.1w:

46.29% libcrypto.so.1.1 [.] __bn_sqr8x_mont
14.73% libcrypto.so.1.1 [.] __bn_mul4x_mont
13.01% libcrypto.so.1.1 [.] MOD_EXP_CTIME_COPY_FROM_PREBUF
2.05% libcrypto.so.1.1 [.] __ecp_nistz256_mul_mont
1.06% libcrypto.so.1.1 [.] sha512_block_armv8
0.95% libcrypto.so.1.1 [.] __ecp_nistz256_sqr_mont
0.61% libcrypto.so.1.1 [.] ecp_nistz256_point_double
0.54% libcrypto.so.1.1 [.] bn_mul_mont_fixed_top
0.52% [kernel] [k] default_idle_call
0.51% libc.so.6 [.] malloc
0.51% libc.so.6 [.] _int_free
0.50% libcrypto.so.1.1 [.] BN_mod_exp_mont_consttime
0.49% libcrypto.so.1.1 [.] ecp_nistz256_sqr_mont
0.46% libc.so.6 [.] _int_malloc
0.43% libcrypto.so.1.1 [.] OPENSSL_cleanse

OpenSSL 3.0.14:

19.12% libcrypto.so.3 [.] __bn_sqr8x_mont
17.33% libc.so.6 [.] __aarch64_ldadd4_acq
15.14% libc.so.6 [.] pthread_rwlock_unlock@@GLIBC_2.34
12.48% libc.so.6 [.] pthread_rwlock_rdlock@@GLIBC_2.34
8.55% libc.so.6 [.] __aarch64_cas4_rel
6.04% libcrypto.so.3 [.] __bn_mul4x_mont
5.39% libcrypto.so.3 [.] MOD_EXP_CTIME_COPY_FROM_PREBUF
1.59% libcrypto.so.3 [.] __ecp_nistz256_mul_mont
0.80% libcrypto.so.3 [.] __aarch64_ldadd4_relax
0.74% libcrypto.so.3 [.] __ecp_nistz256_sqr_mont
0.53% libcrypto.so.3 [.] __aarch64_ldadd8_relax
0.50% libcrypto.so.3 [.] ecp_nistz256_point_double
0.43% libcrypto.so.3 [.] sha512_block_armv8
0.30% libcrypto.so.3 [.] ecp_nistz256_sqr_mont
0.24% libc.so.6 [.] malloc
0.23% libcrypto.so.3 [.] bn_mul_mont_fixed_top
0.23% libc.so.6 [.] _int_free

OpenSSL 3.0.14 spends 27% of the time acquiring and releasing read locks, something that should definitely not be needed during key exchange operations, to which we can add 26% in atomic operations, for a total of 53% of the CPU spent doing nothing useful.

Let’s examine how much performance can be recovered by building with USE_PTHREAD_EMULATION=1. (The libraries will use HAProxy’s low-overhead locks instead of Pthread locks.)

The results show that the performance remains exactly the same for all libraries, except OpenSSL 3.0, which significantly increased to reach around 36000 connections per second. The profile now looks like this:

OpenSSL 3.0.14:

33.03% libcrypto.so.3 [.] __bn_sqr8x_mont
10.63% haproxy-openssl-3.0.14-emu [.] pthread_rwlock_wrlock
10.34% libcrypto.so.3 [.] __bn_mul4x_mont
9.27% libcrypto.so.3 [.] MOD_EXP_CTIME_COPY_FROM_PREBUF
5.63% haproxy-openssl-3.0.14-emu [.] pthread_rwlock_rdlock
3.15% haproxy-openssl-3.0.14-emu [.] pthread_rwlock_unlock
2.75% libcrypto.so.3 [.] __ecp_nistz256_mul_mont
2.19% libcrypto.so.3 [.] __aarch64_ldadd4_relax
1.26% libcrypto.so.3 [.] __ecp_nistz256_sqr_mont
1.10% libcrypto.so.3 [.] __aarch64_ldadd8_relax
0.87% libcrypto.so.3 [.] ecp_nistz256_point_double
0.72% libcrypto.so.3 [.] sha512_block_armv8
0.50% libcrypto.so.3 [.] ecp_nistz256_sqr_mont
0.42% libc.so.6 [.] malloc
0.41% libc.so.6 [.] _int_free

The locks used were the only difference between the two tests. The amount of time spent in locks noticeably diminished, but not enough to explain that big a difference. However, it’s worth noting that pthread_rwlock_wrlock made its appearance, as it wasn’t visible in the previous profile. It’s likely that, upon contention, the original function immediately went to sleep in the kernel, explaining why its waiting time was not accounted for (perf top measures CPU time).

# End-to-end encryption with TLS resumption

The next test concerns the most favorable case, that is, when the proxy has the ability to resume a TLS session from the client's ticket, and then uses session resumption as well to connect to the backend server. In this mode, asymmetric cryptography is used only once per client and once per server, for the time it takes to obtain a session ticket, and everything else happens using lighter cryptography.

This scenario represents the most common use case for applications hosted on public cloud infrastructures: clients connected all day to an application don't do it over the same TCP connection; connections are transparently closed when not used for a while, and reopened on activity, with the TLS session resumed. As a result, the cost of the initial asymmetric cryptography becomes negligible when amortized over numerous requests and connections. In addition, since this is a public cloud, encryption between the proxy and the backend servers is mandatory, so there’s really SSL on both sides.

Given that performance is going to be much higher, a single client and a single server are no longer sufficient for the benchmark. Thus, we’ll need 10 clients and 10 servers per proxy, each taking 10% of the total load, which gives the following theoretical setup:

We can simplify the configuration by having 10 distinct instances of the proxy within the same process (i.e., 10 ports, one per client -> server association):

Since the connections with the client and server are using the exact same protocols and behavior (http/1.1, close, resume), we can daisy-chain each instance to the next one and keep only client 1 and server 10:

With this setup, only a single client and a single server are needed, each seeing 10% of the load, with the proxy having to deal 10 times with these 10%, hence seeing 100% of the load.

The first test was run against the regular HAProxy version, keeping the default locks. The performance is measured in end-to-end connections per second; that is, one connection accepted from the client and one connection emitted to the server count together as one end-to-end connection.

Let's ignore the two highest curves for now. The orange curve is again WolfSSL, showing excellent linear scalability up to 64 cores, where it reaches 150000 end-to-end connections per second, limited only by the number of available CPU cores. This also demonstrates HAProxy's modern scalability, showing that it can deliver linear performance scaling within a single process as the number of cores increases.

The brown curve below it is OpenSSL 1.1.1w; it used to scale quite well with rekeying, but when resuming sessions and connecting to a server, the scalability disappears and performance degrades from 40 threads onward. Performance then collapses to the equivalent of 8 threads when reaching 64 threads, at 17800 connections per second. The performance profile clearly reveals the cause: locking and atomics alone waste around 80% of the CPU cycles.

OpenSSL 1.1.1w:

72.83% libc.so.6 [.] pthread_rwlock_wrlock@@GLIBC_2.34
1.99% libc.so.6 [.] __aarch64_cas4_acq
1.47% libcrypto.so.1.1 [.] fe51_mul
1.30% libc.so.6 [.] __aarch64_cas4_relax
1.24% libcrypto.so.1.1 [.] fe_mul
1.00% libc.so.6 [.] __aarch64_ldset4_acq
0.86% libcrypto.so.1.1 [.] sha512_block_armv8
0.77% [kernel] [k] futex_q_lock
0.70% [kernel] [k] queued_spin_lock_slowpath
0.70% libcrypto.so.1.1 [.] fe51_sq
0.68% libcrypto.so.1.1 [.] x25519_scalar_mult
0.56% libc.so.6 [.] pthread_rwlock_unlock@@GLIBC_2.34

The worst-performing libraries, the flat curves at the bottom, are once again OpenSSL 3.0.2 and 3.0.14. Both fail to scale past 2 threads; 3.0.2 even collapses at 16 threads, reaching performance levels indistinguishable from the X axis, at 1500-1600 connections per second at 16 threads and beyond, equivalent to just 1% of WolfSSL! OpenSSL 3.0.14 is marginally better, culminating at 3700 connections per second, or 2.5% of WolfSSL. In blunt terms: running OpenSSL 3.0.2 as shipped with Ubuntu 22.04 results in 1/100 of WolfSSL's performance on identical hardware! To put this into perspective, you would have to deploy 100 times the number of machines to handle the same traffic, solely because of the underlying SSL library.

It’s also visible that a 32-core system running optimally at 63000 connections per second on OpenSSL 1.1.1 would collapse to only 1500 connections per second on OpenSSL 3.0.2, or 1/42 of its performance, for example, after upgrading from Ubuntu 20.04 to 22.04. This is exactly what many of our users are experiencing at the moment. It is also understandable that upgrading to the more recent Ubuntu 24.04 only addresses a tiny part of the problem, by only roughly doubling the performance with OpenSSL 3.0.14.

Here is a performance profile of the process running on OpenSSL 3.0.2:

14.52% [kernel] [k] default_idle_call
14.15% libc.so.6 [.] __aarch64_ldadd4_acq
9.87% libc.so.6 [.] pthread_rwlock_unlock@@GLIBC_2.34
7.32% libcrypto.so.3 [.] ossl_sa_doall_arg
7.28% libc.so.6 [.] pthread_rwlock_rdlock@@GLIBC_2.34
6.23% [kernel] [k] arch_local_irq_enable
3.35% libcrypto.so.3 [.] __aarch64_ldadd8_relax
2.80% libc.so.6 [.] __aarch64_cas4_rel
2.04% [kernel] [k] arch_local_irq_restore
1.32% libcrypto.so.3 [.] OPENSSL_LH_doall_arg
1.11% libcrypto.so.3 [.] __aarch64_ldadd4_relax
0.87% [kernel] [k] futex_q_lock
0.84% libcrypto.so.3 [.] fe51_mul
0.82% [kernel] [k] el0_svc_common.constprop.0
0.74% libcrypto.so.3 [.] fe_mul
0.65% libcrypto.so.3 [.] OPENSSL_LH_flush
0.64% libcrypto.so.3 [.] OPENSSL_LH_doall
0.62% [kernel] [k] futex_wake
0.58% libc.so.6 [.] _int_malloc
0.57% [kernel] [k] wake_q_add_safe
0.53% libcrypto.so.3 [.] sha512_block_armv8

What is visible here is that all the CPU is wasted in locks and atomic operations and wake-up/sleep cycles, explaining why the CPU cannot go higher than 350-400%. The machine seems to be waiting for something while the locks are sleeping, causing all the work to be extremely serialized.

Another concerning curve is AWS-LC, the blue one near the bottom. It shows significantly higher performance than the other libraries at low thread counts, then suddenly collapses as the number of cores increases. The shape of the curve strongly suggests a locking issue, and perf top confirms it:

AWS-LC 1.29.0:

86.01% libc.so.6 [.] pthread_rwlock_wrlock@@GLIBC_2.34
2.43% libc.so.6 [.] __aarch64_cas4_relax
1.78% libc.so.6 [.] __aarch64_cas4_acq
1.13% [kernel] [k] futex_q_lock
1.09% libc.so.6 [.] __aarch64_ldset4_acq
0.82% libc.so.6 [.] __aarch64_swp4_relax
0.76% [kernel] [k] queued_spin_lock_slowpath
0.65% haproxy-aws-lc-v1.29.0-std [.] curve25519_x25519_byte_scalarloop
0.25% [kernel] [k] futex_get_value_locked
0.23% haproxy-aws-lc-v1.29.0-std [.] curve25519_x25519base_byte_scalarloop
0.15% libc.so.6 [.] __aarch64_cas4_rel
0.13% libc.so.6 [.] _int_malloc

The locks take most of the CPU, atomic ops quite a bit (particularly a CAS – compare-and-swap – operation that resists contention poorly, since the operation might have to be attempted many times before succeeding), and even some in-kernel locks (futex, etc.). Approximately a year ago, during our initial x86 testing with library version 1.19, we observed this behavior, but did not conduct a thorough investigation at the time.

Digging into the flame graph reveals that it’s essentially the reference counting operations that cost a lot of locking:

With two libraries significantly affected by the cost of locking, we ran a new series of tests using HAProxy’s locks. (HAProxy was then rebuilt with USE_PTHREAD_EMULATION=1.)

The results were much better. OpenSSL 1.1.1 is now pretty much linear, reaching 124000 end-to-end connections per second, with a much cleaner performance profile, and less than 3% of CPU cycles spent in locks.

OpenSSL 1.1.1w:

7.52% libcrypto.so.1.1 [.] fe51_mul
6.47% libcrypto.so.1.1 [.] fe_mul
4.68% libcrypto.so.1.1 [.] sha512_block_armv8
3.64% libcrypto.so.1.1 [.] fe51_sq
3.42% libcrypto.so.1.1 [.] x25519_scalar_mult
2.67% haproxy-openssl-1.1.1w-emu [.] pthread_rwlock_wrlock
2.48% libcrypto.so.1.1 [.] fe_sq
2.33% libc.so.6 [.] _int_malloc
2.04% libc.so.6 [.] _int_free
1.84% [kernel] [k] __wake_up_common_lock
1.83% libc.so.6 [.] cfree@GLIBC_2.17
1.80% libc.so.6 [.] malloc
1.59% libcrypto.so.1.1 [.] OPENSSL_cleanse
1.10% [kernel] [k] el0_svc_common.constprop.0
0.95% libcrypto.so.1.1 [.] cmov
0.91% libcrypto.so.1.1 [.] SHA512_Final
0.77% libc.so.6 [.] __memcpy_generic
0.77% libc.so.6 [.] __aarch64_swp4_rel
0.73% libc.so.6 [.] malloc_consolidate
0.71% [kernel] [k] kmem_cache_free

OpenSSL 3.0.2 keeps the same structural defects but doesn’t collapse until 32 threads (compared to 12 previously), revealing more clearly how it uses its locks and atomic ops (96% locks).

OpenSSL 3.0.2:

77.58% haproxy-openssl-3.0.2-emu [.] pthread_rwlock_rdlock
18.02% haproxy-openssl-3.0.2-emu [.] pthread_rwlock_wrlock
0.51% libcrypto.so.3 [.] ossl_sa_doall_arg
0.39% haproxy-openssl-3.0.2-emu [.] pthread_rwlock_unlock
0.34% libcrypto.so.3 [.] OPENSSL_LH_doall_arg
0.27% libcrypto.so.3 [.] OPENSSL_LH_flush
0.26% libcrypto.so.3 [.] OPENSSL_LH_doall
0.23% libcrypto.so.3 [.] __aarch64_ldadd8_relax
0.13% libcrypto.so.3 [.] __aarch64_ldadd4_relax

OpenSSL 3.0.14 maintains its (admittedly low) level until 64 threads, but this time with a performance of around 8000 connections per second, or slightly more than twice the performance with Pthread locks, also exhibiting an excessive use of locks (89% CPU usage).

OpenSSL 3.0.14:

60.18% haproxy-openssl-3.0.14-emu [.] pthread_rwlock_rdlock
28.69% haproxy-openssl-3.0.14-emu [.] pthread_rwlock_unlock
0.55% libcrypto.so.3 [.] fe51_mul
0.49% libcrypto.so.3 [.] fe_mul
0.46% libcrypto.so.3 [.] __aarch64_ldadd4_relax
0.33% libcrypto.so.3 [.] sha512_block_armv8
0.27% libcrypto.so.3 [.] fe51_sq
0.26% libcrypto.so.3 [.] x25519_scalar_mult
0.26% libc.so.6 [.] _int_malloc
0.22% libc.so.6 [.] _int_free

The latest OpenSSL versions replaced many locks with atomics, but these have become excessive, as can be seen below with __aarch64_ldadd4_relax(), an atomic fetch-and-add primitive typically used for reference counting and manual locking, which still consumes a lot of CPU.

OpenSSL 3.4.0-dev:

37.24% libcrypto.so.3 [.] __aarch64_ldadd4_relax
8.91% libcrypto.so.3 [.] evp_md_init_internal
8.68% libcrypto.so.3 [.] EVP_MD_CTX_copy_ex
7.18% libcrypto.so.3 [.] EVP_DigestUpdate
2.03% libcrypto.so.3 [.] fe51_mul
1.92% libcrypto.so.3 [.] EVP_DigestFinal_ex
1.78% libcrypto.so.3 [.] fe_mul
1.45% haproxy-openssl-3.4.0-dev-emu [.] pthread_rwlock_rdlock
1.43% haproxy-openssl-3.4.0-dev-emu [.] pthread_rwlock_unlock
1.22% libcrypto.so.3 [.] sha512_block_armv8
1.09% libcrypto.so.3 [.] fe51_sq
0.86% libc.so.6 [.] _int_malloc
0.85% libcrypto.so.3 [.] x25519_scalar_mult
0.77% libc.so.6 [.] _int_free

The WolfSSL curve doesn’t change at all; it clearly doesn’t need locks.

The AWS-LC curve goes much higher before collapsing (32 threads – 81000 connections per second), but still under heavy locking.

AWS-LC 1.29.0:

69.57% haproxy-aws-lc-v1.29.0-emu [.] pthread_rwlock_wrlock
4.80% haproxy-aws-lc-v1.29.0-emu [.] curve25519_x25519_byte_scalarloop
1.65% haproxy-aws-lc-v1.29.0-emu [.] curve25519_x25519base_byte_scalarloop
0.93% haproxy-aws-lc-v1.29.0-emu [.] pthread_rwlock_unlock
0.73% [kernel] [k] __wake_up_common_lock
0.52% libc.so.6 [.] _int_malloc
0.47% libc.so.6 [.] _int_free
0.45% haproxy-aws-lc-v1.29.0-emu [.] sha256_block_armv8
0.41% haproxy-aws-lc-v1.29.0-emu [.] SHA256_Final

A new flamegraph of AWS-LC was produced, showing much narrower spikes (which is unsurprising since the performance was roughly doubled).

Reference counting should normally not require locks, so we reviewed the AWS-LC code to see if something could be improved. We discovered that there are, in fact, two implementations of the reference counting functions: a generic one relying on Pthread rwlocks, and a more modern one relying on atomic operations available since gcc-4.7, which is only selected for compilers configured to use the C11 standard. C11 has been the default since gcc-5, and given that our tests were made with gcc-11.4, we should have been covered. A deeper analysis revealed that the CMake configuration used to build the project forces the standard to the older C99 unless a variable, CMAKE_C_STANDARD, is set.
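
As a hedged illustration of the difference (this is not AWS-LC's actual code), here are the two strategies side by side: a fallback that funnels every reference-count update through a rwlock, and a C11 variant whose relaxed fetch-and-add compiles down, on AArch64 with LSE, to the single ldadd instruction seen in the profiles. Forcing -DCMAKE_C_STANDARD=11 at configure time is what allows the second variant to be selected.

```c
#include <stdint.h>
#include <pthread.h>
#include <stdatomic.h>

/* Fallback: correct, but every refcount bump serializes on a lock. */
static pthread_rwlock_t refcount_lock = PTHREAD_RWLOCK_INITIALIZER;

void refcount_inc_locked(uint32_t *count)
{
    pthread_rwlock_wrlock(&refcount_lock);
    (*count)++;
    pthread_rwlock_unlock(&refcount_lock);
}

/* C11 variant: a relaxed atomic fetch-and-add, no lock taken at all. */
void refcount_inc_atomic(_Atomic uint32_t *count)
{
    atomic_fetch_add_explicit(count, 1, memory_order_relaxed);
}
```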

Rebuilding the library with CMAKE_C_STANDARD=11 radically changed the performance, producing the topmost curves attributed to the -c11 variants of the library. This time, there is no difference between the regular build and the one with emulated locks, since the library no longer uses locks on the fast path. Just as with WolfSSL, performance now scales linearly with the number of cores and threads. It is clearly visible that the library is the most performant, reaching 183000 end-to-end connections per second at 64 threads, or about 20% higher than WolfSSL and 50% higher than OpenSSL 1.1.1w. The profile shows no more locks.

AWS-LC 1.29.0:

16.61% haproxy-aws-lc-v1.29.0-c11-emu [.] curve25519_x25519_byte_scalarloop
5.69% haproxy-aws-lc-v1.29.0-c11-emu [.] curve25519_x25519base_byte_scalarl
2.65% [kernel] [k] __wake_up_common_lock
1.60% libc.so.6 [.] _int_malloc
1.55% haproxy-aws-lc-v1.29.0-c11-emu [.] sha256_block_armv8
1.53% [kernel] [k] el0_svc_common.constprop.0
1.52% libc.so.6 [.] _int_free
1.36% haproxy-aws-lc-v1.29.0-c11-emu [.] SHA256_Final
1.27% libc.so.6 [.] malloc
1.22% haproxy-aws-lc-v1.29.0-c11-emu [.] OPENSSL_free
0.93% [kernel] [k] __fget_light
0.90% libc.so.6 [.] __memcpy_generic
0.89% haproxy-aws-lc-v1.29.0-c11-emu [.] CBB_flush

This issue was reported to the AWS-LC project, which welcomed the report and fixed this oversight (mostly a problem of cat-and-mouse in the cmake-based build system).

Finally, modern versions of OpenSSL (3.1, 3.2, 3.3, and 3.4-dev) do not benefit much from the lighter locks. Their performance remains identical across all four versions, increasing from 25000 to 28000 connections per second with the lighter locks and reaching a plateau between 24 and 32 threads. That is equivalent to 22.5% of OpenSSL 1.1.1, and 15.3% of AWS-LC's performance. This definitely indicates that the contention is no longer concentrated in the locks alone, but is now spread all over the code due to the abuse of atomic operations. The problem stems from a fundamental software architecture issue rather than a simple lack of optimization. A permanent solution will require rolling back to a lighter architecture that prioritizes efficient resource utilization and aligns with real-world application requirements.

# Performance summary per locking mechanism

The graph below shows how each library performs, in number of server handshakes per second (the numbers are expressed in thousands of connections per second).

With the exception of OpenSSL 3.0.x, the libraries are not affected by the locks during this phase, indicating that they are not making heavy use of them. The performance is roughly the same across all libraries, with the CPU-aware ones (AWS-LC and WolfSSL) at the top, followed by OpenSSL 1.1.1, then all versions of OpenSSL 3.x.

The following graph shows how the libraries perform for TLS resumption (the numbers are expressed in thousands of forwarded connections per second).

This test involves end-to-end connections, where the client establishes a connection to HAProxy, which then establishes a connection to the server. Preliminary handshakes had already been performed, and connections were resumed from a ticket, which explains why the numbers are much higher than in the previous test. OpenSSL 1.1.1w shows poor performance by default, due to a moderate use of locking; however, it becomes one of the best performers when lighter locks are used. OpenSSL 3.0.x versions exhibit extremely poor performance that can be improved only slightly by replacing the locks; at best, performance is doubled.

All OpenSSL 3.x versions remain poor performers, with locking being only a small part of their problem. However, those who are stuck with these versions can still benefit from our lighter locks by setting an HAProxy build option. The performance of the default build of AWS-LC 1.32 is also very low, because it incorrectly detects the compiler and uses locks instead of atomic operations for reference counting. However, once properly configured, it becomes the best performer. WolfSSL is very good out of the box. Note that despite the wrong compilation option, AWS-LC is still significantly better than any OpenSSL 3.x version, even with OpenSSL 3.x using our lighter locks.

# Future of SSL libraries

Unfortunately, the future does not look bright for OpenSSL users. After one of the most massive performance regressions in history, measurements show no progress toward overcoming the issue over the last two years, suggesting that the team's ability to fix this important problem has stalled.

It is often said that fixing a problem requires smarter minds than those who created it. When the problem was architected by a team with strong convictions about the solution's correctness, it seems extremely unlikely that the resolution will come from that same team. The lack of progress in the latest releases tends to confirm this unfortunate hypothesis. The only path forward seems to be for the team to revert some of the major changes that plague the 3.x versions, but discussions suggest that this is off the table for them.

It is hard to guess what good or bad can emerge from a project in which technical matters are still decided by committees and votes, an anti-pattern well known for causing more harm than good; bureaucracy and managers deciding against common sense rarely produce trustworthy solutions, since the majority is not necessarily right on technical matters. Nor do further changes appear likely soon, as the project has just reorganized while keeping its committees and vote-based decision process.

In early 2023, Rich Salz, one of the QuicTLS project's leaders, indicated that QuicTLS was considering moving to the Apache Foundation via the Apache Incubator and potentially becoming Apache TLS. This has not happened. One possible explanation might be the difficulty of finding enough maintainers willing to engage long-term in such an arduous task. There is probably also the realization that OpenSSL completely ruined its performance with versions 3 and above; that does not make it very appealing for developers to engage with a new project that starts out crippled by a major performance flaw, and with the team's demonstrated inability to improve or resolve the problems after two years. At IETF 120, the QuicTLS project leaders indicated that their goal is to diverge from OpenSSL, work in a similar fashion to BoringSSL, and collaborate with others.

AWS-LC looks like a very active project with a strong community. During our first evaluation, there were a few rough edges that were quickly addressed. Even the recently reported performance issue was quickly fixed and shipped in the next version. Several versions were released during the write-up of this article. This is definitely a library that anyone interested in the topic should monitor.

# Recommendations for HAProxy users

What are the solutions for end users?

  • Regardless of the performance impact, if operating system vendors shipped the QuicTLS patch set applied on top of their OpenSSL releases, it would greatly help QUIC adoption in environments that are not performance-sensitive.

  • For users who want to test or use QUIC and don't care about performance (i.e. the majority), HAProxy offers the limited-quic option, which supports QUIC without 0-RTT on top of OpenSSL (a minimal configuration sketch follows this list). For other users, including users of other products, building QuicTLS is easy and provides a 100% OpenSSL-compatible library that integrates seamlessly with any code.

  • Regarding the performance impact, those able to upgrade their versions regularly should adopt AWS-LC. The library integrates well with existing code, since it shares ancestry with BoringSSL, which is itself a fork of OpenSSL. The team is helpful and responsive, and we have not yet found a meaningful feature of HAProxy's SSL stack that is not compatible. While there is no official LTS branch, FIPS branches are maintained for five years, which can be a suitable alternative. Users on the cutting edge should periodically upgrade and rebuild their AWS-LC library.

  • Those who want to fine-tune the library for their systems should probably turn to WolfSSL. Its support is pretty good; however, given that it doesn’t have common ancestry with OpenSSL and only emulates its API, from time to time we discover minor differences. As a result, deploying it in a product requires a lot of testing and feature validation. There is a company behind the project, so it should be possible to negotiate a support period that suits both parties.

  • In the meantime, since we have not decided on a durable solution for our customers, we’re offering packages built against OpenSSL 1.1.1 with extended support and the QuicTLS patchset. This solution offers the best combination of support, features, and performance while we continue evaluating the SSL landscape.
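
As referenced above, here is a minimal, hedged sketch of a QUIC-enabled configuration using the limited-quic option; the limited-quic global directive and the quic4@ bind syntax follow recent HAProxy (2.8+) documentation, and the certificate path and backend address are placeholders:

global
    # allow QUIC without 0-RTT when HAProxy is built against plain OpenSSL
    limited-quic

frontend web
    # hypothetical certificate path; adjust to your deployment
    bind quic4@:443 ssl crt /etc/haproxy/certs/example.pem alpn h3
    bind :443 ssl crt /etc/haproxy/certs/example.pem alpn h2,http/1.1
    default_backend app

backend app
    server s1 127.0.0.1:8080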

The current state of OpenSSL 3.0 in Linux distributions forces users to seek alternative solutions that are usually not packaged. This means users no longer receive automatic security updates from their OS vendors, leaving them solely responsible for addressing any security vulnerabilities that emerge. The situation has therefore significantly undermined the overall security posture of TLS implementations in real-world environments. That is without counting the challenges with 3.0 itself, which constitutes an easy DoS target, as seen above. We continue to watch news on this topic and to publish our updated findings and suggestions in the HAProxy wiki, which everyone is encouraged to check periodically.

# Hopes

We can only hope that the situation will clarify itself over time.

First, OpenSSL ought not to have tagged 3.0 as LTS, since it simply does not perform acceptably for anything beyond command-line tools such as “openssl s_client” and Curl. We urge them to tag a newer release as LTS because, while performance starting with 3.1 is still very far from what users had before the upgrade, it is back to a level that is usable for small sites. On top of this, the QuicTLS fork would then benefit from a usable LTS version with QUIC support, again for sites without high performance requirements.

OpenSSL has finally implemented its own QUIC API in 3.5-beta, ending a long-standing issue. However, this new API is not compatible with the standard one that other libraries and QUIC implementations have been using for years. It will require significant work to integrate existing implementations with this new QUIC API, and it is unlikely that many new implementations using the new QUIC API will emerge in the near future; as such, the relevance of this API is currently uncertain. Curl author Daniel Stenberg has a review of the announcement on his blog. 

Second, in a world where everyone is striving to reduce their energy footprint, sticking with a library that operates at only a quarter of its predecessor's efficiency, and runs six to nine times slower than the competition, contradicts global sustainability efforts. This is not acceptable, and it requires that the community unite to address the problem.

Both AWS-LC and QuicTLS seem to pursue comparable goals of providing QUIC, high performance, and good forward compatibility to their users. It might make sense for such projects to join efforts and provide users with a few LTS versions of AWS-LC that deliver excellent performance. Operating system vendors currently lack a long enough support commitment to start shipping such a library; once one were available and accepted, most SSL-enabled software would quickly adopt it, given the huge benefits that can be expected.

We hope that an acceptable solution will be found before OpenSSL 1.1.1 reaches the end of paid extended support. A similar situation arose around 22 years ago on Linux distributions: there was a divergence between threading mechanisms and libraries, and after a few distros started to ship the new NPTL kernel and library patches, NPTL was progressively adopted by all of them and eventually became the standard threading library. The industry likely needs a few distributions to lead the way and embrace an updated TLS library; this will encourage others to follow suit.

We consistently monitor announcements and engage in discussions with implementers to enhance the experience for our users and customers. The hope is that within a reasonable time frame, an efficient and well-maintained library, provided by default with operating systems and supporting all features including QUIC, will be available. Work continues in this direction with increased confidence that such a situation will eventually emerge, and steps toward improvement are noticeable across the board, such as OpenSSL's recent announcement of a maintenance cycle for a new LTS version every two years, with five years of support.

We invite you to stay tuned for the next update at our very own HAProxyConf in June 2025, where we will usher in HAProxy's next generation of TLS performance and compatibility.



Are 'CSS Carousels' accessible?


“CSS Carousels” were formally introduced a few weeks ago in an article on the Chrome for developers blog, and quite a few people have shared the excitement since then.

When I first heard of them I was very reluctant to jump on the bandwagon of excitement. I will also admit that there was a small part of me that was terrified by the idea. Not only does creating interactive widgets using CSS violate the principle of Separation of Concerns (of which I am an advocate), but pretty much every implementation of a CSS-only widget I have seen before has had moderate to major accessibility issues.

But because the introductory article mentioned that carousel best practices are handled by the browser, and that it’d be very difficult to make a more accessible carousel than this, I was curious to learn more about them so that I could form a more objective and informed opinion on them. (After all, I’m also a developer and convenience sounds appealing to me too.)

So I did. I read the CSS specification and inspected the examples.

In this post, I want to share my findings from examining the accessibility and usability of “CSS Carousels”.

All the examples of CSS Carousels I’ve seen in the wild are based on the same reference—namely the CSS Carousels gallery. So we will be examining a few examples from the gallery to better understand how the new features work and how they affect the accessibility of HTML.

As I mentioned earlier, this stuff is still highly experimental. At the time of writing, it is only supported in Chrome Canary behind a flag. I will include video recordings of the examples that I am going to examine, so you don’t need to have Chrome Canary installed unless you want to try the examples out for yourself.

Let’s start by first defining what CSS Carousels are.

“CSS Carousels” is an umbrella name for a collection of JavaScript-free, CSS-only implementations of common scrolling UI patterns—mainly patterns like sliders and carousels—that are implemented using new features defined in the CSS Overflow Module 5 specification.

You can find examples of these implementations in the “CSS Carousel gallery” that we will be examining in this post.

The CSS Overflow Module Level 4 specification defines CSS features for handling scrollable overflow. When an element has “too much” content for its size, the content “overflows”, and CSS provides features that let you handle this overflow—by making the element scroll in either or both directions, by clipping the overflow, by truncating it, and so on.

The Level 5 of this specification (which is currently still a Working Draft) defines a set of pseudo-elements that are designed to provide specific visual and interactive affordances for scroll containers.

More specifically, according to the specification, this module:

“defines the ability to associate scroll markers with elements in a scroller (or generate them automatically as ::scroll-marker pseudo-elements, with automatic user behavior and accessible labels), which can be activated to scroll to the associated elements and reflect the scroller’s relative scroll progress via the :target-current pseudo-class.”

Let’s break this down a little bit.

The specification “defines the ability to associate scroll markers with elements in a scroller”…

A scroll marker is any element or pseudo-element with a scroll target.

The HTML <a> element and SVG <a> element are scroll markers… […] While these navigational links can be created today, there is little feedback to the user regarding the current content being viewed…

For example, think of a sticky Table of Content (TOC), where a link is highlighted when the link’s target section scrolls to the top of the viewport. You can see an example of such a TOC on the web.dev blog, and on MDN guides as well. The active link changes based on which section is scrolled into the viewport.

The links in these tables of content are scroll markers. They mark the scroll position of their target sections. When a link’s target section scrolls to the top of the page, the link is styled to reflect the current scroll position of the section, and to indicate that the section is currently “active”.

We’ve always resorted to using JavaScript to style these links when their respective sections are scrolled into view.

If you inspect the active links on the web.dev blog in the Chrome DevTools, you can see that a specific class name is added to a link when it becomes “active”. This class name is used to apply active styles to the link in CSS.

So, the premise of not requiring JavaScript to style these links, and instead taking advantage of CSS’s new pseudo-selectors, sounds really great! (The :target-current selector in particular is supposed to enable this. More on this later.)

Next, the specification specifies that it adds a mechanism for creating groups of scroll markers, and for automatically creating ::scroll-marker pseudo-elements, and that within each group, the active marker reflects the current scroll position, and can be styled to give the user an indication of which section they are in.

In other words, this specification defines a mechanism that allows you to (1) create a group of scroll markers for a scroll container, where each of the individual scroll markers in the group corresponds to an item in the scroll container, and (2) these markers can be styled to indicate the scroll position within the container.

But scroll markers, by nature, are interactive elements. So the purpose of this specification is to enable CSS to create interactive pseudo-elements (not real elements because CSS can’t do that, nor is it intended to).

This is where things start to become concerning.

Before we discuss why, let’s first get a quick overview of how the scroll markers are created.

A (very quick) overview of how to create scroll markers in CSS

This post is not a tutorial. Since the announcement of CSS Carousels, a few tutorials have been written about the topic, including an MDN guide for Creating CSS Carousels.

But for the purposes of completeness of this post, here’s a high-level, bird’s eye view of how it works:

Say you have a scroller element containing a series of items. For example, say you have a list of images in a horizontally-scrolling container. And say you want to create a list of “dots” for this container that provide a visual indicator of how many images there are in the list and that indicate which item in the list is currently “active”. These dots are also interactive and can be used to scroll their target images into view.

And say that, for some reason, you don’t want to create these indicators in HTML but rather want to create them using CSS instead. (I won’t judge. Yet.) This is the kind of thing that this level of the Overflow specification aims to enable.

You can use the new scroll-marker-group property defined in the specification to instruct the browser to create a grouping container for these dots:

ul.scroller {
  scroll-marker-group: after;
}

The property accepts three values: none, before, and after.

The before and after values indicate whether you want to show the scroll markers before or after the items in the scroll container. If you want the dots to appear before the list of items in the scroller, you use the before value.

The scroll marker group is created inside the list in the form of a pseudo-element: ::scroll-marker-group.

The ::scroll-marker-group pseudo-element is a fully-styleable element (i.e. you can use any CSS property to style it), and it implicitly behaves as a single focusable component, establishing a focusgroup. This means that the group takes up only one tab stop on the page. To navigate through the scroll markers inside the group, you can use the Arrow keys.

The ::scroll-marker-group pseudo-element is a container for its contained ::scroll-marker pseudo-elements (these would be the “dots” corresponding to the images in the list).

To create a scroll marker for the items in the list, you can use the ::scroll-marker pseudo-element on the list items. Like other pseudo-elements, this element will not be rendered if the content property is not declared:

ul.scroller {
  scroll-marker-group: after;

  > li::scroll-marker {
    content: ... / ...;
  }
}

Even though the scroll markers are prepended to the list items in the markup, they are (according to the specification) “collected into the ::scroll-marker-group” so that they can be exposed as a group to assistive technologies.

Now, the ::scroll-markers that the browser creates are interactive elements. Activating a scroll marker will scroll its corresponding item into view.

And the :target-current selector can be used to style the currently-active ::scroll-marker when its corresponding target is shown. For example:

ul.scroller {
  scroll-marker-group: after;

  > li::scroll-marker {
    content: ... / ...;
  }

  > li::scroll-marker:target-current {
    /* style active marker */
  }
}

Unlike native scroll markers, though, these are not links. These are interactive pseudo-elements. More on this later.

The specification also defines the ::scroll-button() pseudo-element, which you can use on the scroll container to add (you guessed it!) scroll buttons.

ul.scroller {
  ...

  &::scroll-button(left) {
    content: ...;
  }

  &::scroll-button(right) {
    content: ...;
  }
}

These pseudo-buttons are also fully styleable elements: there is no restriction on what properties you can apply to them. And you can even use the :disabled pseudo-class to apply disabled state styles to the ::scroll-button()s when they are disabled.

ul.scroller {
  ...

  &::scroll-button(left) {
    content: ...;
  }

  &::scroll-button(right) {
    content: ...;
  }

  /* focus styles */
  &::scroll-button(*):focus-visible {
    /* focus styles here */
  }

  /* disabled state styles */
  &::scroll-button(*):disabled {
    /* disabled state styles here */
  }
}

As we mentioned earlier, unlike the ::before and ::after pseudo-elements which are static text elements, the ::scroll-markers and ::scroll-button()s are interactive elements.

Interactive elements have specific accessibility requirements. They should have meaningful roles and descriptive names that identify their purpose, so that the user knows what to expect when they interact with them.

So, at this point there are quite a few questions we should be asking:

  • Do these scroll markers meet the accessibility requirements for interactive elements?
  • How are the scroll markers exposed to assistive technology users like screen reader users?
  • What roles is the browser exposing for these interactive pseudo-elements?
  • Do they provide meaningful semantics to the user to help them understand what they are interacting with?
  • Does the browser give them accessible names? The specification states that it “defines the ability to associate scroll markers… or generate them automatically as ::scroll-marker pseudo-elements, with automatic user behavior and accessible labels” (emphasis mine). So how are these markers labelled?

We can only find the answer to these questions in the HTML markup, which the browser uses to create the accessibility tree.

Semantic HTML is the foundation of accessibility on the Web.

Semantic HTML carries meaning. Assistive technologies (AT) like screen readers (SR) rely on the meaningful semantics in HTML to present Web content to their users and to create an interface for navigating that content.

But HTML is only as accessible as you write it. Even semantic HTML can be “inaccessible” if you don’t write it as it is intended.

For example, many elements are only meaningful when they are children of other elements, or when they are associated with other elements. If you don’t use these elements as intended, then they will lose their meaning and they won’t be as useful to screen reader users anymore.

So there are certain “rules” that you should follow to ensure that you get the most out of HTML’s inherent accessibility.

HTML provides many meaningful, semantic elements that represent various types of content & interactive controls. And using those elements is critical for describing the purpose of your content to assistive technology users.

But there are still some more complex components that don’t yet have meaningful elements in HTML to represent them. Until these elements exist, we can use ARIA to create them.

Think of ARIA as a polyfill for HTML semantics. It provides additional attributes—roles, states, and properties—that allow us to create complex, interactive components that do not yet have native equivalents in HTML. Using ARIA attributes we can describe these UI components to assistive technologies like screen readers.

So, together, HTML and ARIA provide important accessibility information to screen readers, without which the content of the page would not be perceivable, operable, or understandable by their users.

This is why it is always critical to understand how a CSS feature may affect the accessibility information created in HTML.

The CSS Overflow Module aims to define a set of features that provide visual affordances to scrollable containers by creating and appending new interactive elements (the scroll markers) into the HTML markup of the page. As such, we must expect these elements to affect the accessibility information exposed to assistive technologies like screen readers.

The ::before and ::after pseudo-elements already do affect the accessibility information of an element because the contents of these elements are exposed to screen readers, and they contribute to the accessible name computation of the element they are created on.

Scroll markers will affect the accessibility information differently because they are also interactive, which means that they are expected to expose roles, states, and other properties, depending on the type of element that they are exposed as.

Now, to check the accessibility information of the page, we can inspect the page’s accessibility tree.

The accessibility tree (“accTree”) is a tree of objects (similar to the DOM tree), each object containing accessibility information about an element on the page. Not all elements are represented in the accTree because not all elements are relevant for accessibility. Only a meaningful element that is not hidden (using HTML or CSS) is exposed in the accessibility tree.

The information the browser exposes about an element depends on the nature of the element (like whether it’s interactive or not).

Typically, there are four main pieces of information that the browser exposes about an element in the accTree:

  1. The element’s role (What kind of thing is it?). The role of an element identifies its purpose. It lets a screen reader user know what something is, which is also an indication of how that thing is to be used.
  2. The element’s name (a.k.a the accessible name, “accName”). The name identifies an element within an interface and in some cases helps indicate what an element does.
  3. The element’s description if it has one. For example, a text input field may have a short description of what the expected input looks like.
  4. The element’s state when it has one. For example, is the button pressed? is the checkbox checked, unchecked, or undetermined?

The accessibility tree also exposes any properties that the element may have (such as whether a button is focusable or disabled), as well as any relationships with other elements (like whether the element is part of a group, or whether it is labelled by or described by another element).

The information exposed in the accessibility tree is very useful to us as developers because it gives us insights into how our content will be exposed to and presented by screen readers.

Knowing how the browser exposes scroll markers to the user allows you to check whether the information being exposed is helpful to the user’s understanding of the page or not, and it allows you to test whether your component meets the expectations the user has based on that information.

Importantly, how scroll markers are exposed in the context they are used in will be critical to determining how they affect the accessibility and the usability of your components.

Quick refresher: Inspecting CSS scroll markers’ accessibility information in the browser DevTools

We can inspect how the browser is exposing an element in the accTree using the browser DevTools.

When you open the Chrome DevTools, you can find the accessibility information of an element under Accessibility panel. You will find the Accessibility panel on the right side of the CSS Styles panel.

In addition to inspecting the accessibility object for each element in the Accessibility panel, Chrome also provides a full-page accessibility tree view. To use it, in the Accessibility tab, check the “Enable full-page accessibility tree” option.

(First-time only) click the “Reload devtools to enable feature” button at the top of the DevTools.

Then, in the Elements tab, click the “Switch to accessibility tree view” button in the top right corner.

Now, the full-page accessibility tree replaces the DOM tree in the panel, and element names, roles, values, and states are shown in an easy-to-read, very practical hierarchical tree view.

This view gives you an overview of how the contents of the entire page are exposed. We’re going to use this tree view to understand the CSS Carousel examples better.

So next, we’re going to go over a few of the examples in the CSS Carousel gallery, inspect their accessibility information, use a screen reader to navigate them, operate them using a keyboard, and generally examine their usability. After all, the gallery’s homepage encourages us to inspect the CSS, review the DOM, and check the accessibility tree. So, let’s do just that.

Examining the accessibility of CSS Carousels

Before going through each example separately, I want to mention a few things that all examples have in common:

  1. The scroll-marker-group property is used on the scroll container to create a ::scroll-marker-group pseudo-element inside the container. If you disable the property, the group of scroll markers (::scroll-marker-group) is removed, and so is every scroll marker (::scroll-marker) corresponding to each of the items in the container.
  2. The ::scroll-marker-group element is exposed as a tablist in the accessibility tree. Note that this is not mentioned anywhere in the CSS specification (at the time of making of this post). These are the semantics that Chrome is currently exposing under the hood.
  3. Each ::scroll-marker element is exposed as a tab within the tablist.
  4. The accessible name for the scroll marker is provided in CSS via the content property. You must provide the name for each tab. The browser will not do this for you.
  5. When ::scroll-button()s are present, they are exposed as buttons. You are also expected to provide an accessible name for these buttons via the CSS content property (see the sketch after this list).
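
As a rough sketch of how such names can be supplied through the content property’s alt-text syntax (the arrow glyphs and label strings below are my own placeholders, not taken from the gallery):

ul.scroller {
  &::scroll-button(left) {
    /* visible glyph, followed by the accessible name after the slash */
    content: "⬅" / "Scroll left";
  }

  &::scroll-button(right) {
    content: "➡" / "Scroll right";
  }
}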

Now, here is the most important takeaway from all of this:

Because all the scroll markers are exposed as tabs, this means that all the carousel examples in the gallery are supposed to be Tabs widgets.

Yes, you read that right. Even though most of these components don’t look or behave like Tabs, the browser is using Tabs widget semantics to describe them to assistive technologies. This is already concerning because the gallery contains different UI patterns that are clearly not Tabs.

If you’re a student of my Practical Accessibility course, then you’ll remember the statement: ARIA is a promise.

When you use ARIA to describe an element to your users, you must ask yourself: Am I delivering on the promise I have made to the user? Is this element really what I’m exposing it to be?

The browser is using ARIA Tab widget roles to expose the examples in the CSS Carousels gallery as Tabs widgets. The question is: Are they really? Do these examples meet the expectations and requirements for Tabs widgets?

We’ll start to get more technical from here on.

Tabs widget accessibility requirements

Tabs have specific accessibility requirements. We’ll start by reviewing these requirements so we have a benchmark to test the examples against.

If these requirements are not met, then the examples are going to be confusing and unusable by screen reader users.

If you’ve taken my course, then you’ll remember from the very first ARIA chapter—ARIA 101—that ARIA is extremely powerful but also very dangerous if you don’t use it correctly. And that if you’re not aware of how roles, states and properties work together, you can end up creating a more confusing and inaccessible user experience.

We also learned that the ARIA specification documents the requirements for ARIA roles in the definition of each role, and that there are strict parent-child relationships between some ARIA roles. This means that the use of some attributes is restricted to specific contexts or parents. Some roles can only be used as a child to a specific—usually composite—ARIA role.

It is important that you learn and understand how ARIA attributes are used and nested, especially if you’re creating components that are re-used in various contexts across a website or application.

To create a Tabs widget today, you need to use the ARIA tab role, the tabpanel role, and the composite tablist role.
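
As a quick, hedged reminder, a minimal hand-rolled Tabs structure typically looks something like this (the ids, labels, and element choices are illustrative only; focus management and keyboard handling would still need JavaScript):

<div role="tablist" aria-label="Example tabs">
  <button role="tab" id="tab-1" aria-selected="true" aria-controls="panel-1">First</button>
  <button role="tab" id="tab-2" aria-selected="false" aria-controls="panel-2" tabindex="-1">Second</button>
</div>
<div role="tabpanel" id="panel-1" aria-labelledby="tab-1">…</div>
<div role="tabpanel" id="panel-2" aria-labelledby="tab-2" hidden>…</div>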

According to the specification (emphasis mine):

Authors MUST ensure elements with role tab are contained in, or owned by, an element with the role tablist.

[…]

Authors MUST ensure that if a tab is active, a corresponding tabpanel that represents the active tab is rendered.

[…]

For a single-selectable tablist, authors SHOULD hide other tabpanel elements from the user until the user selects the tab associated with that tabpanel.

[…]

In either case, authors SHOULD ensure that a selected tab has its aria-selected attribute set to true, that inactive tab elements have their aria-selected attribute set to false

The ARIA Specification, tab role definition

So the specification specifies how the tab and tablist roles should be used, and states the requirements needed to ensure the Tabs widget you’re creating is accessible.

The ARIA specification also refers to the ARIA Authoring Practices Guide (APG) for technical guidance about implementing Tabs.

The APG’s primary purpose is to demonstrate how to use ARIA to implement widgets in accordance with the ARIA specification.

According to the guidance in the APG’s Tabs pattern page (emphasis mine):

Tabs are a set of layered sections of content, known as tab panels, that display one panel of content at a time. Each tab panel has an associated tab element, that when activated, displays the panel.

[…]

When a tabbed interface is initialized, one tab panel is displayed and its associated tab is styled to indicate that it is active. When the user activates one of the other tab elements, the previously displayed tab panel is hidden, the tab panel associated with the activated tab becomes visible, and the tab is considered “active”.

There are also two types of Tabs:

  1. Tabs With Automatic Activation: A tabs widget where tabs are automatically activated and their panel is displayed when they receive focus.
  2. Tabs With Manual Activation: A tabs widget where users activate a tab and display its panel by pressing Space or Enter.

Regardless of the type of activation, a Tabs component has these keyboard interaction requirements:

  • When you press the tab key and focus moves into the tab list, focus moves to the active tab element.
  • When the tab list contains the focus, pressing the tab key again moves focus to the next element in the page tab sequence outside the tablist, which is the tabpanel unless the first element containing meaningful content inside the tabpanel is focusable.
  • When focus is inside the tab list:
    • Pressing the Left Arrow key moves focus to the previous tab. If focus is on the first tab, it moves focus to the last tab.
    • Pressing the Right Arrow key moves focus to the next tab. If focus is on the last tab element, moves focus to the first tab.

We’re going to focus on the automatic activation tabs widget because the CSS Carousels examples are implemented so that a scroll marker’s corresponding item is shown when the marker receives focus, which means that the widget is supposed to be an automatic activation widget.

Here is a video demonstration of how the automatic activation tabs widget is expected to be operated using a keyboard, and then using a screen reader.

So, in an automatic activation tab widget, moving keyboard focus (not screen reader focus) from one tab to the other activates the tab’s corresponding tabpanel. The other tabpanels are hidden from all users and are therefore also not accessible by keyboard.

Now let’s go over a few of the examples in the CSS Carousel Gallery and check to see if they meet the requirements defined in the ARIA specification, and the expected semantics and behavior listed in the APG.

For each example, I’m going to focus on specific aspects of accessibility more than others. For example, I will highlight keyboard navigation in one example, screen reader navigation in another, screen reader announcements in another, and so on and so forth, depending on what issue stands out for every particular example.

Remember that, with the current implementation of scroll markers, each of these examples is supposed to be a Tabs widget.

And, finally, remember that tabs and buttons, like every other interactive element, need an accessible name.

With all of this said, let’s start going through the examples in the gallery.

I’m going to start with the horizontal list—a typical carousel example.

The Horizontal List example

In this example, like all the other examples in the gallery, if you open the DevTools and inspect the accessibility tree you can see that the scroll markers are exposed as tabs contained in a tablist.

However, there are no corresponding tabpanels for these tabs.

As we mentioned earlier, the ARIA specification states that you "MUST ensure that if a tab is active, a corresponding tabpanel that represents the active tab is rendered." However, there are no tabpanels at all in this example. So this Tabs widget is already missing an integral part of what makes it a Tabs widget. What do the tabs control?

Looking at the individual tabs, you’ll find that all the tabs in the tablist share the same accessible name.

Looking in the Styles panel, you will find that the accessible name for all of the ::scroll-markers is provided using the CSS content property:

.scroll-markers {
  ..
  &::scroll-marker {
    content: "" / "Carousel item marker";
    ..
  }
}

The notable thing here is that the name is provided as a fallback alt text. This is because the dots are not supposed to have visible text labels, so the content is left empty.

Now, because this declaration is provided as alt text for all scroll markers on all the list items, the alt text is exposed as the accessible name for all the scroll markers corresponding to all the list items.

Buttons, tabs, and other interactive elements that do different things should have unique names that describe their purpose. But what we have here is 16 tabs that share the same name. So how does a user know which item each “tab” corresponds to?

Instead of providing one accessible name for all scroll markers, each marker should be given its own unique name. This means that you will want to select each list item and provide a unique accessible name for its marker. You can do that by providing a unique label in the content property, or, alternatively, you could provide the unique names in the form of HTML attributes in the markup and then reference the names in CSS using one declaration that uses the attr() function. We’ll see this declaration in action in another example.
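
As a rough sketch of the first approach (the selector and the label strings below are hypothetical, purely for illustration):

ul.scroller {
  > li:nth-child(1)::scroll-marker { content: "" / "Item 1: Mountain lake"; }
  > li:nth-child(2)::scroll-marker { content: "" / "Item 2: Forest trail"; }
  /* …and so on, one unique alt text per item… */
}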

Now, looking at the carousel itself, the first thing that comes to mind is that unlike a Tabs widget, more than one item is shown in this carousel at a time.

The APG description of a Tabs widget says that “Tabs are a set of layered sections of content, known as tab panels, that display one panel of content at a time.”

The ARIA specification also states that “for a single-selectable tablist, authors should hide other tabpanel elements from the user until the user selects the tab associated with that tabpanel.”

Now, tabs can be multi-selectable. The ARIA specification specifies a multi-selectable tabs widget as a kind of tabs where more than one tab can be selected at a time.

However, when more than one tab can be selected at a time, the specification states that you should ensure that the tab for each visible tabpanel has its aria-expanded attribute set to true, and that the tabs associated with the remaining tabpanel elements (which are hidden from all users) have their aria-expanded attributes set to false.

If you inspect the accessibility tree, you can see that the scroll marker group is not a multi-selectable tablist. The browser sets the multiselectable attribute value to false on the list indicating that only one tab can be selected at a time. And if you inspect the tabs within the tab list, you can see that only one tab has aria-selected=true on it at a time.

If this were meant to be a multi-selectable tabs widget, then it should indicate that, and the browser should set the aria-selected and aria-expanded attributes to true on all the scroll markers of the visible items.

However, even the CSS specification is specific about selecting only one scroll marker at a time. It literally states that exactly one scroll marker within each scroll marker group is determined to be active at a time.

So, by definition, scroll markers are not designed to be used to create multi-selectable components, and especially not multi-selectable Tabs components.

So, what we have here is scroll markers being exposed as single-selectable tabs in a carousel component where more than one item is “active” at a time. And even though more than one item is active, only one scroll marker is “selected”.

And this is not something that you, as the author of the code, can change because you have no control over the semantic output of the CSS properties you use.

This indicates that the semantics exposed for the scroll markers are neither suitable nor representative of the component they are used to implement.

Now, if you navigate to the carousel using a screen reader (I am using VoiceOver on macOS), you will notice that the item numbering inside the list is off. This is because the browser adds the scroll marker group, as well as the Previous and Next scroll buttons, as direct children of the list, as siblings to the list items. So the total number of items in the list is miscommunicated to the user and no longer represents the actual number of list items.

Here is a video recording of how the carousel is announced with VoiceOver on macOS:

Furthermore, notice how the Scroll Left button remains focusable even though it is meant to be disabled.

A disabled <button> will typically be removed from the sequential tab order of the page, and will be exposed as a disabled button to screen reader users. However, if you inspect the accessibility information of that button when it’s in the disabled state, you’ll find that the browser does not expose it as a disabled button.

So, this carousel example has several accessibility and usability issues.

A blind screen reader user navigating this carousel would encounter a broken component and would likely be confused as to what it is they are interacting with, and what will happen when each of the “tabs” is activated.

So, as a conclusion I would say that this implementation of a horizontal list carousel is not accessible, and not ready for production.

Moving on to the Cards example…

The Cards example

If you pull up the browser DevTools again to get an overview of the accessibility information exposed to screen readers in the accessibility tree, you can see that, like with the previous example, the browser appends a single-selectable tablist to the scroll container.

Like with the horizontal list example, more than one Card is visible at a time, yet only one tab is selected at a time.

The tablist in this example contains five tabs corresponding to the five different cards. Yet all the tabs have the same accessible name.

The cards are implemented (and exposed) as articles. So, once again, there are no tabpanels in this widget either.

Since this example contains interactive elements inside the carousel items, let’s focus on how keyboard navigation works in the carousel. We will also be testing screen reader navigation separately.

When I start to navigate the page using the Tab key, focus moves inside the carousel to the scroll marker group (the tablist).

Pressing the tab key again moves focus outside the group into the next focusable element in the DOM, which is the Scroll Left button. Pressing the tab key again moves focus to the Scroll Right button. Normally, you would expect keyboard focus to move from the selected tab to the tab panel it controls (or a focusable element inside that panel). This is also the expected behavior stated and demonstrated in the APG.

Now, when I press the tab key again, this is where the carousel starts to behave erratically.

Here is a short video recording of me navigating the Cards example using keyboard, followed by some of the most important observations:

First, when you’re in the scroll markers group and you press the tab key, focus does not necessarily move to the currently selected card (the “tabpanel”). Instead, it moves to the first focusable element in the first card inside the scroller. Sometimes it will move to the first visible card. Sometimes it will move to the first card in the container even when it’s not visible. And sometimes it will move to the expected card.

Second, pressing Shift + Tab when you’re inside a card to navigate backwards does not return keyboard focus back to the tab that activated the card (which is the expected keyboard behavior for a Tabs widget). Instead, keyboard focus moves to the link in the previous card, and then to the link in the previous previous card, and then to the link in the previous previous previous card, and so on and so forth. This is because the invisible cards are not really hidden like they would be in a Tabs component. They are just scrolled out of view. As such, keyboard focus moves to an element that is not even supposed to be active or accessible.

Third, you will also notice that when focus moves to the link inside a card, the card’s corresponding tab is not selected. A tab is only visually marked as selected when it scrolls into a certain position within the container. Because of that, the browser will skip a tab and visually select the one that comes after it.

And lastly, if you inspect the accessibility information exposed to the user as you navigate the carousel using keyboard, you’ll also see that the state of the tabs is not correctly conveyed to the user. Even when a tab is visually marked as selected, its accessibility state is not updated. So a blind screen reader user navigating using a keyboard will not be getting the same feedback as a sighted user does.

And a sighted screen reader user will also get a mismatch between what they see on screen and what the screen reader announces to them.

Here is a video recording of navigating the Cards carousel using VoiceOver on macOS:

Both keyboard and screen reader navigation is broken in this example.

The entire behavior of this widget is based on how any normal scrolling container containing focusable elements would behave. This carousel does not behave like Tabs because it is not a Tabs widget. The accessibility information exposed to screen reader users is generally misleading and mostly incorrect, making it unusable.

Moving on to the Scroll Spy example…

The Scroll Spy example

The ScrollSpy example displays a series of content sections inside a vertically-scrolling container.

This scrolling container contains what effectively looks like an article made up of a series of sections with headings, with a table of contents on the left side. Only instead of a table of contents (which would be semantically structured as a list of links), this example is also implemented using CSS scroll markers. This means that instead of a list of links, the “article” has a group of tabs!

If you inspect the accessibility information in the accTree, you’ll notice common issues with the previous examples:

  1. We have a single-selectable tablist with only one tab selected at a time, when more than one section is visible at a time.
  2. Like previous examples, there are no tabpanels in what is supposed to be a Tabs widget. Instead, the sections are implemented as regions. This means that each section is exposed as a page landmark, which is very uncommon for a series of text sections like these. We’ll talk about why they are exposed as regions shortly.

Once again, the browser is exposing semantics that are not representative of the pattern they are used to implement.

Furthermore, the tabs in this example do not have accessible names.

If you check the Styles panel, you’ll notice that the name of the markers is provided using the attr() function.

The ::scroll-marker of each section pulls its content (and, by extension, its accessible name) from the aria-label attribute on that section.

section::scroll-marker {
  content: attr(aria-label);
}

Even though the content of the aria-label attribute is visually rendered in the scroll markers, it is not exposed as a name for the tabs in the accessibility tree. So these markers have no accessible names.

This is probably a bug.

Now, the presence of aria-label on the <section>s provides these sections with an accessible name. And, according to the specification, a <section> is exposed as a region landmark when it is given an accessible name. This is why the sections in this component are exposed as landmark regions.

Here is how VoiceOver on macOS announces the ScrollSpy example:

Notice how VoiceOver announces the tabs with no names.

You will also notice in the recording that the tabs are announced as selected, even when their target sections are not scrolled into view.

So the screen reader announces the presence of tabs only, but there is no other information describing the component to the user. A blind screen reader user will come across a list of controls that have no names and no indication of what they control.

In addition to highlighting the scroll marker naming bug, I wanted to use this example as an opportunity to highlight another issue with the :target-current selector introduced in the specification.

Instead of using ::scroll-markers to implement this example, I would instead expect to be able to create a semantic table of contents using an HTML list of <a href="">, and then use the :target-current pseudo-class to apply active styles to a link (the native scroll marker!) when its target is scrolled into view.

However, that doesn’t seem to work at the moment. I created a reduced test case with a series of sections and a list of links to those sections. I used the :target pseudo-class to apply a yellow background color to the target section, and the new :target-current pseudo-class to style the link associated with that section. But the styles are not applied to the link when its target section is “active”.
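
A minimal sketch of that kind of test case follows; the markup, ids, and colors here are my own and are not copied from the author’s reduced test case:

<nav>
  <ul>
    <li><a href="#intro">Intro</a></li>
    <li><a href="#usage">Usage</a></li>
  </ul>
</nav>
<section id="intro">…</section>
<section id="usage">…</section>

section:target {
  /* highlight the section currently targeted by the URL fragment */
  background: yellow;
}

a:target-current {
  /* expected (but currently not applied) style for the native scroll marker */
  font-weight: bold;
}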

Unfortunately, even though the specification states that it defines the ability to associate scroll markers with elements in a scroller, the current implementation of the :target-current pseudo-class seems to work only for CSS-generated ::scroll-markers, but not for native HTML ones.

Personally, I think :target-current is one of the most useful additions to the specification. It’s unfortunate that its current implementation is limited to the new pseudo-elements.

Moving on to one last example: the horizontal tabs example—the perfect candidate for a Tabs widget implementation.

The Horizontal Tabs example

If you inspect the accessibility tree for this example, you can see the scroll marker group exposed as a tablist, and each of the three scroll markers exposed as a tab.

The tabs don’t have an accessible name in this example either. We’ll inspect the CSS declaration for the markers shortly.

Unlike the previous examples, this example does have three tabpanels exposed in the tree.

We know by now that the browser does not add tabpanel roles to the items in a scroller. So these roles must be provided in the markup.

And sure enough, if you inspect the HTML markup for this component, you can see that the tabpanel roles are hard-coded into the HTML.
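
Roughly speaking, the markup looks something like the following sketch (the panel labels are placeholders rather than an exact copy of the gallery’s code):

<div class="carousel--snap">
  <div role="tabpanel" aria-label="First panel">…</div>
  <div role="tabpanel" aria-label="Second panel">…</div>
  <div role="tabpanel" aria-label="Third panel">…</div>
</div>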

The panels are also given accessible names using aria-label, which is once again used to provide the contents and names for each of the scroll markers in CSS.

.carousel--snap {
  ..
  &::scroll-marker {
    content: attr(aria-label);
    ..
  }
}

This is why the tabs don’t have an accessible name. As we mentioned in the previous example, this is probably a bug.

Let’s fire up VoiceOver and check how the Tabs are announced.

When using VoiceOver navigation to navigate to the tabs, all the tabs are announced as selected, and the selected state of the tabs is also updated in the accessibility tree. However, their visual styles are not updated to reflect that they are selected, and their corresponding tabpanels are not shown when they are selected.

Additionally, using VoiceOver navigation (Right and Left Arrow keys) you are able to navigate between the tabpanels without needing to activate their corresponding tabs. That said, when you navigate to a tabpanel, it is initially announced as empty. When you press the Right Arrow key again, the screen reader announces the content inside the tabpanel just fine.

It is possible to also navigate through the tabpanels using keyboard Arrow keys, without needing to use the tabs at all.

As the ARIA specification notes, you should hide other tabpanels from the user until the user selects the tab associated with that tabpanel. In this case, the tab panels were accessible and were shown even when their corresponding tabs were not activated. Again, this is because what we have here is technically still a scrolling container, not a Tabs widget.

In my opinion, allowing the tab panels to be accessed by scrolling defeats the purpose of using Tabs to begin with. What is the purpose of the tabs if the content is already accessible without them? Selecting one tab does not really hide the tabpanels associated with the other tabs; it only scrolls them out of view. So this Tabs example behaves partly like a Tabs widget, and partly like a typical scrolling container.

Now, as we mentioned earlier, the tabpanel roles are hard-coded into HTML in this particular example. It is not the browser that is adding and exposing these roles.

I don’t know why this example has the tabpanel roles hard-coded in. However, what I do know is that you, as a developer, are also expected to add these roles to your markup.

This also means that you should be aware of the fact that these roles are missing in what is otherwise being exposed as a Tabs widget. Yet, as we mentioned earlier, these exposed semantics are not specified in the CSS specification.

This is why it is very important for you to be responsible for the code you write/use and always, always check how it is exposed to screen reader users, and to test it to ensure that it is usable.

And this is why it is critical that you understand the role of the accessibility tree and the information it carries, understand how ARIA roles work, as well as understand the requirements for the widgets you are creating to ensure they are operable and that they meet the user expectations.

If you didn’t know that scroll markers are exposed as tabs and that you needed to add tabpanel roles to the widget you’re creating, then you would end up with a widget that’s broken in many ways, like the examples we examined earlier.

That being said, even hard-coding the tabpanel roles into your markup has its downsides because the tablist and tab roles are added via CSS. So what happens when CSS is not available? What happens if the user is viewing your page in Reader Mode, for example?

What I think would be a little more foolproof is if the browser added these visual affordances and behavior only when all the required ARIA roles are present in the markup.

In other words, the features defined in the specification could be made useful for some common use cases, not all—because one size almost never fits all; and then you would make sure you have the important accessibility bits taken care of in your markup.

That being said, what I personally think we really need instead is a standardized HTML markup structure.

Wouldn’t it be nice if we could just write HTML and have the browser just know what ARIA roles to expose to AT, and have it provide all the necessary keyboard interactions for free?

We can already do that for native interactive elements like a <button>, an <a>, and a <details> element, to name a few.

So what we really need is native HTML elements with built-in semantics and interactive behavior for creating UI patterns that currently have no equivalents in HTML. This includes Tabs, sliders, and carousels.

This brings me to the end of this examination. So, what’s the conclusion here?

Conclusion and closing thoughts

All the issues I have covered in this post are specific to screen reader and keyboard navigation. I haven’t discussed how the tabs and scroll buttons could be cumbersome to operate for speech control users, particularly when they don’t have text labels. We haven’t talked about how the tabs could become invisible in Forced Colors Modes if they are styled using CSS background colors alone. And we haven’t tested the names of the tabs to see if they actually translate into other languages. And what happens when CSS is not available?

As developers, we are responsible for the code we write. And we are responsible for testing the components we create.

That said, I think the specification should be more explicit about how the new features it defines affect and don’t affect the accessibility of the content they are used on. That would make it easier for developers to know where there are gaps that need to be filled, and issues that need to be resolved, before using these features in production.

Knowing the capabilities and limitations of a new feature is critical to understanding when to use it and what it is appropriate for.

In its current state, the specification adds a layer of abstraction on top of HTML semantics that, dare I say, is quite risky, especially because these features are introduced as accessible by default.

There’s a lot the browser doesn’t currently do and that you need to take care of yourself if you want to use these new features in your projects.

If you don’t know better, you could end up creating inaccessible and unusable user interface elements with these new features, all the while assuming that the browser is “taking care of accessibility” for you.

While abstractions are often convenient for us developers, this convenience must not be delivered to us at the cost of user experience and accessibility. As responsible developers, it is on us to push back when necessary and require new features to be inclusive of the users we are creating user interfaces for.

The browser is currently creating lists of scroll markers for our convenience and it exposes them as tabs, regardless of whether the pattern they are used to create is actually a Tabs widget or not. And it does that because—surprise, surprise!—CSS is not where semantics are defined. How does the browser know what an element is? It knows that from HTML.

Semantics should be defined in HTML. And styles and visual affordances should follow from there.

As I mentioned earlier, what I believe we need is native HTML elements with built-in semantics and interactive behavior for creating other UI patterns that currently have no equivalents in HTML. This includes Tabs, sliders, and carousels. And CSS could provide an additional layer of visual affordance on top of that. That would be great!

The OpenUI group started researching a native Tabs component long ago, as well as new carousel and slider components. And there are discussions already happening about a native <menu> element (which would replace the current HTML <menu> element, which is essentially just a list).

It would be great if more resources were allocated to proper research and user testing for the work being done by the OpenUI group, so that these well-researched and accessibility-reviewed features are implemented sooner rather than later.

Outro

So, there you have it. CSS Carousels are highly experimental, not currently accessible, and therefore, not ready for production.

But this is not the only insight I want you to take away from this post. After all, this post isn’t merely about highlighting the current issues with CSS Carousels.

Rather, it’s about awareness.

If there is one thing you take away from this post, let it be to learn how to think critically about new features, and to always question the accessibility and usability of a new feature before using it in production.

Put your users front and center, and measure how useful a feature is by how it affects the usability of their interfaces. This is especially true for new features that have a direct impact on the accessibility information of the page.

And how do you know if a feature affects the accessibility information of a page?

Learn more about semantic HTML, and why it is important. Learn more about what makes semantic HTML accessible. Learn more about how ARIA affects HTML, and how it doesn’t! And learn about the proper use of ARIA in HTML.

Then, learn about how CSS can affect accessibility.

And most importantly, learn about your users, and all the diverse ways that they access the Web, and how the code you write affects their experience of the Web.

There’s so much to learn and to be inspired by, and that will make you a better developer.

I know this sounds overwhelming. But I promise you it’s not. Once you understand the foundations of accessibility, these things become second nature, and it becomes easier to spot accessibility issues and to fix most of them (if not all) on the spot.

Feel free to use the knowledge we covered in this post to go over the rest of the examples, inspect their accessibility information, test them using a keyboard and a screen reader, and get an idea of how usable they are. Maybe even try to have some fun by imagining how you could improve them and make them more usable. (That can sometimes be by removing features!)

If you want to learn accessibility in depth and learn how to find and fix accessibility issues by yourself, I have created a comprehensive, structured curriculum in the form of a self-paced video course, aimed at equipping you with the knowledge you need to confidently create more accessible websites and web applications today.

The course is called Practical Accessibility, and you can enroll in it today at practical-accessibility.today.

Sign up for my newsletter to receive more posts like this in your Inbox. 📬


CSS snippets

1 Share

I’ve been thinking about the kind of CSS I write by default when I start a new project.

Some of it is habitual. I now use logical properties automatically. It took me a while to rewire my brain, but now seeing left or top in a style sheet looks wrong to me.
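
As a quick illustration (a generic example, not lifted from any real project), here is the same rule written with physical properties and then with logical properties:

/* physical: always the left edge, regardless of writing mode */
.intro {
  margin-left: 2rem;
  border-left: 1px solid;
}

/* logical: the start of the inline axis, so it follows the writing mode */
.intro {
  margin-inline-start: 2rem;
  border-inline-start: 1px solid;
}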

When I mentioned this recently, I had some pushback from people wondering why you’d bother using logical properties if you never planned to translate the website into a language with a different writing system. I pointed out that even if you don’t plan to translate a web page, a user may still choose to. Using logical properties helps them. From that perspective, it’s kind of like using user preference queries.

That’s something else I use by default now. If I’ve got any animations or transitions in my CSS, I wrap them in a prefers-reduced-motion: no-preference query.

For instance, I’m a huge fan of view transitions and I enable them by default on every new project, but I do it like this:

@media (prefers-reduced-motion: no-preference) {
  @view-transition {
    navigation: auto;
  }
}

I’ll usually have a prefers-color-scheme query for dark mode too. This is often quite straightforward if I’m using custom properties for colours, something else I’m doing habitually. And now I’m starting to use OKLCH for those colours, even if they start as hexadecimal values.
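
A sketch of what that combination might look like (the colour values here are purely illustrative):

:root {
  --colour-text: oklch(25% 0.02 260);
  --colour-background: oklch(98% 0.01 260);
}
@media (prefers-color-scheme: dark) {
  :root {
    --colour-text: oklch(92% 0.01 260);
    --colour-background: oklch(22% 0.02 260);
  }
}
body {
  color: var(--colour-text);
  background-color: var(--colour-background);
}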

Custom properties are something else I reach for a lot, though I try to avoid premature optimisation. Generally I wait until I spot a value I’m using more than two or three times in a stylesheet; then I convert it to a custom property.
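
For example (a made-up value, just to show the shape of that refactor):

/* before: the same magic number keeps showing up */
.card   { border-radius: 0.5rem; }
.button { border-radius: 0.5rem; }

/* after: promoted to a custom property once it's clearly a pattern */
:root   { --border-radius: 0.5rem; }
.card   { border-radius: var(--border-radius); }
.button { border-radius: var(--border-radius); }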

I make full use of clamp() for text sizing. Sometimes I’ll just set a fluid font-size on the html element and then size everything else with ems or rems. More often, I’ll use Utopia to flow between different type scales.
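
Something along these lines, with illustrative numbers (a Utopia-generated scale would be more considered):

html {
  font-size: clamp(1rem, 0.9rem + 0.5vw, 1.25rem);
}
h1 {
  font-size: clamp(1.75rem, 1.4rem + 1.75vw, 3rem);
}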

Okay, those are all features of CSS—logical properties, preference queries, view transitions, custom properties, fluid type—but what about actual snippets of CSS that I re-use from project to project?

I’m not talking about a CSS reset, which usually involves zeroing out the initial values provided by the browser. I’m talking about tiny little enhancements just one level up from those user-agent styles.

Here’s one I picked up from Eric that I apply to the figcaption element:

figcaption {
  max-inline-size: max-content;
  margin-inline: auto;
}

That will centre-align the text until it wraps onto more than one line, at which point it’s no longer centred. Neat!

Here’s another one I start with on every project:

a:focus-visible {
  outline-offset: 0.25em;
  outline-width: 0.25em;
  outline-color: currentColor;
}

That puts a nice chunky focus ring on links when they’re tabbed to. Personally, I like having the focus ring relative to the font size of the link but I know other people prefer to use a pixel size. You do you. Using the currentColor of the focused link is usually a good starting point, though I might end up overriding this with a different highlight colour.

Then there’s typography. Rich has a veritable cornucopia of starting styles you can use to improve typography in CSS.

Something I’m reaching for now is the text-wrap property with its new values of pretty and balance:

ul,ol,dl,dt,dd,p,figure,blockquote {
  hanging-punctuation: first last;
  text-wrap: pretty;
}

And maybe this for headings, if they’re being centred:

h1,h2,h3,h4,h5,h6 {
  text-align: center;
  text-wrap: balance;
}

All of these little snippets should be easily over-writable so I tend to wrap them in a :where() selector to reduce their specificity:

:where(figcaption) {
  max-inline-size: max-content;
  margin-inline: auto;
}
:where(a:focus-visible) {
  outline-offset: 0.25em;
  outline-width: 0.25em;
  outline-color: currentColor;
}
:where(ul,ol,dl,dt,dd,p,figure,blockquote) {
  hanging-punctuation: first last;
  text-wrap: pretty;
}

But if I really want them to be easily over-writable, then the galaxy-brain move would be to put them in their own cascade layer. That’s what Manu does with his CSS boilerplate:

@layer core, third-party, components, utility;

Then I could put those snippets in the core layer, making sure they could be overwritten by the CSS in any of the other layers:

@layer core {
  figcaption {
    max-inline-size: max-content;
    margin-inline: auto;
  }
  a:focus-visible {
    outline-offset: 0.25em;
    outline-width: 0.25em;
    outline-color: currentColor;
  }
  ul,ol,dl,dt,dd,p,figure,blockquote {
    hanging-punctuation: first last;
    text-wrap: pretty;
  }
}

For now I’m just using :where() but I think I should start using cascade layers.

I also want to start training myself to use the lh unit (line-height) for block spacing.
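
Something like this, so the space between blocks always matches one line of text (just a sketch of the idea):

p, ul, ol, blockquote {
  margin-block: 1lh;
}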

And although I’m using the :has() selector, I don’t think I’ve yet trained my brain to reach for it by default.
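
A typical case where it earns its keep (a hypothetical example, not from this site’s stylesheet):

/* constrain a figure's width only when it actually contains a caption */
figure:has(figcaption) {
  max-inline-size: 60ch;
  margin-inline: auto;
}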

CSS has sooooo much to offer today—I want to make sure I’m taking full advantage of it.


The 2025 Hive Systems Password Table Is Here - Passwords Are Easier to Crack Than Ever

1 Share

The vocal effects of Daft Punk

2 Shares

I use Zip Bombs to Protect my Server

1 Share

The majority of the traffic on the web is from bots. For the most part, these bots are used to discover new content. These are RSS feed readers, search engines crawling your content, or nowadays AI bots crawling content to power LLMs. But then there are the malicious bots. These are from spammers, content scrapers, or hackers. At my old employer, a bot discovered a WordPress vulnerability and inserted a malicious script into our server. It then turned the machine into part of a botnet used for DDoS attacks. One of my first websites was yanked off of Google search entirely due to bots generating spam. At some point, I had to find a way to protect myself from these bots. That's when I started using zip bombs.

A zip bomb is a relatively small compressed file that can expand into a very large file that can overwhelm a machine.

A feature that was developed early on the web was compression with gzip. The Internet being slow and information being dense, the idea was to compress data as small as possible before transmitting it through the wire. So a 50 KB HTML file, composed of text, can be compressed to 10 KB, saving you 40 KB in transmission. On dial-up Internet, this meant downloading the page in 3 seconds instead of 12 seconds.

This same compression can be used to serve CSS, JavaScript, or even images. Gzip is fast, simple, and drastically improves the browsing experience. When a browser makes a web request, it includes a header that signals to the target server that it supports compression. And if the server also supports it, it will return a compressed version of the expected data.

Accept-Encoding: gzip, deflate
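
You can watch this negotiation happen with curl; example.com here stands in for any server that supports compression:

# Advertise gzip support and dump only the response headers
curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" https://example.com/
# If the server supports it, the response includes: Content-Encoding: gzip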

Bots that crawl the web also support this feature. Especially since their job is to ingest data from all over the web, they maximize their bandwidth by using compression. And we can take full advantage of this feature.

On this blog, I often get bots that scan for security vulnerabilities, which I ignore for the most part. But when I detect that they are either trying to inject malicious payloads or probing for a response, I return a 200 OK response and serve them a gzipped payload. I vary between a 1 MB and a 10 MB file, which they are happy to ingest. For the most part, when they do, I never hear from them again. Why? Well, that's because they crash right after ingesting the file.

Content-Encoding: deflate, gzip

What happens is, they receive the file and read the header that tells them it is compressed. So they try to decompress the 1 MB file to find whatever content they are looking for. But the file expands, and expands, and expands, until they run out of memory and their server crashes. The 1 MB file decompresses into 1 GB. This is more than enough to break most bots. However, for those pesky scripts that won't stop, I serve them the 10 MB file. This one decompresses into 10 GB and instantly kills the script.

Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device. Continue at your own risk. So here is how we create the zip bomb:

dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz

Here is what the command does:

  1. dd: The dd command is used to copy or convert data.
  2. if=/dev/zero: the input file; /dev/zero is a special file that produces an endless stream of zero bytes.
  3. bs=1G: the block size; dd will read and write data in chunks of 1 GB at a time.
  4. count=10: tells dd to process 10 blocks, each 1 GB in size, generating 10 GB of zeroed data in total.

We then pass the output of the command to gzip which will compress the output into the file 10GB.gz. The resulting file is 10MB in this case.
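
The smaller 1 MB bomb mentioned earlier can presumably be built the same way; a single 1 GB block of zeros compresses down to roughly 1 MB:

# ~1 GB of zeros compresses to roughly 1 MB
dd if=/dev/zero bs=1G count=1 | gzip -c > 1GB.gz

# sanity-check the compressed sizes before serving them
ls -lh 1GB.gz 10GB.gz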

On my server, I've added a middleware that checks whether the current request is malicious or not. I have a list of blacklisted IPs that try to scan the whole website repeatedly. I have other heuristics in place to detect spammers. A lot of spammers attempt to spam a page, then come back to see if the spam has made it onto the page. I use this pattern to detect them. It looks something like this:

if (ipIsBlackListed() || isMalicious()) {
    // Claim the response is compressed, then send the pre-generated bomb
    header("Content-Encoding: deflate, gzip");
    header("Content-Length: " . filesize(ZIP_BOMB_FILE_10G)); // ~10 MB on disk
    readfile(ZIP_BOMB_FILE_10G);
    exit;
}

That's all it takes. The only price I pay is that I'm serving a 10MB file now on some occasions. If I have an article going viral, I decrease it to the 1MB file, which is just as effective.

One more thing: a zip bomb is not foolproof. It can be easily detected and circumvented. You could partially read the content, after all. But for unsophisticated bots that are blindly crawling the web and disrupting servers, this is a good enough tool for protecting your server.

You can see it in action in this replay of my server logs.

