
Snappy UI Optimization with useDeferredValue

1 Share

useDeferredValue is one of the most underrated React hooks. It allows us to dramatically improve the performance of our applications in certain contexts. I recently used it to solve a gnarly performance problem on this blog, and in this tutorial, I'll show you how! ⚡



Read the whole story
emrox
1 day ago
reply
Hamburg, Germany
Share this story
Delete

Bad Weather

2 Shares


View On WordPress

Read the whole story
emrox
1 day ago
reply
Hamburg, Germany
ameel
11 days ago
reply
Melbourne, Australia
Share this story
Delete

I’m worried about the tabbing behaviour, rather than the syntax and name of CSS masonry

1 Share

Back in 2022 I made this site: Be the browser’s mentor, not its micromanager. There are some key principles on there, presented as a nice little collection of tiles.

Dark grey tiles with white text arranged in a standard grid in Arc

The trick during the talk — that I made this site for — was that the grid is actually progressively enhanced with masonry where browsers support it, but no one in the audience would have known that had I not told them. It’s the magic of progressive enhancement: everyone gets a fantastic experience, so they don’t even consider if they are getting the “best” experience. They already are because everything works for them.

Anyway, this is how it looks in Safari Technology Preview. It’s subtle, but it’s a nice enhancement.

Dark grey tiles with white text arranged in a masonry grid in Safari, with a slightly modified source order

The way the layout works is there’s a flexible layout composition, aptly named .grid:

.grid {
  display: grid;
  grid-template-columns: repeat(
    var(--grid-placement, auto-fill),
    minmax(var(--grid-min-item-size, 16rem), 1fr)
  );
  gap: var(--gutter, var(--space-s-l));
}

Using a CUBE exception, I added a masonry enhancement:

.grid[data-rows='masonry'] {
  grid-template-rows: masonry;
  align-items: start;
}

The nice thing about this exception is yes, it slaps the masonry grid-template-rows value in, but also, aligns items to the start, so they at least only size vertically to the size of the content where masonry isn’t available.
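A minimal sketch of how that fallback could be made explicit with a feature query. This is an assumption for illustration, not the site's actual code; it reuses the .grid composition and the data-rows='masonry' hook from above and only applies the masonry value where the browser reports support for it.

```css
/* Baseline: top-aligned items, so each tile is only as tall as its content. */
.grid[data-rows='masonry'] {
  align-items: start;
}

/* Enhancement: opt in only where the experimental value is recognised. */
@supports (grid-template-rows: masonry) {
  .grid[data-rows='masonry'] {
    grid-template-rows: masonry;
  }
}
```

Browsers already drop declarations they cannot parse, so the single-rule version behaves the same. The query mostly makes the intent visible and gives masonry-only tweaks somewhere to live.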

The reason I chose this pattern was that I knew there would be no tabbing issues: it’s just headings and paragraphs with one link. I created a little demo to show you the problem with tabbing in the current iteration of masonry, available in Firefox and Safari Technology Preview:

See the Pen Tabbing issues with Masonry by Andy Bell (@andy-set-studio) on CodePen.

For those of you without those browsers, here’s what it looks like.

Watch the video.

The tabbing order is wild — especially in Firefox. That’s sorta expected though because masonry layouts pack items into available space to get that stonework-like effect — hence the name masonry.

This is a real problem though, because with one line of CSS you can create a pretty serious accessibility issue. I dunno how it would get fixed, so maybe the best thing is for me to warn you not to use masonry if there are focusable elements in play.
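One way to encode that warning in the stylesheet itself is to skip masonry whenever the grid contains something focusable. The sketch below is hypothetical and not from the demo: the selector list inside :has() is illustrative rather than exhaustive, and it would also rule out a grid with a single link.

```css
/* Hypothetical guard: only use masonry when nothing inside the grid can take focus. */
@supports (grid-template-rows: masonry) {
  .grid[data-rows='masonry']:not(:has(a[href], button, input, select, textarea, [tabindex])) {
    grid-template-rows: masonry;
  }
}
```

If a browser does not understand :has(), the whole rule is dropped and the grid keeps its standard layout, which is the safe outcome here.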

What opinion on syntax do I have?

WebKit asked for opinions and Google answered, so here’s my opinion. I honestly don’t mind either of their approaches. I was thinking a while ago masonry feels like a flexbox kind of deal because by nature of a masonry layout, it’s flexible, which to me screams flexbox. I am not smart enough for CSS specs though, so I'll take whatever I'm given as long as it works.

Masonry is already available as a grid value though. What do sites that already use that experimental value do? Sure it’s part of a one-liner, but Google’s suggestion certainly isn’t. It’s a whole layout system in itself, which is a hell of a refactor! On the flip-side, I’m sure we’d rather have agreed standards than potentially half-baked ideas.

To be honest, I think masonry as a design pattern is pretty darn antiquated. I liked the example that Jen used in the WebKit post (using masonry to tidy up one of those mega menus), but in reality, unless you’re building Pinterest or Unsplash-like photographic UIs, you can probably do better without masonry anyway, which raises the question: are there better things for the browsers to be focusing on?


GPUs Go Brrr

1 Share

AI uses an awful lot of compute.

In the last few years we’ve focused a great deal of our work on making AI use less compute (e.g. Based, Monarch Mixer, H3, Hyena, S4, among others) and run more efficiently on the compute that we have (e.g. FlashAttention, FlashAttention-2, FlashFFTConv). Lately, reflecting on these questions has prompted us to take a step back, and ask two questions:

  • What does the hardware actually want?
  • And how can we give that to it?

This post is a mixture of practice and philosophy. On the practical side, we’re going to talk about what we’ve learned about making GPUs go brr -- and release an embedded DSL, ThunderKittens, that we’ve built to help us write some particularly speedy kernels (which we are also releasing). On the philosophical side, we’ll briefly talk about how what we’ve learned has changed the way we think about AI compute.

What's in an H100?

For this post, we’re going to focus on the NVIDIA H100 for two reasons. First, it represents an awful lot of new compute going online. Second, we think the trends it implies are going to continue in future generations, and probably from other manufacturers, too. But bear in mind (and we will repeat in case you forget) that most of this post applies in some form to other GPUs, too.

Advance apologies for restating the data sheet, but the details of the hardware are important for the discussion to come. An H100 SXM GPU contains, for our purposes:

  • 80 GB of HBM3 with 3 TB/s of bandwidth. (A bit less bandwidth in practice.)
  • 50 MB of L2 cache with 12 TB/s of bandwidth, split across the GPU into two 25MB sections connected by a crossbar. (The crossbar sucks.)
  • 132 streaming multiprocessors (SM’s), where each has:
    • up to 227 KB of shared memory within a 256 KB L1 cache. (Together, these have about 33 TB/s of bandwidth.)
    • a tensor memory accelerator (TMA) -- a new chunk of hardware in Hopper that can do asynchronous address generation and fetch memory. It also does other things like facilitate the on-chip memory network (distributed shared memory) but we’re not going to focus on this much, today.
    • 4 quadrants, where each quadrant has:
      • A warp scheduler
      • 512 vector registers (each containing 32 4-byte words)
      • A tensor core for matrix multiplies
      • A bunch of built-in instructions like sums, multiplies, that operate in parallel on these vector registers.

There’s a lot of other stuff, too (memory controllers, instruction caches, etc) but we don’t care about any of that right now.

All of the compute happens in the SM’s. Most of it happens in the registers.

Great, how do I make it go brr?

Keep the tensor core fed. That’s it.

Wait, really?

Yes. That’s the game.

An H100 GPU has 989 TFLOPs of half-precision matrix multiply compute, and ~60 TFLOPs of “everything else”. So, every cycle the tensor core is in use, you’re getting at least 94% utilization of the hardware. And every cycle the tensor core is not in use, you’re getting no more than 6% utilization of the hardware. Put another way:

% utilization H100 = % tensor cores active cycles +/- 6%.
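The 94% and 6% figures follow from the two throughput numbers quoted above, under the simplifying assumption that the tensor cores and "everything else" could both run flat out at the same time:

```latex
\[
\frac{989}{989 + 60} \approx 0.943,
\qquad
\frac{60}{989 + 60} \approx 0.057
\]
```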

Now it turns out that keeping the tensor core fed is easier said than done. We’ve discovered a number of quirks to the hardware that are important to keeping the matrix multiplies rolling. Much of this also applies to non-H100 GPUs, but the H100 is particularly tricky to keep fed so we focus on it here. (The RTX 4090, by comparison, is very easy to work with as illustrated in figure 2.)

  • WGMMA instructions are necessary but also really irritating to use.
  • Shared memory is not actually that fast and also requires great care.
  • Address generation is expensive.
  • Occupancy remains helpful, and registers are generally the key resource.

Let’s go through each of these in order.

WGMMA Instructions

The H100 has a new set of instructions called “warp group matrix multiply accumulate” (wgmma.mma_async in PTX, or HGMMA/IGMMA/QGMMA/BGMMA in SASS). To understand what makes them special, we need to look briefly at how you used to have to use tensor cores. The tensor core instructions available on previous GPUs were wmma.mma.sync and mma.sync instructions. With these instructions a warp of 32 threads on a single quadrant of an SM would synchronously feed their chunk of the data into the tensor core and await the result. Only then could they move on.

Not so with wgmma.mma_async instructions. Here, 128 consecutive threads -- split across all quadrants of the SM -- collaboratively synchronize, and asynchronously launch a matrix multiply directly from shared memory (and optionally also registers.) These warps can then go do other things with their registers while the matrix multiply happens, and await the result whenever they want.

In our microbenchmarks, we found that these instructions are necessary to extract the full compute of the H100. Without them, the GPU seems to top out around 63% of its peak utilization; we suspect this is because the tensor cores want a deep hardware pipeline to keep them fed, even from local resources.

Unfortunately, the memory layouts for these instructions are quite complicated. The unswizzled shared memory layouts suffer from very poor coalescing, and so they require substantial additional bandwidth from L2. The swizzled memory layouts are flat-out incorrectly documented, which took considerable time for us to figure out. They’re also brittle, in that they appear to only work for specific matrix shapes and do not play well with other parts of the wgmma.mma_async instructions. For example, the hardware can transpose sub-matrices on its way to the tensor cores -- but only if the layout is not swizzled.

We’ve also found that unswizzled wgmma layouts have both poor memory coalescing as well as bank conflicts. On kernels such as flash attention, TMA and the L2 cache are both fast enough so as to hide these problems reasonably well. But to make the full use of the hardware, memory requests must be coalesced and bank conflicts avoided, and then controlling layouts very carefully becomes critical.

Despite these pains, these instructions really are necessary to make full use of the H100. Without them, you’ve already lost 37% of the potential performance of the GPU!

Shared memory

Shared memory appears to have a single-access latency of around 30 cycles (this matches our observations, too). That doesn’t sound like much, but in that time the SM’s tensor cores could have done almost two full 32x32 square matrix multiplies.
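As a rough check on the "almost two" claim, here is the arithmetic using the 989 TFLOPs and 132 SMs quoted earlier. The clock speed of roughly 1.8 GHz is an assumption that does not appear in this post, so treat the result as an order-of-magnitude sketch.

```latex
\[
\text{FLOPs per SM per cycle} \approx
\frac{989 \times 10^{12}}{132 \times 1.8 \times 10^{9}} \approx 4.2 \times 10^{3}
\]
\[
\text{FLOPs in 30 cycles} \approx 30 \times 4.2 \times 10^{3} \approx 1.25 \times 10^{5},
\qquad
\text{FLOPs per } 32 \times 32 \times 32 \text{ multiply} = 2 \cdot 32^{3} = 65{,}536
\]
\[
\frac{1.25 \times 10^{5}}{65{,}536} \approx 1.9
\]
```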

In previous work (like Flash Attention), we’ve focused more on the HBM-SRAM bottleneck. And indeed: this really used to be the bottleneck! But as HBM has gotten faster and the tensor cores continue to grow out of proportion with the rest of the chip, even relatively small latencies like those from shared memory have also become important to either remove or hide.

Shared memory can be tricky to work with because it is “banked” into 32 separate stores of memory. If one is not careful, this can lead to something called “bank conflicts”, where the same memory bank is being asked to simultaneously provide multiple different pieces of memory. This leads to requests being serialized, and in our experience this can disproportionately slow down a kernel -- and the register layouts required by wgmma and mma instructions would naively suffer from these bank conflicts. The solution is to rearrange shared memory with various “swizzling” patterns so as to avoid these conflicts, but it is an important detail to get right.
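To make the conflict concrete, here is a worked example under two assumptions that are not spelled out above: each bank serves one 4-byte word, and the tile is a row-major 32x32 array of 4-byte values. The XOR pattern shown is a generic textbook swizzle, not the specific layouts that wgmma or TMA require.

```latex
\[
\text{bank}(a) = \left\lfloor \frac{a}{4} \right\rfloor \bmod 32
\qquad \text{for byte address } a
\]
% Unswizzled: element (r, c) lives at word index 32r + c.
\[
\text{bank}(r, c) = (32r + c) \bmod 32 = c
\]
% A warp reading one column (fixed c, r = 0..31) hits bank c 32 times: a 32-way conflict.
% XOR swizzle: store element (r, c) at word index 32r + (c \oplus r) instead.
\[
\text{bank}_{\text{swz}}(r, c) = (c \oplus r) \bmod 32 = c \oplus r
\]
% For fixed c, the values c \oplus r are distinct for r = 0..31, so a column read is
% conflict-free; for fixed r, c \oplus r permutes 0..31, so a row read is too.
```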

More generally, we have found it very valuable to avoid movement between registers and shared memory when possible, and otherwise to use the built-in hardware (wgmma and TMA instructions) to do data movement asynchronously when possible. Synchronous movement using the actual warps is a worst-case fallback with the greatest generality.

Address Generation

One interesting quirk of the H100 is that the tensor cores and memory are both fast enough that merely producing the memory addresses to fetch takes a substantial fraction of the resources of the chip. (This is even more the case when complicated interleaved or swizzling patterns are added in.)

NVIDIA appears to understand this, as they have bestowed on us the Tensor Memory Accelerator (or TMA, as it likes to be called). TMA allows you to specify a multi-dimensional tensor layout in global and shared memory, tell it to asynchronously fetch a subtile of that tensor, and trip a barrier when it’s done. This saves all of the address generation costs, and additionally makes it much easier to construct pipelines.

We have found TMA to be, like wgmma.mma_async, completely indispensable in achieving the full potential of the H100. (Probably more so than wgmma, in our experience.) It saves register resources and instruction dispatches, and also has useful features such as the ability to perform reductions onto global memory asynchronously, too -- this is particularly useful in complex backwards kernels. As with wgmma, the main quirk of it is that its swizzling modes are a bit difficult to decipher without some reverse engineering, but we had substantially less pain on this point.

Occupancy

For those newer to CUDA, occupancy refers to the number of co-scheduled threads on the exact same execution hardware. Each cycle, the warp scheduler on that quadrant of the SM will try to issue an instruction to a warp of threads that are ready for an instruction. NVIDIA uses this model because it can enable the hardware to be more easily kept full. For example, while one warp of threads is waiting for a matrix multiply, another can receive an instruction to use the fast exponential hardware.

In some ways, the H100 is less reliant on occupancy than previous generations of the hardware. The asynchronous features of the chip mean that even a single instruction stream can keep many parts of the hardware busy -- fetching memory, running matrix multiplies, doing shared memory reductions, and still simultaneously running math on the registers.

But occupancy is very good at hiding both sins and sync’s. A perfectly designed pipeline might run reasonably fast even without any additional occupancy, but our observations suggest that NVIDIA really has designed their GPUs with occupancy in mind. And there are enough synchronizations -- and enough ways to make mistakes -- that finding ways to increase occupancy has, in our experience, usually yielded good returns at increasing the realized utilization of the hardware.

Finally, while occupancy is merely useful on the H100, we have found it to be increasingly important on the A100 and RTX 4090, respectively, likely because they rely increasingly on synchronous instruction dispatches, relative to the H100.

ThunderKittens

Based on the above, we asked ourselves how we might make it easier to write the kinds of kernels we care about while still extracting the full capabilities of the hardware. Motivated by a continuing proliferation of new architectures within the lab (and the fact that Flash Attention is like 1200 lines of code), we ended up designing a DSL embedded within CUDA -- at first for our own internal use.

But then we decided it was useful enough that, with love in our hearts, we cleaned it up and have released it for you. ThunderKittens is that embedded DSL. It is named ThunderKittens because we think kittens are cute, and also we think it is funny to make you type kittens:: in your code.

It is meant to be as simple as possible, and contains four templated types:

  • Register tiles -- 2D tensors on the register file.
  • Register vectors -- 1D tensors on the register file.
  • Shared tiles -- 2D tensors in shared memory.
  • Shared vectors -- 1D tensors in shared memory.

Tiles are parameterized by a height, width, and layout. Register vectors are parameterized by a length and a layout, and shared vectors just by a length. (They don’t generally suffer from bank conflicts.)

We also give operations to manipulate them, either at the warp level or at the level of a collaborative group of warps. Examples include:

  • Initializers -- zero out a shared vector, for example.
  • Unary ops, like exp
  • Binary ops, like mul
  • Row / column ops, like a row_sum

Since ThunderKittens is embedded within CUDA (contrasting libraries like Triton which we also love very much and rely on heavily), the abstractions fail gracefully. If it’s missing something, just extend it to do what you want!

To show an example of these primitives in action, consider Tri’s lovely flash attention -- a beautiful algorithm, but complicated to implement in practice, even on top of NVIDIA’s wonderful Cutlass library.

Here's a simple forward flash attention kernel for an RTX 4090, written in ThunderKittens.

Altogether, this is about 60 lines of CUDA sitting at 75% hardware utilization -- and while it is fairly dense, most of the complexity is in the algorithm, rather than in swizzling patterns or register layouts. And what of all of the complexity of TMA, WGMMA, swizzling modes, and descriptors? Here’s a FlashAttention-2 forward pass for the H100, written with ThunderKittens.

So how does it do?

This kernel is just 100 lines, and it actually outperforms FlashAttention-2 on the H100 by about 30%. ThunderKittens takes care of wrapping up the layouts and instructions, and gives you a mini-pytorch to play with on the GPU.

We also release kernels for Based linear attention and other forthcoming architectures, too. Our Based linear attention kernel runs at 215 TFLOPs (or more than 300 TFLOPs when the recompute inherent in the algorithm is considered). And while linear attention is of course theoretically more efficient, historically, linear attention kernels have been dramatically less efficient on real hardware. So we feel this could open up a broad range of high-throughput applications -- more to come on this point later.

If this seems up your alley, feel free to play with it!

Tiles Seem Like a Good Idea

In our view, what has made ThunderKittens work well for us is that it does not try to do everything. CUDA is indeed far more expressive than ThunderKittens. ThunderKittens is small and dumb and simple.

But ThunderKittens has good abstractions -- small tiles -- that match where both AI and hardware are going. ThunderKittens doesn’t support any dimension less than 16. But in our view, this doesn’t really matter, since the hardware doesn’t particularly want to, either. And we ask: if your matrix multiply is smaller than 16x16, are you sure what you’re doing is AI?

From a philosophical point of view, we think a frame shift is in order. A “register” certainly shouldn’t be a 32-bit word like on the CPUs of old. And a 1024-bit wide vector register, as CUDA uses, is certainly a step in the right direction. But to us a “register” is a 16x16 tile of data. We think AI wants this -- after all this time, it’s still just matrix multiplies, reductions, and reshapes. And we think the hardware wants this, too -- small matrix multiplies are just begging for hardware support beyond just the systolic mma.

In fact, more broadly we believe we should really reorient our ideas of AI around what maps well onto the hardware. How big should a recurrent state be? As big as can fit onto an SM. How dense should the compute be? No less so than what the hardware demands. An important future direction of this work for us is to use our learnings about the hardware to help us design the AI to match.

Tiles Seem Pretty General

Coming soon -- ThunderKittens on AMD hardware!


The Modern Guide For Making CSS Shapes

2 Shares

You have for sure googled “how to create [shape_name] with CSS” at least once in your front-end career if it’s not something you already have bookmarked. And the number of articles and demos you will find out there is endless.

Good, right? Copy that code and drop it into the ol’ stylesheet. Ship it!

The problem is that you don’t understand how the copied code works. Sure, it got the job done, but many of the most widely used CSS shape snippets are often dated and rely on things like magic numbers to get the shapes just right. So, the next time you go into the code needing to make a change to it, it either makes little sense or is inflexible to the point that you need an entirely new solution.

So, here it is, your one-stop modern guide for how to create shapes in CSS! We are going to explore the most common CSS shapes while highlighting different CSS tricks and techniques that you can easily re-purpose for any kind of shape. The goal is not to learn how to create specific shapes but rather to understand the modern tricks that allow you to create any kind of shape you want.

Table of Contents

You can jump directly to the topic you’re interested in to find relevant shapes or browse the complete list. Enjoy!

Why Not SVG?

I get asked this question often, and my answer is always the same: Use SVG if you can! I have nothing against SVG. It’s just another approach for creating shapes using another syntax with another set of considerations. If SVG was my expertise, then I would be writing about that instead!

CSS is my field of expertise, so that’s the approach we’re covering for drawing shapes with code. Choosing CSS or SVG is typically a matter of choice. There may very well be a good reason why SVG is a better fit for your specific needs.

Many times, CSS will be your best bet for decorative things or when you’re working with a specific element in the markup that contains real content to be styled. Ultimately, though, you will need to consider what your project’s requirements are and decide whether a CSS shape is really what you are looking for.

Your First Resource

Before we start digging into code, please spend a few minutes over at my CSS Shape website. You will find many examples of CSS-only shapes. This is an ever-growing collection that I regularly maintain with new shapes and techniques. Bookmark it and use it as a reference as we make our way through this guide.

Is it fairly easy to modify and tweak the CSS for those shapes?

Yes! The CSS for each and every shape is optimized to be as flexible and efficient as possible. The CSS typically targets a single HTML element to prevent you from having to touch too much markup besides dropping the element on the page. Additionally, I make liberal use of CSS variables that allow you to modify things easily for your needs.

Most of you don't have time to grasp all the techniques and tricks to create different shapes, so an online resource with ready-to-use snippets of code can be a lifesaver!

Clipping Shapes In CSS

The CSS clip-path property — and its polygon() function — is what we commonly reach for when creating CSS Shapes. Through the creation of common CSS shapes, we will learn a few tricks that can help you create other shapes easily.

Hexagons

Let’s start with one of the easiest shapes: the hexagon. We first define the shape’s dimensions, then provide the coordinates for the six points, and we are done.

.hexagon {
  width: 200px;
  aspect-ratio: 0.866; 
  clip-path: polygon(
    0% 25%,
    0% 75%,
    50% 100%, 
    100% 75%, 
    100% 25%, 
    50% 0%);
}

We can get the same hexagon from only four points. We’re basically drawing the shape of a diamond where two of the points are set way outside the bounds of the hexagon we’re trying to make. This is perhaps the very first lesson for drawing CSS shapes: allow yourself to think outside the box, or at least outside the shape’s boundaries.

Look how much simpler the code already looks:

.hexagon {
  width: 200px;
  aspect-ratio: cos(30deg); 
  clip-path: polygon(
    -50% 50%,
    50% 100%,
    150% 50%,
    50% 0
  );
}

Did you notice that I updated the aspect-ratio property in there? I’m using a trigonometric function, cos(), to replace the magic number 0.866. The exact value of the ratio is equal to cos(30deg) (or sin(60deg)). Besides, cos(30deg) is a lot easier to remember than 0.866.
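For readers who want to see where the ratio comes from, here is a short derivation for a regular hexagon with side length s, drawn with a vertex at the top as in the code above. The symbol s is introduced here only for illustration.

```latex
\[
\text{width} = 2s\cos(30^\circ) = \sqrt{3}\,s,
\qquad
\text{height} = 2s
\]
\[
\frac{\text{width}}{\text{height}} = \cos(30^\circ) = \frac{\sqrt{3}}{2} \approx 0.866
\]
```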

Here’s something fun we can do: swap the X and Y coordinate values. In other words, let’s change the polygon() coordinates from this pattern:

clip-path: polygon(X1 Y1, X2 Y2, ..., Xn Yn)

…to this, where the Y values come before the X values:

clip-path: polygon(Y1 X1, Y2 X2, ..., Yn Xn)

What we get is a new variation of the hexagon:
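As a sketch of what the swap looks like when applied to the four-point hexagon from earlier: each pair of coordinates is flipped, and because width and height trade places, the aspect ratio is inverted as well. The class name is made up for this example.

```css
/* Hypothetical class name; coordinates are the four-point hexagon with X and Y swapped. */
.hexagon-rotated {
  width: 200px;
  aspect-ratio: 1 / cos(30deg); /* inverse of the original ratio */
  clip-path: polygon(
    50% -50%,
    100% 50%,
    50% 150%,
    0 50%
  );
}
```

The result is the same hexagon turned by 90 degrees, with flat edges on the top and bottom.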

I know that visualizing the shape with outside points can be somewhat difficult because we’re practically turning the concept of clipping on its head. But with some practice, you get used to this mental model and develop muscle memory for it.

Notice that the CSS for an octagon is remarkably similar to what we used to create a hexagon:

.octagon {
  width: 200px;  
  aspect-ratio: 1;  
  --o: calc(50% * tan(-22.5deg));
  clip-path: polygon(
    var(--o) 50%,
    50% var(--o),
    calc(100% - var(--o)) 50%,
    50% calc(100% - var(--o))
  );
}

Except for the small trigonometric formula, the structure of the code is identical to the last hexagon shape — set the shape’s dimensions, then clip the points. And notice how I saved the math calculation as a CSS variable to avoid repeating that code.
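To see why four outside points are enough, it can help to compare against the explicit eight-point version. The sketch below is for comparison only and uses made-up names (.octagon-explicit, --c). It derives the corner cut from the same --o offset: the diamond's edges cross the element's box at 50% + --o, which is roughly 29.29%.

```css
/* Comparison sketch: the same octagon written with all eight visible corners. */
.octagon-explicit {
  width: 200px;
  aspect-ratio: 1;
  --o: calc(50% * tan(-22.5deg)); /* roughly -20.71% */
  --c: calc(50% + var(--o));      /* roughly 29.29%, the corner cut */
  clip-path: polygon(
    var(--c) 0,
    calc(100% - var(--c)) 0,
    100% var(--c),
    100% calc(100% - var(--c)),
    calc(100% - var(--c)) 100%,
    var(--c) 100%,
    0 calc(100% - var(--c)),
    0 var(--c)
  );
}
```

Both versions clip to the same outline. The four-point version simply lets the element's own box do half of the cutting.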

If math isn’t really your thing — and that’s totally fine! — remember that the formulas are simply one part of the puzzle. There’s no need to go back to your high school geometry textbooks. You can always find the formulas you need for specific shapes in my online collection. Again, that collection is your first resource for creating CSS shapes!

And, of course, we can apply this shape to an <img> element as easily as we can a <div>:

It may sound impossible to make a star out of only five points, but it’s perfectly possible, and the trick is how the points inside polygon() are ordered. If we were to draw a star with pencil on paper in a single continuous line, we would follow the following order:

It’s the same way we used to draw stars as kids — and it fits perfectly in CSS with polygon()! This is another hidden trick about clip-path with polygon(), and it leads to another key lesson for drawing CSS shapes: the lines we establish can intersect. Again, we’re sort of turning a concept on its head, even if it’s a pattern we all grew up making by hand.

Here’s how those five points translate to CSS:

.star {
  width: 200px;
  aspect-ratio: 1;
  clip-path: polygon(
    50% 0, /* (1) */
    calc(50%*(1 + sin(.4turn))) calc(50%*(1 - cos(.4turn))), /* (2) */
    calc(50%*(1 - sin(.2turn))) calc(50%*(1 - cos(.2turn))), /* (3) */
    calc(50%*(1 + sin(.2turn))) calc(50%*(1 - cos(.2turn))), /* (4) */
    calc(50%*(1 - sin(.4turn))) calc(50%*(1 - cos(.4turn)))  /* (5) */
  );
}
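One way to read those coordinates: the five points sit on the circle inscribed in the element, and each step advances 0.4 of a turn, skipping every other vertex of a pentagon. The indexing by k below is notation introduced for this sketch, not something from the original code.

```latex
\[
P_k = \bigl(50\%\,(1 + \sin\theta_k),\; 50\%\,(1 - \cos\theta_k)\bigr),
\qquad
\theta_k = k \times 0.4\ \text{turn},\quad k = 0, 1, 2, 3, 4
\]
% The code folds the larger angles back using symmetry:
\[
\sin(0.8\,\text{turn}) = -\sin(0.2\,\text{turn}),\quad
\sin(1.2\,\text{turn}) = \sin(0.2\,\text{turn}),\quad
\sin(1.6\,\text{turn}) = -\sin(0.4\,\text{turn})
\]
\[
\cos(0.8\,\text{turn}) = \cos(1.2\,\text{turn}) = \cos(0.2\,\text{turn}),\qquad
\cos(1.6\,\text{turn}) = \cos(0.4\,\text{turn})
\]
```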

The funny thing is that starbursts are basically the exact same thing as polygons, just with half the points that we can move inward.

Figure 6.

I often advise people to use my online generators for shapes like these because the clip-path coordinates can get tricky to write and calculate by hand.

That said, I really believe it’s still a very good idea to understand how the coordinates are calculated and how they affect the overall shape. I have an entire article on the topic for you to learn the nuances of calculating coordinates.

Parallelograms & Trapezoids

Another common shape we always build is a rectangle shape where we have one or two slanted sides. They have a lot of names depending on the final result (e.g., parallelogram, trapezoid, skewed rectangle, and so on), but all of them are built using the same CSS technique.

First, we start by creating a basic rectangle by linking the four corner points together:

clip-path: polygon(0 0, 100% 0, 100% 100%, 0 100%)

This code produces no visible change because our element is already a rectangle. Also, note that 0 and 100% are the only values we’re using.

Next, offset some values to get the shape you want. Let’s say our offset needs to be equal to 10px. If the value is 0, we update it with 10px, and if it’s 100% we update it with calc(100% - 10px). As simple as that!
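As a minimal sketch of that rule, here is one possible result: a parallelogram made by offsetting the top-left and bottom-right corners of the basic rectangle. The class name and the --o variable are assumptions for this example.

```css
/* Hypothetical example: two corners of the basic rectangle pushed inward by --o. */
.parallelogram {
  --o: 10px; /* the offset */
  clip-path: polygon(
    var(--o) 0,                  /* was 0 0 */
    100% 0,
    calc(100% - var(--o)) 100%,  /* was 100% 100% */
    0 100%
  );
}
```

Offsetting a different pair of corners, or only one of them, gives the other slanted variations.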

But which value do I need to update and when?

Try and see! Open your browser’s developer tools and update the values in real-time to see how the shape changes, and you will understand what points you need to update. I would lie if I told you that I write all the shapes from memory without making any mistakes. In most cases, I start with the basic rectangle, and I add or update points until I get the shape I want. Try this as a small homework exercise and create the shapes in Figure 11 by yourself. You can still find all the correct code in my online collection for reference.

If you want more CSS tricks around the clip-path property, check my article “CSS Tricks To Master The clip-path Property” which is a good follow-up to this section.

Masking Shapes In CSS

We just worked with a number of shapes that required us to figure out a number of points and clip-path by plotting their coordinates in a polygon(). In this section, we will cover circular and curvy shapes while introducing the other property you will use the most when creating CSS shapes: the mask property.

Like the previous section, we will create some shapes while highlighting the main tricks you need to know. Don’t forget that the goal is not to learn how to create specific shapes but to learn the tricks that allow you to create any kind of shape.

Circles & Holes

When talking about the mask property, gradients are certain to come up. We can, for example, “cut” (but really “mask”) a circular hole out of an element with a radial-gradient:

mask: radial-gradient(50px, #0000 98%, #000);

Why aren’t we using a simple background instead? The mask property allows us more flexibility, like using any color we want and applying the effect on a variety of other elements, such as <img>. If the color and flexible utility aren’t a big deal, then you can certainly reach for the background property instead of cutting a hole.
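A small sketch of the same hole with the radius pulled into a variable and the center made explicit. The class name and the --r variable are made up for illustration.

```css
/* Hypothetical class: transparent inside the circle (hidden), opaque outside (visible). */
.hole {
  --r: 50px; /* radius of the hole */
  mask: radial-gradient(var(--r) at 50% 50%, #0000 98%, #000);
}
```

Because it is a mask rather than a background, whatever sits behind the element shows through the hole, and the same class can be applied to an image.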

Here’s the mask working on both a <div> and <img>:

Once again, it’s all about CSS masks and gradients. In the following articles, I provide you with examples and recipes for many different possibilities:

Be sure to make it to the end of the second article to see how this technique can be used as decorative background patterns.

This time, we are going to introduce another technique which is “composition”. It’s an operation we perform between two gradient layers. We either use mask-composite to define it, or we declare the values on the mask property.
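To illustrate the two ways of declaring a composition, here is a sketch that intersects a conic gradient with a ring-shaped radial gradient. The class names, the 240deg angle and the 40px thickness are placeholder values chosen for this example.

```css
/* Longhand: mask-composite says how each layer combines with the layers below it. */
.composed {
  mask-image:
    conic-gradient(#000 240deg, #0000 0),
    radial-gradient(50% 50%, #0000 calc(100% - 40px), #000 0 98%, #0000);
  mask-composite: intersect, add;
}

/* Shorthand: the compositing keyword sits on the layer itself. */
.composed-shorthand {
  mask:
    conic-gradient(#000 240deg, #0000 0) intersect,
    radial-gradient(50% 50%, #0000 calc(100% - 40px), #000 0 98%, #0000);
}
```

Both rules should keep only the part of the ring that falls inside the conic sweep, which gives an unclosed ring with square ends.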

The figure below illustrates the gradient configuration and the composition between each layer.

We start with a radial-gradient to create a full circle shape. Then we use a conic-gradient to create the shape below it. Between the two gradients, we perform an “intersect” composition to get the unclosed circle. Then we tack on two more radial gradients to the mask to get those nice rounded endpoints on the unclosed circle. This time we consider the default composition, “add”.

Gradients aren’t something new as we use them a lot with the background property but “composition” is the new concept I want you to keep in mind. It’s a very handy one that unlocks a lot of possibilities.

Ready for the CSS?

.arc {
  --b: 40px; /* border thickness */
  --a: 240deg; /* progression */
  --_g:/var(--b) var(--b) radial-gradient(50% 50%,#000 98%,#0000) no-repeat;
  mask:
    top var(--_g),
    calc(50% + 50% * sin(var(--a)))
      calc(50% - 50% * cos(var(--a))) var(--_g),
    conic-gradient(#000 var(--a), #0000 0) intersect,
    radial-gradient(50% 50%, #0000 calc(100% - var(--b)), #000 0 98%, #0000)
}

We could get clever and use a pseudo-element for the shape that’s positioned behind the set of panels, but that introduces more complexity and fixed values than we ought to have. Instead, we can continue using CSS masks to get the perfect shape with a minimal amount of reusable code.

It’s not really the rounded top edges that are difficult to pull off, but the bottom portion that curves inwards instead of rounding in like the top. And even then, we already know the secret sauce: using CSS masks by combining gradients that reveal just the parts we want.

We start by adding a border around the element — excluding the bottom edge — and applying a border-radius on the top-left and top-right corners.

.tab {
  --r: 40px; /* radius size */

  border: var(--r) solid #0000; /* transparent black */
  border-bottom: 0;
  border-radius: calc(2 * var(--r)) calc(2 * var(--r)) 0 0;
}

Next, we add the first mask layer. We only want to show the padding area (i.e., the red area highlighted in Figure 10).

mask: linear-gradient(#000 0 0) padding-box;

Let’s add two more gradients, both radial, to show those bottom curves.

mask: 
  radial-gradient(100% 100% at 0 0, #0000 98%, #000) 0 100% / var(--r) var(--r), 
  radial-gradient(100% 100% at 100% 0, #0000 98%, #000) 100% 100% / var(--r) var(--r), 
  linear-gradient(#000 0 0) padding-box;

Here is how the full code comes together:

.tab {
  --r: 40px; /* control the radius */

  border: var(--r) solid #0000;
  border-bottom: 0;
  border-radius: calc(2 * var(--r)) calc(2 * var(--r)) 0 0;
  mask: 
    radial-gradient(100% 100% at 0 0, #0000 98%, #000) 0 100% / var(--r) var(--r), 
    radial-gradient(100% 100% at 100% 0, #0000 98%, #000) 100% 100% / var(--r) var(--r), 
    linear-gradient(#000 0 0) padding-box;
  mask-repeat: no-repeat;
  background: linear-gradient(60deg, #BD5532, #601848) border-box;
}

As usual, all it takes is one variable to control the shape. Let’s zero in on the border-radius declaration for a moment:

border-radius: calc(2 * var(--r)) calc(2 * var(--r)) 0 0;

Notice that the radius of the shape’s rounded top edges is equal to two times the radius (--r) value. If you’re wondering why we need a calculation here at all, it’s because we have a transparent border hanging out there, and we need to double the radius to account for it. The radius of the blue areas highlighted in Figure 13 is equal to 2 * R, while the radius of the red area highlighted in the same figure is equal to 2 * R - R, or simply R.
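To make that arithmetic concrete, here is a small sketch that assumes the .tab rule above is already in place. The modifier class names are made up for illustration.

```css
/* Assumes the .tab rule above; the modifier names are hypothetical. */
.tab--small {
  /* border: 20px, outer radius: 2 * 20px = 40px,
     visible (padding-box) radius: 40px - 20px = 20px */
  --r: 20px;
}

.tab--large {
  /* border: 60px, outer radius: 2 * 60px = 120px,
     visible (padding-box) radius: 120px - 60px = 60px */
  --r: 60px;
}
```

Because every length in the shape derives from --r, the top corners and the inward bottom curves should stay in proportion as the value changes.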

We can actually optimize the code so that we only need two gradients — one linear and one radial — instead of three. I’ll drop that into the following demo for you to pick apart. Can you figure out how we were able to eliminate one of the gradients?

I’ll throw in two additional variations for you to investigate:

These aren’t tabs at all but tooltips! We can absolutely use the exact same masking technique we used to create the tabs for these shapes. Notice how the curves that go inward are consistent in each shape, no matter if they are positioned on the left, right, or both.

You can always find the code over at my online collection if you want to reference it.

More CSS Shapes

At this point, we’ve seen the main tricks to create CSS shapes. You will rely on mask and gradients if you have curves and rounded parts, or on clip-path when there are no curves. It sounds simple, but there’s still more to learn, so I am going to provide a few more common shapes for you to explore.

Instead of going into a detailed explanation of the shapes in this section, I’m going to give you the recipes for how to make them and all of the ingredients you need to make it happen. In fact, I have written other articles that are directly related to everything we are about to cover and will link them up so that you have guides you can reference in your work.

Triangles

A triangle is likely the first shape that you will ever need. They’re used in lots of places, from play buttons for videos, to decorative icons in links, to active state indicators, to open/close toggles in accordions, to… the list goes on.

Creating a triangle shape is as simple as using a 3-point polygon in addition to defining the size:

.triangle {
  width: 200px;
  aspect-ratio: 1;
  clip-path: polygon(50% 0, 100% 100%, 0 100%);
}
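The same three-point idea covers other orientations. The sketch below is not from the demos; the class names are hypothetical, and only the polygon points differ from the .triangle rule above.

```css
/* Hypothetical variations: same sizing as .triangle, different points. */
.triangle-down,
.triangle-right,
.triangle-left {
  width: 200px;
  aspect-ratio: 1;
}

/* Flat edge on top, tip at the bottom center. */
.triangle-down {
  clip-path: polygon(0 0, 100% 0, 50% 100%);
}

/* Flat edge on the left, tip at the middle of the right side. */
.triangle-right {
  clip-path: polygon(0 0, 100% 50%, 0 100%);
}

/* Flat edge on the right, tip at the middle of the left side. */
.triangle-left {
  clip-path: polygon(100% 0, 0 50%, 100% 100%);
}
```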

But we can go even further by adding more points to have border-only variations:

We can cut all the corners or just specific ones. We can make circular cuts or sharp ones. We can even create an outline of the overall shape. Take a look at my online generator to play with the code, and check out my full article on the topic, where I detail all the different cases.
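For the sharp-cut case, a minimal sketch could look like the following. This is not output from the generator; the class names, the --s variable, and the placeholder size and color are assumptions made for illustration.

```css
/* Hypothetical example: sharp cuts of size --s made with polygon(). */
.cut-all,
.cut-top-right {
  --s: 30px; /* size of the cut */

  width: 200px;        /* placeholder size */
  aspect-ratio: 1;
  background: #601848; /* placeholder color */
}

/* All four corners: each corner becomes two points, --s away from it. */
.cut-all {
  clip-path: polygon(
    var(--s) 0, calc(100% - var(--s)) 0,
    100% var(--s), 100% calc(100% - var(--s)),
    calc(100% - var(--s)) 100%, var(--s) 100%,
    0 calc(100% - var(--s)), 0 var(--s)
  );
}

/* Only the top-right corner: the other three corners keep a single point. */
.cut-top-right {
  clip-path: polygon(
    0 0, calc(100% - var(--s)) 0,
    100% var(--s), 100% 100%,
    0 100%
  );
}
```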

Section Dividers

Speaking of visual transitions between sections, what if both sections have decorative borders that fit together like a puzzle?

I hope you see the pattern now: sometimes, we’re clipping an element or masking portions of it. The fact that we can sort of “carve” into things this way using polygon() coordinates and gradients opens up so many possibilities that would have required clever workarounds and super-specific code in years past.
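As a rough illustration of two edges that fit together, here is a sketch with the simplest possible border, a slanted line. The class names and both variables are hypothetical, and it assumes the two sections are adjacent block-level siblings with no other margins between them.

```css
/* Hypothetical sketch: two stacked sections with matching slanted edges. */
.section-top,
.section-bottom {
  --slant: 60px; /* height of the slanted edge */
  --gap: 10px;   /* visible space between the two sections */
}

/* The bottom edge runs from 60px above the bottom-left corner
   down to the bottom-right corner. */
.section-top {
  clip-path: polygon(0 0, 100% 0, 100% 100%, 0 calc(100% - var(--slant)));
}

/* The top edge has the same slope. The negative margin pulls the
   section up so that only --gap separates the two parallel edges. */
.section-bottom {
  margin-top: calc(var(--gap) - var(--slant));
  clip-path: polygon(0 0, 100% var(--slant), 100% 100%, 0 100%);
}
```

Fancier borders follow the same logic: the two outlines share one profile, and the negative margin brings them together.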

See my article “How to Create a Section Divider Using CSS” on the freeCodeCamp blog for a deep dive into the concepts, which we’ve also covered here quite extensively already in earlier sections.

Floral Shapes

We’ve created circles. We’ve made wave shapes. Let’s combine those two ideas to create floral shapes.

These shapes are pretty cool on their own. But like a few of the other shapes we’ve covered, this one works extremely well with images. If you need something fancier than the typical box, then masking the edges can come off like a custom-framed photo.

Here is a demo where I am using such shapes to create a fancy hover effect:

See the Pen Fancy Pop Out hover effect! by Temani Afif.

There’s a lot of math involved with this, specifically trigonometric functions. I have a two-part series that gets into the weeds if you’re interested in that side of things:

As always, remember that my online collection is your Number One resource for all things related to CSS shapes. The math has already been worked out for your convenience, but you also have the references you need to understand how it works under the hood.

Conclusion

I hope you see CSS Shapes differently now as a result of reading this comprehensive guide. We covered a few shapes, but really, it’s hundreds upon hundreds of shapes, because each one can be configured into a slew of variations.

At the end of the day, all of the shapes use some combination of different CSS concepts such as clipping, masking, composition, gradients, CSS variables, and so on. Not to mention a few hidden tricks like the ones related to the polygon() function:

  • It accepts points outside the [0%, 100%] range.
  • Switching axes is a solid approach for creating shape variations.
  • The lines we establish can intersect.
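Each of those three tricks can be sketched in a line of CSS. The class names, size, and color below are placeholders made up for this example.

```css
/* Hypothetical demo classes for the three polygon() tricks. */
.outside-range,
.switched-axes,
.crossing-lines {
  width: 200px;        /* placeholder size */
  aspect-ratio: 1;
  background: #bd5532; /* placeholder color */
}

/* 1. Points outside the [0%, 100%] range: the polygon extends past the
      box, and for a plain box only the part inside it remains visible. */
.outside-range {
  clip-path: polygon(50% 0, 150% 100%, -50% 100%);
}

/* 2. Switching axes: swapping x and y in each point of the upward
      triangle (50% 0, 100% 100%, 0 100%) gives a left-pointing one. */
.switched-axes {
  clip-path: polygon(0 50%, 100% 100%, 100% 0);
}

/* 3. Intersecting lines: the two diagonals cross at the center, which
      leaves two triangles that touch at a single point (a bow tie). */
.crossing-lines {
  clip-path: polygon(0 0, 100% 100%, 100% 0, 0 100%);
}
```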

It’s not that many things, right? We looked at each of these in great detail and then whipped through the shapes to demonstrate how the concepts come together. It’s not so much about memorizing snippets as it is about thoroughly understanding how CSS works and leveraging its features to produce any number of things, like shapes.

Don’t forget to bookmark my CSS Shape website and use it as a reference as well as a quick stop to get a specific shape you need for a project. I avoid re-inventing the wheel in my work, and the online collection is your wheel for snagging shapes made with pure CSS.

Please also use it as inspiration for your own shape-shifting experiments. And post a comment if you think of a shape that would be a nice addition to the collection.


Hardest Problem in Computer Science: Centering Things
