
SQL join flavors


Highlights from Git 2.42


The open source Git project just released Git 2.42 with features and bug fixes from over 78 contributors, 17 of them new. We last caught up with you on the latest in Git back when 2.41 was released.

To celebrate this most recent release, here’s GitHub’s look at some of the most interesting features and changes introduced since last time.

Faster object traversals with bitmaps

Many long-time readers of these blog posts will recall our coverage of reachability bitmaps. Most notably, we covered Git’s new multi-pack reachability bitmaps back in our coverage of the 2.34 release towards the end of 2021.

If this is your first time here, or you need a refresher on reachability bitmaps, don’t worry. Reachability bitmaps allow Git to quickly determine the result set of a reachability query, like when serving fetches or clones. Git stores a collection of bitmaps for a handful of commits. Each bit position is tied to a specific object, and the value of that bit indicates whether or not it is reachable from the given commit.

This often allows Git to compute the answers to reachability queries using bitmaps much more quickly than without, particularly for large repositories. For instance, if you want to know the set of objects unique to some branch relative to another, you can build up a bitmap for each endpoint (in this case, the branch we’re interested in, along with main), and compute the AND NOT between them. The resulting bitmap has bits set to “1” for exactly the set of objects unique to one side of the reachability query.
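
For example, here is a sketch of that kind of query from the command line, assuming hypothetical branches main and other with bitmap coverage in place:

$ git rev-list --objects --use-bitmap-index main ^other

The ^ prefix marks other as the unwanted side of the query; the result is conceptually the AND NOT of the two bitmaps.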

But what happens if one side doesn’t have bitmap coverage, or if the branch has moved on since the last time it was covered with a bitmap?

In previous versions of Git, the answer was that Git would build up a complete bitmap for all reachability tips relative to the query. It does so by walking backwards from each tip, assembling its own bitmap, and then stopping as soon as it finds an existing bitmap in history. Here’s an example of the existing traversal routine:

Figure 1: Bitmap-based traversal computing the set of objects unique to `main` in Git 2.41.0.

There’s a lot going on here, but let’s break it down. Above we have a commit graph, with five branches and one tag. Each commit is indicated by a circle, and the references are indicated by squares pointing at their respective referents. Existing bitmaps can be found for both the v2.42.0 tag and the branch bar.

In the above, we’re trying to compute the set of objects which are reachable from main, but aren’t reachable from any other branch. By inspection, it’s clear that the answer is {C₆, C₇}, but let’s step through how Git would arrive at the same result:

  • For each branch that we want to exclude from the result set (in this case, foo, bar, baz, and quux), we walk along the commit graph, marking each of the corresponding bits in our have’s bitmap in the top-left.
  • If we happen to hit a portion of the graph that we’ve covered already, we can stop early. Likewise, if we find an existing bitmap (like what happens when we try to walk beginning at branch bar), we can OR the bits from that commit’s bitmap into our have’s set, and move on to the next branch.
  • Then, we repeat the same process for each branch we do want to keep (in this case, just main), this time marking or ORing bits into the want’s bitmap.
  • Finally, once we have a complete bitmap representing each side of the reachability query, we can compute the result by AND NOTing the two bitmaps together, leaving us with the set of objects unique to main.

We can see in the above that having existing bitmap coverage (as is the case with branch bar) is extremely beneficial, since it allows us to discover the set of objects reachable from a certain point in the graph immediately, without having to open up and parse objects.

But what happens when bitmap coverage is sparse? In that case, we end up having to walk over many objects in order to find an existing bitmap. Oftentimes, the additional overhead of maintaining a series of bitmaps outweighs the benefits of using them in the first place, particularly when coverage is poor.

In this release, Git introduces a new variant of the bitmap traversal algorithm that often outperforms the existing implementation, particularly when bitmap coverage is sparse.

The new algorithm represents the unwanted side of the reachability query as a bitmap built from the query’s boundary, instead of the union of bitmap(s) from the individual tips on the unwanted side. The exact definition of a query boundary is slightly technical, but for our purposes you can think of it as the first commit(s), walking back from the wanted tips, which are also reachable from at least one unwanted tip.

In the above example, this is commit C₅, which is reachable from both main (which is in the wanted half of the reachability query) along with bar and baz (both of which are in the unwanted half). Let’s step through computing the same result using the boundary-based approach:

Figure 2: The same traversal as above, instead using the boundary commit-based approach.

The approach here is similar to the above, but not quite the same. Here’s the process:

  • We first discover the boundary commit(s), in this case C₅.
  • We then walk backwards from the set of boundary commit(s) we just discovered until we find a reachability bitmap (or reach the beginning of history). At each stage along the walk, we mark the corresponding bit in the have’s bitmap.
  • Then, we build up a complete bitmap on the want’s side by starting a walk from main until we hit an existing bitmap, the beginning of history, or an object marked in the previous step.
  • Finally, as before, we compute the AND NOT between the two bitmaps, and return the results.

When there are bitmaps close to the boundary commit(s), or the unwanted half of the query is large, this algorithm often vastly outperforms the existing traversal. In the toy example above, you can see we compute the answer much more quickly when using the boundary-based approach. And in real-world examples, the new algorithm can be anywhere from 2 to 15 times faster than the old one.

You can try out the new algorithm by running:

$ git repack -ad --write-bitmap-index
$ git config pack.useBitmapBoundaryTraversal true

in your repository (using Git 2.42), and then using git rev-list with the --use-bitmap-index flag.
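
For example, a query shaped like the walkthrough above (branch names hypothetical) might look like:

$ git rev-list --objects --use-bitmap-index main --not foo bar baz quux | wc -l

With pack.useBitmapBoundaryTraversal enabled, Git answers the --not side of this query using the boundary-based walk rather than building a full bitmap for each excluded tip.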

[source]

Exclude references by pattern in for-each-ref

If you’ve ever scripted around Git before, you are likely familiar with its for-each-ref command. If not, you likely won’t be surprised to learn that this command is used to enumerate references in your repository, like so:

$ git for-each-ref --sort='-*committerdate' refs/tags
264b9b3b04610cb4c25e01c78d9a022c2e2cdf19 tag    refs/tags/v2.42.0-rc2
570f1f74dee662d204b82407c99dcb0889e54117 tag    refs/tags/v2.42.0-rc1
e8f04c21fdad4551047395d0b5ff997c67aedd90 tag    refs/tags/v2.42.0-rc0
32d03a12c77c1c6e0bbd3f3cfe7f7c7deaf1dc5e tag    refs/tags/v2.41.0
[...]

for-each-ref is extremely useful for listing references, finding which references point at a given object (with --points-at), which references have been merged into a given branch (with --merged), or which references contain a given commit (with --contains).
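
Here are a few sketches of those modes (the object IDs and branch names are hypothetical):

$ git for-each-ref --points-at v2.42.0          # refs pointing at this object
$ git for-each-ref --merged main refs/heads     # branches merged into main
$ git for-each-ref --contains 1a2b3c4 refs/tags # tags containing this commit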

Git relies on the same machinery used by for-each-ref across many different components, including the reference advertisement phase of pushes. During a push, the Git server first advertises a list of references that it wants the client to know about, and the client can then exclude those objects (and anything reachable from them) from the packfile they generate during the push.

But what if you have some references that you don’t want to advertise to clients during a push? For example, GitHub maintains a pair of references for each open pull request, like refs/pull/NNN/head and refs/pull/NNN/merge, which aren’t advertised to pushers. Luckily, Git has a mechanism that allows server operators to exclude groups of references from the push advertisement phase by configuring the transfer.hideRefs variable.
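
For example, a server operator could hide GitHub-style pull request references with a configuration along these lines:

$ git config --add transfer.hideRefs refs/pull

Any reference under the refs/pull/ hierarchy is then omitted from the advertisement.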

Git implements the functionality configured by transfer.hideRefs by enumerating all references, and then inspecting each one to see whether or not it should advertise that reference to pushers. Here’s a toy example of a similar process:

Figure 3: Running `for-each-ref` while excluding the `refs/pull/` hierarchy.

Here, we want to list every reference that doesn’t begin with refs/pull/. In order to do that, Git enumerates each reference one-by-one, and performs a prefix comparison to determine whether or not to include it in the set.

For repositories that have a small number of hidden references, this isn’t such a big deal. But what if you have thousands, tens of thousands, or even more hidden references? Performing that many prefix comparisons only to throw out a reference as hidden can easily become costly.

In Git 2.42, there is a new mechanism to more efficiently exclude references. Instead of inspecting each reference one-by-one, Git first locates the start and end of each excluded region in its packed-refs file. Once it has this information, it creates a jump list allowing it to skip over whole regions of excluded references in a single step, rather than discarding them one by one, like so:

Figure 4: The same `for-each-ref` invocation as above, this time using a jump list as in Git 2.42.

Like the previous example, we still want to discard all of the refs/pull references from the result set. To do so, Git finds the first reference beginning with refs/pull (if one exists), and then performs a modified binary search to find the location of the first reference after all of the ones beginning with refs/pull.

It can then use this information (indicated by the dotted yellow arrow) to avoid looking at the refs/pull hierarchy entirely, providing a measurable speed-up over inspecting and discarding each hidden reference individually.

In Git 2.42, you can try out this new functionality with git for-each-ref’s new --exclude option. This release also uses the new mechanism to improve the reference advertisement described above, as well as the analogous components used when fetching. In extreme examples, this can provide a 20-fold improvement in the CPU cost of advertising references during a push.
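
For example, to enumerate everything except the pull request hierarchy from the earlier figures:

$ git for-each-ref --exclude=refs/pull

Patterns given to --exclude work like for-each-ref’s positional patterns, so a prefix such as refs/pull covers the whole hierarchy.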

Git 2.42 also comes with a pair of new options in the git pack-refs command, which is responsible for updating the packed-refs file with any loose references that aren’t already stored there. In certain scenarios (such as a reference being frequently updated or deleted), it can be useful to exclude those references from ever entering the packed-refs file in the first place.

git pack-refs now understands how to tweak the set of references it packs using its new --include and --exclude flags.
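
As a sketch, suppose a hypothetical refs/heads/tmp/ hierarchy is updated and deleted constantly. You could pack everything else while leaving those references loose:

$ git pack-refs --all --exclude 'refs/heads/tmp/*'

That way, the churn in refs/heads/tmp/ doesn’t force repeated rewrites of the packed-refs file.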

[source, source]

Preserving precious objects from garbage collection

In our last set of release highlights, we talked about a new mechanism for collecting unreachable objects in Git known as cruft packs. Git uses cruft packs to collect and track the age of unreachable objects in your repository, gradually letting them age out before eventually being pruned from your repository.

But Git doesn’t simply delete every unreachable object (unless you tell it to with --prune=now). Instead, it will delete every object except those that meet one of the below criteria:

  1. The object is reachable, in which case it cannot be deleted ever.
  2. The object is unreachable, but was modified after the pruning cutoff.
  3. The object is unreachable, and hasn’t been modified since the pruning cutoff, but is reachable via some other unreachable object which has been modified recently.

But what do you do if you want to hold onto objects which are both unreachable and haven’t been modified since the pruning cutoff?

Historically, the only answer to this question was that you should point a reference at those object(s). That works if you have a relatively small set of objects you want to hold on to. But what if you have more precious objects than you could feasibly keep track of with references?

Git 2.42 introduces a new mechanism to preserve unreachable objects, regardless of whether or not they have been modified recently. Using the new gc.recentObjectsHook configuration, you can configure external program(s) that Git will run any time it is about to perform a pruning garbage collection. Each configured program is allowed to print out a line-delimited sequence of object IDs, each of which is immune to pruning, regardless of its age.

This new configuration option works even if you haven’t started using cruft packs yet and are still relying on loose objects to hold unreachable objects which have not yet aged out of your repository.

This makes it possible to store a potentially large set of unreachable objects which you want to retain in your repository indefinitely using an external mechanism, like a SQLite database. To try out this new feature for yourself, you can run:

$ git config gc.recentObjectsHook /path/to/your/program
$ git gc --prune=<approxidate>
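
The hook itself can be any executable that prints one object ID per line to its standard output. Here’s a minimal sketch of the SQLite idea above; the database path and schema are hypothetical:

#!/bin/sh
# Hypothetical gc.recentObjectsHook: every object ID printed here is
# treated as recent, so it survives pruning regardless of its age.
sqlite3 /path/to/precious.db 'SELECT oid FROM precious_objects;'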

[source, source]


  • If you’ve read these blog posts before, you may recall our coverage of the sparse index feature, which allows you to check out a narrow cone of your repository instead of the whole thing.

    Over time, many commands have gained support for working with the sparse index. For commands that lacked that support, invoking them would cause Git to expand the index to cover the entire repository, which can be a potentially expensive operation.

    This release, the diff-tree command joined the group of commands with full support for the sparse index, meaning that you can now use diff-tree without expanding your index.
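
    For example, in a repository set up with a sparse checkout (cone path hypothetical), the following now runs against the sparse index without expanding it:

    $ git sparse-checkout set --cone src/client
    $ git diff-tree -r HEAD~1 HEAD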

    This work was contributed by Shuqi Liang, one of the Git project’s Google Summer of Code (GSoC) students. You can read more about their project here, and follow along with their progress on their blog.

    [source]

  • If you’ve gotten this far in the blog post and thought that we were done talking about git for-each-ref, think again! This release enhances for-each-ref’s --format option with a handful of new ways to format a reference.

    The first set of new options enables for-each-ref to show a handful of GPG-related information about commits at reference tips. You can ask for the GPG signature directly, or individual components of it, like its grade, the signer, key, fingerprint, and so on. For example,

    $ git for-each-ref --format='%(refname) %(signature:key)' \
        --sort=v:refname 'refs/remotes/origin/release-*' | tac
    refs/remotes/origin/release-3.1 4AEE18F83AFDEB23
    refs/remotes/origin/release-3.0 4AEE18F83AFDEB23
    refs/remotes/origin/release-2.13 4AEE18F83AFDEB23
    [...]
    

    This work was contributed by Kousik Sanagavarapu, another GSoC student working on Git! You can read more about their project here, and keep up to date with their work on their blog.

    [source, source]

  • Earlier in this post, we talked about git rev-list, a low-level utility for listing the set of objects contained in some query.

    In our early examples, we discussed a straightforward case of listing objects unique to one branch. But git rev-list supports much more complex modifiers, like --branches, --tags, --remotes, and more.

    In addition to specifying modifiers like these on the command-line, git rev-list has a --stdin mode which allows for reading a line-delimited sequence of commits (optionally prefixed with ^, indicating objects reachable from those commit(s) should be excluded) from the command’s standard input.

    Previously, support for --stdin extended only to referring to commits by their object ID, without support for more complex modifiers like the ones listed earlier. In Git 2.42, git rev-list --stdin can now accept the same set of modifiers given on the command line, making it much more useful when scripting.
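
    For example, something like this now works, feeding modifiers through standard input just as if they were command-line arguments:

    $ printf '%s\n' '--branches' '--not' '--remotes=origin' | git rev-list --stdin

    which lists commits reachable from any local branch but from no remote-tracking reference in origin.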

    [source]

  • Picture this: you’re working away on your repository, typing up a tag message for a tag named foo. Suppose that in the background, you have some repeating task that fetches new commits from your remote repository. If you happen to fetch a tag foo/bar while writing the tag message for foo, Git will complain that you cannot have both tag foo and foo/bar.

    OK, so far so good: Git does not support this kind of tag hierarchy[1]. But what happened to your tag message? In previous versions of Git, you’d be out of luck, since your in-progress message at $GIT_DIR/TAG_EDITMSG is deleted before the error is displayed. In Git 2.42, Git delays deleting the TAG_EDITMSG until after the tag is successfully written, allowing you to recover your work later on.

    [source]

  • In other git tag-related news, this release comes with a fix for a subtle bug that appeared when listing tags. git tag can list existing tags with the -l option (or when invoked with no arguments). You can further refine those results to only show tags which point at a given object with the --points-at option.

    But what if you have one or more tags that point at the given object through one or more other tags instead of directly? Previous versions of Git would fail to report those tags. Git 2.42 addresses this by dereferencing tags through multiple layers before determining whether or not they point at the given object.
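
    A quick sketch of the scenario (tag names hypothetical):

    $ git tag -a inner -m 'tag of HEAD' HEAD
    $ git tag -a outer -m 'tag of inner' inner
    $ git tag --points-at HEAD

    With the fix, the last command should report both inner and outer, since outer reaches the commit by dereferencing through inner.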

    [source]

  • Finally, back in Git 2.38, git cat-file --batch picked up a new -z flag, allowing you to specify NUL-delimited input instead of delimiting your input with a standard newline. This flag is useful when issuing queries which themselves contain newlines, like trying to read the contents of some blob by path, if the path contains newlines.

    But the new -z option only changed the rules for git cat-file’s input, leaving the output still delimited by newlines. Ordinarily, this won’t cause any problems. But if git cat-file can’t locate an object, it will print the query followed by " missing" and a newline.

    If the given query itself contains a newline, the result is unparseable. To address this, git cat-file has a new mode, -Z (as opposed to its lowercase variant, -z) which changes both the input and output to be NUL-delimited.
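
    For example (path hypothetical):

    $ printf 'HEAD:a/path/with\nnewlines.txt\0' | git cat-file --batch-check -Z

    Here both the query and each record of the response are NUL-delimited, so embedded newlines round-trip unambiguously.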

    [source]

The rest of the iceberg

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.42, or any previous version in the Git repository.

Notes


  1. Doing so would introduce a directory/file conflict. Since Git stores loose tags at paths like $GIT_DIR/refs/tags/foo/bar, it would be impossible to store a tag foo, since it would need to live at $GIT_DIR/refs/tags/foo, which already exists as a directory.

The post Highlights from Git 2.42 appeared first on The GitHub Blog.


A (more) Modern CSS Reset


I wrote A Modern CSS Reset almost 4 years ago and, yeh, it’s not aged overly well. I spotted it being linked up again a few days ago and thought it’s probably a good idea to publish an updated version.

I know I also have a terrible record with open source maintenance, so I thought I’d archive the original and just post this instead. Do with it what you want!

To be super clear, this is a reset that works for me, personally, and for us at Set Studio. Whenever I refer to “we”, that’s who I’m referring to.

The reset in whole

/* Box sizing rules */
*,
*::before,
*::after {
  box-sizing: border-box;
}
/* Prevent font size inflation */
html {
  -moz-text-size-adjust: none;
  -webkit-text-size-adjust: none;
  text-size-adjust: none;
}
/* Remove default margin in favour of better control in authored CSS */
body,
h1,
h2,
h3,
h4,
p,
figure,
blockquote,
dl,
dd {
  margin-block-end: 0;
}
/* Remove list styles on ul, ol elements with a list role, which suggests default styling will be removed */
ul[role='list'],
ol[role='list'] {
  list-style: none;
}
/* Set core body defaults */
body {
  min-height: 100vh;
  line-height: 1.5;
}
/* Set shorter line heights on headings and interactive elements */
h1,
h2,
h3,
h4,
button,
input,
label {
  line-height: 1.1;
}
/* Balance text wrapping on headings */
h1,
h2,
h3,
h4 {
  text-wrap: balance;
}
/* A elements that don't have a class get default styles */
a:not([class]) {
  text-decoration-skip-ink: auto;
  color: currentColor;
}
/* Make images easier to work with */
img,
picture {
  max-width: 100%;
  display: block;
}
/* Inherit fonts for inputs and buttons */
input,
button,
textarea,
select {
  font: inherit;
}
/* Make sure textareas without a rows attribute are not tiny */
textarea:not([rows]) {
  min-height: 10em;
}
/* Anything that has been anchored to should have extra scroll margin */
:target {
  scroll-margin-block: 5ex;
}

Breakdown

As is tradition, let’s break it down:

/* Box sizing rules */
*,
*::before,
*::after {
  box-sizing: border-box;
}

This rule is pretty self-explanatory, but in short, I’m setting all elements and pseudo-elements to use border-box, rather than the default content-box, for sizing. Now that we’re focussing more on letting the browser do the work with flexible layouts and fluid type and space, this rule isn’t as useful as it once was. But it’s rare that a project doesn’t have some explicit sizing somewhere, so it’s still got a place in the reset.

/* Prevent font size inflation */
html {
  -moz-text-size-adjust: none;
  -webkit-text-size-adjust: none;
  text-size-adjust: none;
}

The best explainer for this is by Kilian. He also explains why we still need those ugly prefixes too.

/* Remove default margin in favour of better control in authored CSS */
body,
h1,
h2,
h3,
h4,
p,
figure,
blockquote,
dl,
dd {
  margin-block-end: 0;
}

I’ll always favour stripping out user agent styles for margin in favour of defining flow and space at a more macro level. With logical properties, I’m removing the end margin now instead of all sides in the older reset. It seems to work well in production.

/* Remove list styles on ul, ol elements with a list role, which suggests default styling will be removed */
ul[role='list'],
ol[role='list'] {
  list-style: none;
}

Safari do some wild shit, which includes this one: if you remove list styling, they’ll remove the list semantics for VoiceOver. Some will say it’s a feature and some will say it’s a bug. I say it’s daft, but to make sure a role is added, the reset only removes list styling when one is present, as a little reward.

/* Set core body defaults */
body {
  min-height: 100vh;
  line-height: 1.5;
}

I like a nice legible line height that gets inherited. Setting a min height of 100vh on the body is pretty handy too, especially if you’re gonna be setting decorative elements. It might be tempting to use a new unit like dvh, but if you delve deep like Ahmad has, you’ll see that it can cause more problems than it solves, which is not what you want your reset to be doing! My advice: always make sure you understand the newer units before diving head first into them, or recommending them as gospel.

/* Set shorter line heights on headings and interactive elements */
h1,
h2,
h3,
h4,
button,
input,
label {
  line-height: 1.1;
}

Just like it’s handy to have a generous line-height globally, it’s equally handy to have a shorter line-height for headings and buttons etc. It’s definitely worth removing or modifying this rule if your font has large ascenders and descenders though. The last thing you want is those clashing with each other and creating an accessibility issue.

/* Balance text wrapping on headings */
h1,
h2,
h3,
h4 {
  text-wrap: balance;
}

This rule is rather specific to our projects, but the newish text-wrap property makes headings look lovely. I imagine some people won’t find this appropriate, so you might want to remove this one.

/* A elements that don't have a class get default styles */
a:not([class]) {
  text-decoration-skip-ink: auto;
  color: currentColor;
}

This rule is first making sure the text decoration doesn’t interfere with ascenders and descenders. I think this is mostly default in browsers now, but it’s a good insurance policy to set it too. We like to set links to inherit the currentColor of text too by default at the studio, but if you don’t, you probably want to remove it.

/* Inherit fonts for inputs and buttons */
input,
button,
textarea,
select {
  font: inherit;
}

The font: inherit shorthand is bloody useful and we certainly find that with inputs and form elements. It’s mostly going to affect <textarea> elements, but there’s no harm in applying it to other form elements too because it saves on some CSS later on in a project.

/* Make sure textareas without a rows attribute are not tiny */
textarea:not([rows]) {
  min-height: 10em;
}

Speaking of <textarea> elements, this rule is handy. By default, if you don’t add a rows attribute, they can be super teeny. This isn’t ideal with coarse pointers such as fingers, especially as <textarea> elements tend to be used for multiple lines of text. It makes sense to make that easier.

/* Anything that has been anchored to should have extra scroll margin */
:target {
  scroll-margin-block: 5ex;
}

Finally, if an element is anchored, it makes sense to add a bit more space above it with scroll-margin, which is only accounted for if that element is targeted. A little tweak with lots of user experience power! You might want to adjust this if you have a fixed header though.

Wrapping up

Will this reset last another 4 years? Maybe! The previous one certainly hasn’t caused issues.

Regardless, it’s pretty useful as a base to start projects here at the studio. We don’t go back and retroactively update the older reset on client projects either, because that one works absolutely fine.

Resets are one of those things that people get worked up about, but really, with browsers being so bloody good now, you probably don’t even need one in the first place. My advice is take bits you like from ones you find on the web and create your own, that works for you and your team.


Keeping up with the times


The post Keeping up with the times appeared first on Work Chronicles.


Death by a thousand microservices


The Church of Complexity

There is a pretty well-known sketch in which an engineer is explaining to the project manager how an overly complicated maze of microservices works in order to get a user’s birthday - and fails to do so anyway. The scene accurately describes the absurdity of the state of the current tech culture. We laugh, and yet bringing this up in a serious conversation is tantamount to professional heresy, rendering you borderline un-hirable.

How did we get here? How did our aim become not addressing the task at hand but instead setting a pile of cash on fire by solving problems we don’t have?

The perfect storm

There are a few things in recent history that may have contributed to the current state of things. First, a whole army of developers writing Javascript for the browser started self-identifying as “full-stack”, diving into server development and asynchronous code. Javascript is Javascript, right? What difference does it make what you create using it - user interfaces, servers, games, or embedded systems. Right? Node was still kind of a learning project of one person, and Javascript back then was a deeply problematic choice for server development. Pointing this out to still green server-side developers usually resulted in a lot of huffing and puffing. This is all they knew, after all. The world outside of Node effectively did not exist, the Node way was the only known way, and so this was the genesis of the stubborn, dogmatic thinking that we deal with to this day.

And then, a steady stream of FAANG veterans started merging into the river of startups, mentoring the newly-minted and highly impressionable young Javascript server-side engineers. The apostles of the Church of Complexity would assertively claim that “how they did things over at Google” was unquestionable and correct - even if it made no sense under current context and size. What do you mean you don’t have a separate User Preferences Service? That just will not scale, bro!

But, it’s easy to blame the veterans and the newcomers for all of this. What else was happening? Oh yeah - easy money.

What do you do when you are flush with venture capital? You don’t go for revenue, surely! On more than one occasion I received an email from management, asking everyone to be in the office, tidy up their desks and look busy, as a clowder of Patagonia vests was about to be paraded through the office. Investors needed to see explosive growth, but not in profitability, no. They just needed to see how quickly the company could hire ultra-expensive software engineers to do … something.

And now that you have these developers, what do you do with them? Well, they could build a simpler system that is easier to grow and maintain, or they could conjure up a monstrous constellation of “microservices” that no one really understands. Microservices - the new way of writing scalable software! Are we just going to pretend that the concept of “distributed systems” never existed? (Let’s skip the whole parsing of the nuances that microservices are not real distributed systems).

Back in the days when the tech industry was not such a bloated farce, distributed systems were respected, feared, and generally avoided - reserved only as the weapon of last resort for particularly gnarly problems. Everything with a distributed system becomes more challenging and time-consuming - development, debugging, deployment, testing, resilience. But I don’t know - maybe it’s all super easy now because tooooling.

There is no standard tooling for microservices-based development - there is no common framework. Working on distributed systems has gotten only marginally easier in the 2020s. The Dockers and the Kuberneteses of the world did not magically take away the inherent complexity of a distributed setup.

I love referring to this summary of 5 years of startup audits, as it is packed with common-sense conclusions based on hard evidence (and paid insights):

… the startups we audited that are now doing the best usually had an almost brazenly ‘Keep It Simple’ approach to engineering. Cleverness for cleverness sake was abhorred. On the flip side, the companies where we were like ”woah, these folks are smart as hell” for the most part kind of faded.

Literally - “complexity kills”.

The audit revealed an interesting pattern, where many startups experienced a sort of collective imposter syndrome while building straightforward, simple, performant systems. There is a stigma attached to not starting out with microservices on day one - no matter the problem. “Everyone is doing microservices, yet we have a single Django monolith maintained by just a few engineers, and a MySQL instance - what are we doing wrong?”. The answer is almost always “nothing”.

Likewise, it’s common for seasoned engineers to experience hesitation and inadequacy in today’s tech world, and the good news is that, no - it’s probably not you. It’s common for teams to pretend like they are doing “web scale”, hiding behind libraries, ORMs, and caches - confident in their expertise (they crushed that Leetcode!), yet they may not even be aware of database indexing basics. You are operating in a sea of unjustified overconfidence, waste, and Dunning-Kruger, so who is really the imposter here?

There is nothing wrong with a monolith

The idea that you cannot grow without a system that looks like the infamous diagram of Afghanistan war strategy is largely a myth.

Dropbox, Twitter, Facebook, Instagram, Shopify, Stack Overflow - these companies and others started out as monolithic code bases. Many have a monolith at their core to this day. Stack Overflow makes it a point of pride how little hardware they need to run the massive site. Shopify is still a Rails monolith, leveraging the tried and true Resque to process billions of tasks.

WhatsApp went supernova with their Erlang monolith and 50 engineers. How?

WhatsApp consciously keeps the engineering staff small to only about 50 engineers.

Individual engineering teams are also small, consisting of 1 - 3 engineers and teams are each given a great deal of autonomy.

In terms of servers, WhatsApp prefers to use a smaller number of servers and vertically scale each server to the highest extent possible.

Instagram was acquired for billions - with a crew of 12.

And do you imagine Threads as an effort involving a whole Meta campus? Nope. They followed the Instagram model, and this is the entire Threads team:

Perhaps claiming that your particular problem domain requires a massively complicated distributed system and an open office stuffed to the gills with turbo-geniuses is just crossing over into arrogance rather than brilliance?

Don’t solve problems you don’t have

It’s a simple question - what problem are you solving? Is it scale? How do you know how to break it all up for scale and performance? Do you have enough data to show what needs to be a separate service and why? Distributed systems are built for size and resilience. Can your system scale and be resilient at the same time? What happens if one of the services goes down or comes to a crawl? Just scale it up, yes? What about the other services that will get flooded with load? Did you war-game the endless permutations of things that can and will go wrong? Is there back pressure? Circuit breakers? Queues? Jitter? Sensible timeouts on every endpoint? Are there fool-proof guards to make sure a simple change does not bring everything down? The knobs you need to be aware of and tune are endless, and they are all specific to your system’s particular signature of usage and traffic.

The truth is that most companies will never reach the massive size that would actually require building a true distributed system. Cosplaying Amazon and Google - without their scale, expertise, and endless resources - is very likely just an egregious waste of money and time.

The only thing harder than a distributed system is a BAD distributed system.

“But each team…but separate… but API”

Trying to shove a distributed topology into your company’s structure is a noble effort, but it almost always backfires. It’s a common approach to break up a problem into smaller pieces and then solve those one by one. So, the thinking goes, if you break up one service into multiple ones, everything becomes easier, right?

The theory is sweet and elegant - each microservice is being maintained rigorously by a dedicated team, walled off behind a beautiful, backward-compatible, versioned API. In fact, this is all so steely that you rarely even have to communicate with that team - as if the microservice was maintained by a 3rd party vendor. It’s simple!

If that doesn’t sound familiar, that’s because this rarely happens. In reality, our Slack channels are flooded with messages from teams communicating about releases, bugs, configuration updates, breaking changes, and PSAs. Everyone needs to be on top of everything, all the time. And if that wasn’t great, it’s normal for one already-slammed team to half-ass multiple microservices instead of doing a great job on a single one, often changing ownership as people come and go.

In order to win the race, we don’t build one good race car - we build a fleet of shitty golf carts.

What you lose

There are multiple pitfalls to building with microservices, and often that minefield is either not fully appreciated or simply ignored. Teams spend months writing highly customized tooling and learning lessons not related at all to the core product. Here are just some often overlooked aspects…

Say goodbye to DRY

After decades of teaching developers to write Don’t Repeat Yourself code, it seems we just stopped talking about it altogether. Microservices by default are not DRY, with every service stuffed with redundant boilerplate. Very often the overhead of such “plumbing” is so heavy, and the size of the microservices is so small, that the average instance of a service has more “service” than “product”. So what about the common code that can be factored out?

  • Have a common library?
  • How does the common library get updated? Keep different versions everywhere?
  • Force updates regularly, creating dozens of pull requests across all repositories?
  • Keep it all in a monorepo? That comes with its own set of problems.
  • Allow for some code duplication?
  • Forget it, each team gets to reinvent the wheel every time.

Each company going this route faces these choices, and there are no good “ergonomic” options - you have to choose your version of the pain.

Developer ergonomics will crater

“Developer ergonomics” is the friction, the amount of effort a developer must go through in order to get something done, be it working on a new feature or resolving a bug.

With microservices, an engineer has to have a mental map of the entire system in order to know what services to bring up for any particular task, what teams to talk to, whom to talk to, and what about. The “you have to know everything before doing anything” principle. How do you keep on top of it? Spotify, a multi-billion dollar company, spent probably non-negligible internal resources to build Backstage, software for cataloging its endless systems and services.

This should at least give you a clue that this game is not for everyone, and the price of the ride is high. So what about the tooooling? The Not-Spotifies of the world are left MacGyvering their own solutions, the robustness and portability of which you can probably guess.

And how many teams actually streamline the process of starting a YASS - “yet another stupid service”? This includes:

  • Developer privileges in GitHub/GitLab
  • Default environment variables and configuration
  • CI/CD
  • Code quality checkers
  • Code review settings
  • Branch rules and protections
  • Monitoring and observability
  • Test harness
  • Infrastructure-as-code

And of course, multiply this list by the number of programming languages used throughout the company. Maybe you have a usable template or a runbook? Maybe a frictionless, one-click system to launch a new service from scratch? It takes months to iron out all the kinks with this kind of automation. So, you can either work on your product, or you can be working on toooooling.

Integration tests - LOL

As if the everyday microservices grind was not enough, you also forfeit the peace of mind offered by solid integration tests. Your single-service and unit tests are passing, but are your critical paths still intact after each commit? Who is in charge of the overall integration test suite, in Postman or wherever else? Is there one?

Integration testing a distributed setup is a nearly-impossible problem, so we pretty much gave up on that and replaced it with another one - Observability. Just like “microservices” are the new “distributed systems”, “observability” is the new “debugging in production”. Surely, you are not writing real software if you are not doing…. observability!

Observability has become its own sector, and you will pay for it, both in a pretty penny and in developer time. It doesn’t come as plug-and-play either - you need to understand and implement canary releases, feature flags, etc. Who is doing that? One already overwhelmed engineer?

As you can see, breaking up your problem does not make solving it easier - all you get is another set of even harder problems.

What about just “services”?

Why do your services need to be “micro”? What’s wrong with just services? Some startups have gone as far as creating a service for each function, and yes, “isn’t that just like Lambda” is a valid question. This gives you an idea of how far gone this unchecked cargo cult is.

So what do we do? Starting with a monolith is one obvious choice. A pattern that could also work in many instances is “trunk & branches”, where the main “meat and potatoes” monolith is helped by “branch” services. A branch service can be a service that takes care of a clearly-identifiable and separately-scalable load. A CPU-hungry Image-Resizing Service makes way more sense than a User Registration Service. Or do you get so many registrations per second that it requires independent horizontal scaling?

The pendulum is swinging back

The hype, however, seems to be dying down. The VC cash faucet is tightening, and so the businesses have been market-corrected into exercising common-sense decisions, recognizing that perhaps splurging on web-scale architectures when they don’t have web-scale problems is not sustainable.

Ultimately, when faced with the need to travel from New York to Philadelphia, you have two options. You can either attempt to construct a highly intricate spaceship for an orbital descent to your destination, or you can simply purchase an Amtrak train ticket for a 90-minute ride. That is the problem at hand.


Cool Tools


Chris Brandrick at Frontend Focus asked me to share five tools, libraries or packages that I love for the latest issue of the newsletter. It was hard to narrow it down to just five! So I thought I’d share the full long-list with you here, in no particular order.

Parcel

I hate having to spend hours configuring a new project when I just want to jump straight into some code, which is why Parcel is my go-to bundler. It’s “zero config”, and makes it quick and easy to set up projects.

WebPageTest: Carbon Control

WebPageTest measures your site’s performance, and has recently added a new feature, Carbon Control, which helps to measure the CO2 impact, along with tips for reducing it.

Utopia

I somehow missed this one off the original list, despite using it for nearly every project. Utopia (by James Gilyead and Trys Mudford) provides a configurable set of CSS custom properties for fluid sizing and typography, with options to use a modular scale. It’s a godsend, and helps to avoid writing media queries all over the place!

Lite YouTube Embed

Embedded videos can be responsible for performance problems (and CO2 emissions) of a site. This package by Paul Irish ensures they are only loaded upon interaction, providing a better (faster) user experience.

SVG OMG

Jake Archibald’s tool for minifying and compressing SVGs is indispensable for any developer working with SVG.

SVG Path Visualizer

A really cool playground to help you understand how SVG path syntax works.

URL Encoder for SVG

This simple tool takes an SVG and converts it to a base64-encoded URL, for use in CSS properties such as background-image and mask-image.

Firefox Dev Tools

Firefox is my primary development browser and has a bunch of handy inspection tools for working with CSS, including a color contrast accessibility checker, inspectors for fonts, layout and animation, a changes panel, validity and compatibility checkers. It was too hard to pick just one favourite feature!

Cubic Bezier

Lea Verou’s visualizer for cubic bezier easing curves for CSS animations is one I use regularly for crafting more natural-looking animations.

ENV Switcher for VS Code

This VS Code extension enables easy switching between different .env files in a project.

Are my third parties green?

A tool by Fershad Irani for checking if your third-party scripts use green hosting.

Tiny PNG

Build tools can be configured to compress images, but we can reduce energy use even more by not uploading large images in the first place! Tiny PNG squooshes not just PNG files, but JPG and WebP too, with no visible loss of quality.

Import Cost for VS Code

An extension for VS Code that shows the size of your JS imports in a file. It helps you become aware of the worst offenders, and consider more lightweight alternatives.

D3

If you work with data visualisation on the web you’ll likely already know D3.js. But I wanted to include it here as it’s a great library, and one that I use almost every day of my working life at Ada Mode. In addition to data viz, it has useful modules for working with arrays, date and time formatting, SVG in general, and much more.

I’m sure I’m missing a bunch! What are your favourite tools? Let me know!
