
Stable Diffusion 2.0 Release


Just like the first iteration of Stable Diffusion, we’ve worked hard to optimize the model to run on a single GPU – we wanted to make it accessible to as many people as possible from the very start. We’ve already seen that, when millions of people get their hands on these models, they collectively create some truly amazing things. This is the power of open source: tapping the vast potential of millions of talented people who might not have the resources to train a state-of-the-art model, but who have the ability to do something incredible with one.

This new release, along with its powerful new features like depth2img and higher resolution upscaling capabilities, will serve as the foundation of countless applications and enable an explosion of new creative potential.

emrox · 3 days ago · Hamburg, Germany

Moving away from UUIDs


If you need an unguessable random string (for a session cookie or access token, for example), it can be tempting to reach for a random UUID, which looks like this:

 88cf3e49-e28e-4c0e-b95f-6a68a785a89d

This is a 128-bit value formatted as 32 hexadecimal digits and four hyphens, 36 characters in total. In Java and most other programming languages, these are very simple to generate:


import java.util.UUID;

String id = UUID.randomUUID().toString();

Under the hood this uses a cryptographically secure pseudorandom number generator (CSPRNG), so the IDs generated are pretty unique. However, there are some downsides to using random UUIDs that make them less useful than they first appear. In this note I will describe them and what I suggest using instead.

How random is a UUID anyway?

As stated on the Wikipedia entry, of the 128 bits in a random UUID, 6 are fixed variant and version bits, leaving 122 bits of actual randomness. 122 bits is still quite a lot of randomness, but is it actually enough? If you are generating OAuth 2 access tokens, then the spec says no:

The probability of an attacker guessing generated tokens (and other
credentials not intended for handling by end-users) MUST be less than
or equal to 2^(-128) and SHOULD be less than or equal to 2^(-160).

Well, even if the attacker only makes a single guess, the probability of guessing a 122-bit random value can never be less than 2^(-122), so strictly speaking a random UUID is in violation of the spec. But does it really matter?

To work out how long it would take an attacker to guess a valid token/cookie, we need to consider a number of factors:

  • How quickly can the attacker make guesses? An attacker that can try millions of candidate tokens per second can find a match much faster than somebody who can only try a hundred. We will call this rate (in guesses per second) A.
  • How many bits of randomness are in each token? A 128-bit random token is more difficult to guess than a 64-bit token. We will label the number of random bits B.
  • How many tokens are valid at any given time? If you have issued a million active session cookies then an attacker can try and guess any of them, making their job easier than if there was just one. Such batch attacks are often overlooked. We will label the number of valid tokens in circulation at any one time S.

Given these parameters, OWASP give a formula for calculating how long it will take an attacker to guess a random token as:

(2^B + 1) / (2 * A * S)

Let’s plug in some numbers and see what we get. But what are reasonable numbers? Well, for security we usually want to push the numbers well beyond what we think is actually possible to be really sure. So what is actually possible now?

When we consider how fast a well-resourced attacker could guess a token, we can use the Bitcoin hash rate as a reasonable upper-bound approximation. A lot of people are investing a lot of time and money into generating random hashes, and we can view this as roughly equivalent to our attacker’s task. When I looked back in February (you can see how long my blog queue is!), the maximum rate was around 24,293,141,000,000,000,000 hashes per second, or around 2^64.

That’s a pretty extreme number. It’s fairly unlikely that anyone would direct that amount of resource at breaking your site’s session tokens, and you can use rate limiting and other tactics. But it is worth considering the extremes. After all, this is clearly possible with current technology and will only improve over time.

How many tokens might we have in circulation at any time? Again, it is helpful to consider extremes. Let’s say your wildly successful service issues access tokens to every IoT (Internet of Things) device on the planet, at a rate of 1 million tokens per second. As you have a deep instinctive trust in the security of these devices (what could go wrong?), you give each token a 2-year validity period. At peak you will then have around 63 trillion (2^46) tokens in circulation.

If we plug these figures into the equation from before, we can see how long our 122-bit UUID will hold out:

A = 2^64
B = 122
S = 2^46

That comes out as … 2048 seconds. Or a bit less than 35 minutes. Oh.
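
To sanity-check that arithmetic, here is a small sketch that plugs the numbers above into the OWASP formula (the class name and the use of BigInteger are my own choices for illustration, not something from the original post):

import java.math.BigInteger;

public class GuessingTime {
    public static void main(String[] args) {
        BigInteger two = BigInteger.valueOf(2);
        BigInteger a = two.pow(64);  // A: guesses per second
        int b = 122;                 // B: random bits per token
        BigInteger s = two.pow(46);  // S: valid tokens in circulation

        // OWASP estimate of the time to find a valid token: (2^B + 1) / (2 * A * S)
        BigInteger seconds = two.pow(b).add(BigInteger.ONE)
                .divide(two.multiply(a).multiply(s));
        System.out.println(seconds + " seconds"); // prints: 2048 seconds
    }
}

Swapping 122 for 160 in the same sketch gives roughly 2^49 seconds, which is where the figure of about 17.9 million years quoted later comes from.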

OK, so those extreme numbers look pretty terrifying, but they are also quite extreme. The Bitcoin community spend enormous sums of money (certainly in the tens of millions of dollars) annually to produce that kind of output. Also, testing each guess most likely requires actually making a request to one of your servers, so you are quite likely to notice that level of traffic – say by your servers melting a hole through the bottom of your datacentre. If you think you are likely to attract this kind of attention then you might want to carefully consider which side of the Mossad/not-Mossad threat divide you live on and maybe check your phone isn’t a piece of Uranium.

All this is to say that if you have deployed random UUIDs in production, don’t panic! While I would recommend that you move to something better (see below) at some point, plugging more likely numbers into the equation should reassure you that you are unlikely to be at risk immediately. An attacker would still have to invest considerable time and money into launching such an attack.

Other nits

The borderline acceptable level of entropy in a random UUID is my main concern with them, but there are others too. In the standard string form, they are quite inefficient. The dash-separated hexadecimal format takes 36 characters to represent 16 bytes of data. That’s a 125% expansion, which is pretty terrible. Base64-encoding would instead use just 24 characters, and just 22 if we remove the padding, resulting in just 37.5% expansion.
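
To make the size difference concrete, here is a small sketch that re-encodes the 16 raw bytes of a random UUID as unpadded, URL-safe Base64 (the class name is mine; the article only quotes the character counts):

import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

public class CompactUuid {
    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();

        // Pack the 128-bit value into its 16 raw bytes
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putLong(uuid.getMostSignificantBits());
        buffer.putLong(uuid.getLeastSignificantBits());

        String compact = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(buffer.array());

        System.out.println(uuid);    // 36 characters
        System.out.println(compact); // 22 characters for the same 16 bytes
    }
}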

Finally, a specific criticism of Java’s random UUID implementation is that internally it uses a single shared SecureRandom instance to generate the random bits. Depending on the backend configured, this can acquire a lock which can become heavily contended if you are generating large numbers of random tokens, especially if you are using the system blocking entropy source (don’t do that, use /dev/urandom). By rolling your own token generation you can use a thread-local or pool of SecureRandom instances to avoid such contention. (NB – the NativePRNG uses a shared static lock internally, so this doesn’t help in that case, but it also holds the lock for shorter critical sections so is less prone to the problem).

What should we use instead?

My recommendation is to use a 160-bit (20 byte) random value that is then URL-safe base64-encoded. The URL-safe base64 variant can be used pretty much anywhere, and is reasonably compact. In Java:

import java.security.SecureRandom;
import java.util.Base64;

public class SecureRandomString {
    private static final SecureRandom random = new SecureRandom();
    private static final Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();

    public static String generate() {
        byte[] buffer = new byte[20]; // 20 bytes = 160 bits of entropy
        random.nextBytes(buffer);
        return encoder.encodeToString(buffer); // 27-character URL-safe string
    }
}

This produces output values like the following:

Xl3S2itovd5CDS7cKSNvml4_ODA

This is both shorter than a UUID and also more secure, having 160 bits of entropy. I can also make the SecureRandom into a ThreadLocal if I want.
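
Here is one way that ThreadLocal variant might look; the wrapping is my own sketch, and as noted earlier it won’t help if the underlying provider is NativePRNG with its shared static lock:

import java.security.SecureRandom;
import java.util.Base64;

public class ThreadLocalSecureRandomString {
    // One SecureRandom per thread avoids contention on a single shared instance
    private static final ThreadLocal<SecureRandom> random =
            ThreadLocal.withInitial(SecureRandom::new);
    private static final Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();

    public static String generate() {
        byte[] buffer = new byte[20]; // 160 bits of entropy, as before
        random.get().nextBytes(buffer);
        return encoder.encodeToString(buffer);
    }
}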

So how long would it take our extreme attacker to find a 160-bit random token? Around 17.9 million years. By tweaking the format of our tokens just a little, we can move from worrying about attacker capabilities and resources to inner peace and happiness. It is so far beyond the realm of the possible that we can stop worrying.

Why not go further? Why not 256 bits? That would push the attack costs into even more absurd territory. I find that 160 bits is a sweet spot of excellent security while still having a compact string representation.


Author: Neil Madden

Security Architect at ForgeRock. Experienced software engineer with a PhD in computer science. Interested in application security, applied cryptography, logic programming and intelligent agents.

emrox · 6 days ago · Hamburg, Germany:
prob. UUIDs are still good for most of the purposes where you need a uniq non-ID-based system.

Is Wine Fake?


Your classiest friend invites you to dinner. They take out a bottle of Chardonnay that costs more than your last vacation and pour each of you a drink. They sip from their glass. “Ah,” they say. “1973. An excellent vintage. Notes of avocado, gingko and strontium.” You’re not sure what to do. You mumble something about how you can really taste the strontium. But internally, you wonder: Is wine fake?

A vocal group of skeptics thinks it might be. The most eloquent summary of their position is The Guardian’s “Wine-Tasting: It’s Junk Science,” which highlights several concerning experiments:

In 2001 Frédérick Brochet of the University of Bordeaux asked 54 wine experts to test two glasses of wine – one red, one white. Using the typical language of tasters, the panel described the red as “jammy” and commented on its crushed red fruit.

The critics failed to spot that both wines were from the same bottle. The only difference was that one had been coloured red with a flavourless dye.

In 2011 Professor Richard Wiseman, a psychologist (and former professional magician) at Hertfordshire University invited 578 people to comment on a range of red and white wines, varying from £3.49 for a claret to £30 for champagne, and tasted blind. People could tell the difference between wines under £5 and those above £10 only 53% of the time for whites and only 47% of the time for reds. Overall they would have been just as successful flipping a coin to guess.

Some blinded trials among wine consumers have indicated that people can find nothing in a wine’s aroma or taste to distinguish between ordinary and pricey brands. Academic research on blinded wine tastings have also cast doubt on the ability of professional tasters to judge wines consistently.

But I recently watched the documentary Somm, about expert wine-tasters trying to pass the Master Sommelier examination. As part of their test, they have to blind-taste six wines and, for each, identify the grape variety, the year it was produced, and tasting notes (e.g., “aged orange peel” or “hints of berry”). Then they need to identify where the wine was grown: certainly in broad categories like country or region, but ideally down to the particular vineyard. Most candidates — 92% — fail the examination. But some pass. And the criteria are so strict that random guessing alone can’t explain the few successes.

So what’s going on? How come some experts can’t distinguish red and white wines, and others can tell that it’s a 1951 Riesling from the Seine River Valley? If you can detect aged orange peel, why can’t you tell a $3 bottle from a $30 one?

In Vino Veritas

All of those things in Somm — grape varieties, country of origin and so on — probably aren’t fake.

The most convincing evidence for this is “Supertasters Among the Dreaming Spires,” from 1843 magazine (also summarized in The Economist). Here a journalist follows the Oxford and Cambridge competitive wine-tasting teams as they prepare for their annual competition. The Master Sommelier examination has never made its results public to journalists or scientists — but the Oxbridge contest did, confirming that some of these wine tasters are pretty good.

Top scorers were able to identify grape varieties and countries for four of the six wines. In general, tasters who did well on the reds also did well on the whites, suggesting a consistent talent. And most tasters failed on the same wines (e.g., the Grenache and Friulano), suggesting those were genuinely harder than others.

If the Oxbridge results are true, how come Brochet’s experts couldn’t distinguish red and white wine? A closer look at the original study suggests three possible problems.

First, the experts weren’t exactly experts. They were, in the grand tradition of studies everywhere, undergraduates at the researchers’ university. Their only claim to expertise was their course of study in enology, apparently something you can specialize in if you go to the University of Bordeaux. Still, the study doesn’t say how many years they’d been studying, or whether their studies necessarily involved wine appreciation as opposed to just how to grow grapes or run a restaurant.

Second, the subjects were never asked whether the wine was red or white. They were given a list of descriptors, some of which were typical of red wine, others of white wine, and asked to assign them to one of the wines. (They also had the option to pick descriptors of their own choosing, but it’s not clear if any did.) Maybe their thought process was something like “neither of these tastes red, exactly, but I’ve got to assign the red wine descriptors to one of them, and the one on the right is obviously a red wine because it’s red colored, so I’ll assign it to that one.”

Third, even if you find neither of these exculpatory, tricking people just works really well in general. Based on the theory of predictive coding, our brains first figure out what sensory stimuli should be, then see if there’s any way they can shoehorn actual stimuli to the the expected pattern. If they can’t, then the brain will just register the the real sensation, but as long as it’s pretty close they’ll just return the the prediction. For example, did you notice that the word “the” was duplicated three times in this paragraph? Your brain was expecting to read a single word “the,” just as it always has before, and when you’re reading quickly, the mild deviation from expected stimuli wasn’t enough to raise any alarms.

Or consider the famous Pepsi Challenge: Pepsi asked consumers to blind-taste-test Pepsi vs. Coke; most preferred Pepsi. But Coke maintains its high market share partly because when people are asked to nonblindly taste Coke and Pepsi (as they always do in the real world) people prefer Coke. Think of it as the brain combining two sources of input to make a final taste perception: the actual taste of the two sodas and a preconceived notion (probably based on great marketing) that Coke should taste better. In the same way, wine tasters given some decoy evidence (the color of the wine) combine that evidence with the real taste sensations in order to produce a conscious perception of what the wine tastes like. That doesn’t necessarily mean the same tasters would get it wrong if they weren’t being tricked.

Pineau et al. conducted a taste test that removed some of these issues; they asked students to rank the berry tastes (a typical red wine flavor) of various wines while blinded to (but not deceived about) whether they were red or white. They were able to do much better than chance (p<0.001).

The Price Is Wrong

Just because wine experts can judge the characteristics of wine doesn’t mean we should care about their assessments of quality. Most of the research I found showed no blind preference for more expensive wines over cheaper ones.

Here my favorite study is Goldstein et al., “Do More Expensive Wines Taste Better? Evidence From a Large Sample of Blind Tastings.” They look at 6,175 tastings from 17 wine tasting events and find that, among ordinary people (nonexperts), “the correlation between price and overall rating is small and negative, suggesting that individuals on average enjoy more expensive wines slightly less.” But experts might prefer more expensive wine; the study found that if wine A cost 10 times more than wine B, experts on average ranked it seven points higher on a 100-point scale. However, this effect was not quite statistically significant, and all that the authors can say with certainty is that experts don’t dislike more expensive wine the same way normal people do.

Harrar et al. have a study in Flavour, which was somehow a real journal until 2017, investigating novice and expert ratings of seven sparkling wines. Somewhat contrary to the point I made above, everyone (including experts) did poorly in identifying which wines were made of mostly red vs. white grapes (although most of the wines were mixed, which might make it a harder problem than just distinguishing pure reds from pure whites). More relevant to the current question, they didn’t consistently prefer the most expensive champagne (£400) to the least expensive (£18).

Robert Hodgson takes a slightly different approach and studies consistency among judges at wine competitions. If wine quality is real and identifiable, experts should be able to reliably judge identical samples of wine as identically good. In a series of studies, he shows they are okay at this. During competitions where wines are typically judged at between 80 and 100 points, blinded judges given the same wine twice rated on average about four points apart — in the language of wine tasting, the difference between “Silver−” and “Silver+”. Only 10% of judges were “consistently consistent” within a medal range, i.e., they never (in four tries) gave a wine “Silver” on one tasting and “Bronze” or “Gold” the next. Another 10% of judges were extremely inconsistent, giving wine Gold during one tasting and Bronze (or worse) during another. Most of the time, they were just a bit off. Judges were most consistent at the bottom of the range — they always agreed terrible wines were terrible — and least consistent near the top.

In another study, Hodgson looks at wines entered in at least three competitions. Of those that won Gold in one, 84% received no award (i.e., neither Gold, Silver, nor Bronze) in at least one other. “Thus, many wines that are viewed as extraordinarily good at some competitions are viewed as below average at others.”

And here, too, a little bit of trickery can overwhelm whatever real stimuli people are getting. Lewis et al. put wine in relabelled bottles, so that drinkers think a cheap wine is expensive or vice versa. They find that even people who had completed a course on wine tasting (so not quite “experts,” but not exactly ordinary people either) gave judgments corresponding to the price and prestige of the labeled wine, not to the real wine inside the bottles.

So experienced tasters generally can’t agree on which wines are better than others, or identify pricier wines as tasting better. Does this mean that wine is fake? Consider some taste we all understand very well, like pizza — not even fancy European pizza, just normal pizza that normal people like. I prefer Detroit pizza, tolerate New York pizza, and can’t stand Chicago pizza. Your tastes might be the opposite. Does this mean there’s no real difference between pizza types? Or that one of us is lying, or faking our love of pizza, or otherwise culpable?

I’ll make one more confession — sometimes I prefer pizza from the greasy pizza joint down the street to pizza with exotic cheeses from a fancy Italian restaurant that costs twice as much. Does this mean the fancy Italian restaurant is a fraud? Or that the exotic cheeses don’t really taste different from regular cheddar and mozzarella?

There can be objectively bad pizza — burnt, cold, mushy — but there isn’t really any objective best pizza. Fancier and more complicated pizzas can be more expensive, not because they’re better, but because they’re more interesting. Maybe wine is the same way.

Notes on Notes

What about the tasting notes — the part where experts say a wine tastes like aged orange peel or avocado or whatever?

There aren’t many studies that investigate this claim directly. But the experts’ claims make sense on a chemical level: fermentation produces hundreds of different compounds, many of which are volatile (i.e., they evaporate easily and can be smelled), and we naturally round chemicals off to other plants or foods that contain them.

When people say a wine has citrus notes, that might mean it has 9-carbon alcohols somewhere in its chemical soup. If they say chocolate, 5-carbon aldehydes; if mint, 5-carbon ketones.

(Do wines ever have 6-carbon carboxylic acids, or 10-carbon alkanes — i.e., goats, armpits or jet fuel? I am not a wine chemist and cannot answer this question. But one of the experts interviewed on Somm mentioned that a common tasting note is cat urine, but that in polite company you’re supposed to refer to it by the code phrase “blackcurrant bud.” Maybe one of those things wine experts say is code for “smells like a goat,” I don’t know.)

Scientists use gas chromatography to investigate these compounds in wine and sometimes understand them on quite a deep level. For example, from “Grape-Derived Fruity Volatile Thiols: Adjusting Sauvignon Blanc Aroma and Flavor Complexity”:

Three main volatile thiols are responsible for the tropical fruit nuances in wines. They are 3MH (3-mercaptohexan-1-ol), 3MHA (3-mercaptohexyl acetate) and 4MMP (4-mercapto-4-methylpentan-2-one). The smell is quite potent (or “punchy,” as the Kiwis say) at higher concentrations, and descriptors used include tropical fruit, passionfruit, grapefruit, guava, gooseberry, box tree, tomato leaf and black currant. Perception thresholds for 4MMP, 3MH and 3MHA in model wine are 0.8 ng/L, 60 ng/L and 4.2 ng/L, respectively.

These numbers don’t necessarily carry over to wines, where aromas exactly at the perception threshold might be overwhelmed by other flavors, but since some wines can have thousands or tens of thousands of nanograms per liter of these chemicals, it makes sense that some people can detect them. A few studies are able to observe this detection empirically. Prida and Chatonnet found that experts rated wines with more furanic acid compounds as smelling oakier. And Tesfaye et al. find good inter-rater reliability in expert tasting notes of wine vinegars.

Weil, writing in the Journal of Wine Economics (another real journal!) finds that ordinary people can’t match wines to descriptions of their tasting notes at a better-than-chance level. I think the best explanation of this discrepancy is that experts can consistently detect these notes, but ordinary people can’t.

The Judgment of Paris

Until the 1970s, everyone knew French wines were the best in the world. Wine seller Steven Spurrier challenged the top French experts to a blind taste test of French vs. Californian wines. According to CNN:

The finest French wines were up against upstarts from California. At the time, this didn’t even seem like a fair contest — France made the world’s best wines and Napa Valley was not yet on the map — so the result was believed to be obvious.

Instead, the greatest underdog tale in wine history was about to unfold. Californian wines scored big with the judges and won in both the red and white categories, beating legendary chateaux and domaines from Bordeaux and Burgundy.

The only journalist in attendance, George M. Taber of Time magazine, later wrote in his article that “the unthinkable happened,” and in an allusion to Greek mythology called the event “The Judgment of Paris,” and thus it would forever be known.

“The unthinkable” is, if anything, underselling it. One judge, horrified, demanded her scorecard back. The tasting turned California’s Napa Valley from a nowhere backwater into one of the world’s top wine regions.

I bring this up because, well, the deliberately provocative title of this article was “Is Wine Fake?” Obviously wine is not fake: There is certainly a real drink made from fermented grapes. The real question at issue is whether wine expertise is fake. And that ties this question in with the general debate on the nature of expertise. There are many people who think many kinds of expertise are fake, and many other people pushing back against them; maybe wine is just one more front in this grander war.

And it would seem that wine expertise is real. With enough training (Master Sommelier candidates typically need 10 years of experience) people really can learn to identify wines by taste. Although ordinary people do not prefer more expensive to less expensive wine, some experts do, at least if we are willing to bend the statistical significance rules a little. And although ordinary people cannot agree on tasting notes, experts often can.

But although wine experts really do know more than you and I, the world of wine is insane. People spend thousands of dollars for fancy wine that they enjoy no more than $10 plonk from the corner store. Vintners obsess over wine contests that are probably mostly chance. False beliefs, like the superiority of French wine, get enshrined as unquestioned truths.

All the oenophiles and expert tasters of the 1960s and ’70s got one of the most basic questions in their field wrong. Why? Maybe patriotism: Most of the wine industry was in France, and they didn’t want to consider that other countries might be as good as they were. Maybe conformity: If nobody else was taking Californian wines seriously, why should you? Or maybe a self-perpetuating cycle, where if any expert had made a deep study of Californian wines, they would have been able to realize they were very good, but nobody thought such a study was worth it.

Wine is not fake. Wine experts aren’t fake either, but they believe some strange things, are far from infallible, and need challenges and blinded trials to be kept honest. How far beyond wine you want to apply this is left as an exercise for the reader.

emrox · 6 days ago · Hamburg, Germany

Style a parent element based on its number of children using CSS :has()


In CSS it’s possible to style elements based on the number of siblings they have by leveraging the nth-child selector.

But what if you want to style the parent element based on the number of children it has? Well, that’s where the CSS :has() selector comes into play.

~

# The code

If you’re just here for the code, here it is. You can also see it in action in the demo below.

/* At most 3 (3 or less, excluding 0) children */
ul:has(> :nth-child(-n+3):last-child) {
	outline: 1px solid red;
}

/* At most 3 (3 or less, including 0) children */
ul:not(:has(> :nth-child(3))) {
	outline: 1px solid red;
}

/* Exactly 5 children */
ul:has(> :nth-child(5):last-child) {
	outline: 1px solid blue;
}

/* At least 10 (10 or more) children */
ul:has(> :nth-child(10)) {
	outline: 1px solid green;
}

/* Between 7 and 9 children (boundaries inclusive) */
ul:has(> :nth-child(7)):has(> :nth-child(-n+9):last-child) {
	outline: 1px solid yellow;
}

If you want to know how it works, keep on reading 🙂

~

# The selectors

The pattern of each selector built here is this:

parent:has(> count-condition) {
	…
}
  • By using parent:has() we can select the parent element that meets a certain condition for its children.
  • By passing > into :has(), we target the parent’s direct children.
  • The count-condition is something we need to come up with for each type of selection we want to do. Similar to Quantity Queries, we’ll leverage :nth-child() for this.

☝️ Although the :has() selector is often called the parent selector, it is way more than that.

At most x children

By using :nth-child() with a negative -n it’s possible to select the first x children. If one of those is also the very last child, you can detect whether a parent has at most x children.

/* At most 3 (3 or less, excluding 0) children */
ul:has(> :nth-child(-n+3):last-child) {
	outline: 1px solid red;
}

This selector excludes parents that have no children. This is fine in most cases – as any element that contains only text would otherwise be matched – but if you do want to include childless parents, use this selector instead:

/* At most 3 (3 or less, including 0) children */
ul:not(:has(> :nth-child(3))) {
	outline: 1px solid red;
}

Exactly x children

To select item x from a set you can use :nth-child() without any n indication. If that child is also the last child, you know there are exactly x children in the parent.

/* Exactly 5 children */
ul:has(> :nth-child(5):last-child) {
	outline: 1px solid blue;
}

At least x children

To select a parent with at least x children, being able to select child x inside it is enough. No need for :last-child here.

/* At least 10 (10 or more) children */
ul:has(> :nth-child(10)) {
	outline: 1px solid green;
}

Between x and y children

To do a between selection, you can combine two :has() conditions. The first one selects all elements that have x or more children, whereas the second one cuts off the range by only allowing elements that have no more than y children. Only elements that match both conditions will be matched:

/* Between 7 and 9 children (boundaries inclusive) */
ul:has(> :nth-child(7)):has(> :nth-child(-n+9):last-child) {
	outline: 1px solid yellow;
}

~

# Demo

See the Pen Styling parent elements based on the number of children with CSS :has() by Bramus (@bramus) on CodePen.

~

# Browser Support

These selectors are supported by all browsers that have :has() support. At the time of writing this does not include Firefox.

Flipping on the experimental :has() support in Firefox doesn’t do the trick either. Its implementation is still experimental as it doesn’t support all types of selection yet. Relative Selector Parsing (i.e. a:has(> b)) is one of those features that’s not supported yet – Tracking bug: #1774588


emrox · 7 days ago · Hamburg, Germany

Y2K and 2038

It's taken me 20 years, but I've finally finished rebuilding all my software to use 33-bit signed ints.
emrox · 13 days ago · Hamburg, Germany
2 public comments
acdha · 16 days ago · Washington, DC:
Switching to 32-bit floats so we never need to worry about overflows again
jlvanderzwan · 14 days ago:
16 posits or bust
alt_text_bot · 16 days ago:
It's taken me 20 years, but I've finally finished rebuilding all my software to use 33-bit signed ints.

The Age of PageRank is Over


When Sergey Brin and Larry Page came up with the concept of PageRank in their seminal paper The Anatomy of a Large-Scale Hypertextual Web Search Engine (Sergey Brin and Lawrence Page, Stanford University, 1998) they profoundly changed the way we utilize the web. For the next 25 years, humanity counted on their algorithm to deliver relevant results for its searches.

PageRank generated its results based on the idea that links between websites are given on merit rather than for commercial motivation. The web was still young, conceived to be a force for good, sharing, personal expression, and unifying the world. The algorithm was a huge success. Inspired by how citations are used to rank academic papers, pages with links from a greater number of other pages got a better “page rank,” which led to a fast and efficient way to produce the most relevant results for any query.
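
As a rough illustration of the idea (this is not code from the paper), the core of PageRank can be sketched as a simple iteration in which every page repeatedly passes a share of its score along its outgoing links; the damping factor of 0.85 and the tiny four-page link graph below are assumptions for demonstration only:

import java.util.Arrays;

public class PageRankSketch {
    public static void main(String[] args) {
        // links[i] lists the pages that page i links to
        int[][] links = { {1, 2}, {2}, {0}, {0, 2} };
        int n = links.length;
        double damping = 0.85;

        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);

        for (int iteration = 0; iteration < 50; iteration++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);
            for (int page = 0; page < n; page++) {
                // Each page splits its current rank evenly across its outgoing links
                for (int target : links[page]) {
                    next[target] += damping * rank[page] / links[page].length;
                }
            }
            rank = next;
        }

        // Pages that receive more (and better-ranked) links end up with higher scores
        for (int page = 0; page < n; page++) {
            System.out.printf("page %d: %.3f%n", page, rank[page]);
        }
    }
}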

That was a fantastic breakthrough, but something started happening over the years. As some websites became more prominent thanks to their page rank (which was well deserved!), their publishers also realized they could monetize the traffic they started receiving. At the same time, search engines also discovered that ads are very lucrative.

This quickly led to ads becoming the dominant business model of the web. And the proliferation of ads brought another thing with it – a conflict of interest. Whether it is an ad-supported search engine or an ad-supported website, their users and customers suddenly have two different interests. Their user usually just wants to browse or search the web, while their customers try to sell things to that user.

Over the years, the web deteriorated to the state it is in now - a highly destructive force. Much of the damage is driven by the monetization of users and every aspect of their lives. Enterprises capture our preferences, our friends, our families, the information we consume, and the information we create. They manage and maximize for their benefit our preferences, our opinions, our purchases, and our relationships. The web can poison individual opinions, freedoms, and political and social institutions. It steals from us, addicts us, and harms us in many ways.

The websites driven by this business model became advertising and tracking-infested giants that will do whatever it takes to “engage” and monetize unsuspecting visitors. This includes algorithmic feeds, low-quality clickbait articles (which also contributed to the deterioration of journalism globally), stuffing the pages with as many ads and affiliate links as possible (to the detriment of the user experience and their own credibility), playing ads in videos every 45 seconds (to the detriment of generations of kids growing up watching these) and mining as much user data as possible. Ads became a global “tax” on using the web, paid mostly by non-tech-savvy users.

And the quality of search started to deteriorate.

The sad truth is that it was all predictable. In the same 1998 white paper, Mr. Brin and Mr. Page sharply criticized the ad-supported business model that other search engines used at the time (Appendix A: Advertising and Mixed Motives; emphasis mine):

“Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is “The Effect of Cellular Phone Use Upon Driver Attention”, a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98].

It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian, 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers. … Furthermore, advertising income often provides an incentive to provide poor quality search results. … In general, it could be argued from the consumer point of view that the better the search engine is, the fewer advertisements will be needed for the consumer to find what they want. This of course erodes the advertising supported business model of the existing search engines”

Yet, despite being acutely aware of the dangers of ad-supported search, selling ads was adopted as the primary business model of the new search venture just a few years later.

And the consequence - the potential of the greatest search technology the world ever saw and some of the most brilliant people in the world became limited by the business model with an inherent conflict of interest built into it. The web changed, driven by the same relentless ad-supported monetization. The very algorithm - PageRank - broke because nobody links to or curates content anymore. If they do, it is mainly for commercial benefit, not based on merit, which was the essence of both the original web and the algorithm.

This led to the concentration of power, with the same 100 or so largest websites now showing up in almost all searches by mainstream search engines. It further exacerbates the problem, as smaller sites and amateur blogs do not surface in search results for people to discover and link to. The primary purpose of the web today is “engagement” - or, to translate from product management speak, “how many ads can we push down users’ throats.”

Author and political scientist Ian Bremmer remarked, “The idea that we get our information as citizens through algorithms determined by the world’s largest advertising company is my definition of dystopia.”

The age of PageRank as the model for finding the best pages on the web is over, with the algorithm ending up being polluted and entirely dominated by ads.

Nowadays, when a user uses an ad-supported search engine, they are bound to encounter noise and wrong or misleading websites in the search results, inevitably insulting their intelligence and wasting their brain cycles. The algorithms themselves are constantly fighting an internal battle between optimizing for ad revenue and optimizing for what the user wants. In most cases the former wins. Users are given results that keep them returning and searching for more instead of letting them go about their business as soon as possible.

This process produces self-reinforcing monopolies in almost every sphere of online life - search, news, entertainment, social media… All these monopolies have two things in common:

  • They are a product of advertising-based business models;
  • They are unhealthy for our digital society and an antithesis to what the internet was supposed to be - fun, quirky, and exciting. Instead, they attempt to control almost every aspect of our online life and culture.

And this is why we built Kagi. We felt a strong need to stop this madness and reverse the direction the web is heading in. The main reason Kagi exists is to offer a radically different view of the web, one close to its original intention and one in which the users and their needs are in the center of the universe.

Not only are we living increasingly busier lives that require access to timely and high-quality information, but as civilization gets more sophisticated, we are starting to realize that we should be careful about what information we let into our brains, just as we are careful about the food we put in our bodies.

In a world like this, there is very little room for ads and noise. Yet this is how the world has functioned for the last 25 years.

With the inevitable advancement of our civilization, it is reasonable to predict that most of humanity is in for a rude awakening from a world in which harmful agendas driven by misaligned incentives dominate our lives. The shock and realization of how information is really important and how we are currently being treated may feel similar to waking from a coma, like the one that controlled humanity in the movie The Matrix. We’ll look at the current situation in hindsight and wonder “How did this all happen?”

In the future, it is likely that if the current mainstream search engines want to survive, they will have to go back to their roots, dismissing ads as their primary business model (as described by Mr. Page and Mr. Brin in their 1998 whitepaper) and start optimizing for what the user wants. This seismic shift is not a matter of if but when. If nothing else, it will be driven by the erosion of public trust in information served by companies using ad-supported business models.

Then, imagine a world in which companies use all their resources, technology, and human potential to create entirely user-centric products. This will drive innovation as yet unseen.

We will have search products (AI assistants by that time) with different capabilities. There will still probably be some “free” ones, ad-supported, which will not return very high-quality information and will optimize for ad revenue instead. They may even have the “for entertainment” label, as found on some “news” sites today.

But there will also be search companions with different abilities offered at different price points. Depending on your budget and tolerance, you will be able to buy beginner, intermediate, or expert AIs. They’ll come with character traits like tact and wit or certain pedigrees, interests, and even adjustable bias. You could customize an AI to be conservative or liberal, sweet or sassy!

In the future, instead of everyone sharing the same search engine, you’ll have your completely individual, personalized Mike or Julia or Jarvis - the AI. Instead of being scared to share information with it, you will volunteer your data, knowing its incentives align with yours. The more you tell your assistant, the better it can help you, so when you ask it to recommend a good restaurant nearby, it’ll provide options based on what you like to eat and how far you want to drive. Ask it for a good coffee maker, and it’ll recommend choices within your budget from your favorite brands with only your best interests in mind. The search will be personal and contextual and excitingly so!

The most sophisticated ones will be able to answer questions requiring them to digest pages of documents, even entire books or videos, to come up with a 200-word summary.

And yes, the non-zero price point will mean you have to budget it with your other costs. But faster access to higher quality information will make you much more competitive globally, so you can decide if the investment will be worth it, like any other purchase you make. This will in turn incentivize these products to be even better, a positive feedback loop driven by entirely aligned incentives.

This is a vision of the future that will finally allow the internet to reach its full potential as the amazing tool it could be, rather than the exploitative one it is now.

I hope you join us on this journey.

Vladimir Prelovac
CEO, Kagi Inc.

emrox · 17 days ago · Hamburg, Germany