An Introduction to Extreme Normality
Understanding power law distributions, and their implications for decision-making.
We’re taught about normal/Gaussian distributions (i.e. ‘bell curves’) in high-school statistics. Regardless of whether you work with them formally as a statistician or merely use them subconsciously as a rough model of the world, the normal distribution is an extremely useful idea to have in one’s arsenal. Unfortunately, as we’ll see, it fails catastrophically when used in situations where it doesn’t apply.
Classic examples of approximately normal distributions include: the heights of children in a classroom, their reading ability, the number of corn kernels on the cobs from a field, and Galton Boards. Anywhere an outcome is the sum of many small, independent contributions, it will probably be approximately normally distributed (this is the central limit theorem at work).
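The additive story can be seen in a few lines of simulation (a sketch with arbitrary parameters, not tied to any real dataset): summing many independent coin flips, as a Galton Board does physically, produces the familiar bell shape.

```python
import random
from statistics import mean, stdev

random.seed(42)

# Each "ball" bounces off 100 pegs, going right (1) or left (0) each time.
# Its final bin is the SUM of 100 independent contributions -- exactly the
# setup in which the central limit theorem predicts an approximate bell curve.
n_pegs, n_balls = 100, 10_000
bins = [sum(random.randint(0, 1) for _ in range(n_pegs)) for _ in range(n_balls)]

print(mean(bins))   # close to n_pegs / 2 = 50
print(stdev(bins))  # close to sqrt(n_pegs * 0.25) = 5
```

Plotting `bins` as a histogram gives the familiar symmetric hump, with almost all balls landing within a few standard deviations of the centre.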
We don’t expect perfect curves, and we’re generally aware the distribution can often be distorted. Perhaps something we’re measuring can’t drop below zero, leading to a rightward skew, or a limit of some kind is placed on the distribution. Sampling the heights of children on a theme-park ride will likewise be skewed rightward by the minimum height to enter. When we look at the 100m sprint times of two different classrooms, we’re not too surprised when the combined distribution has two distinct peaks if one of the teachers was a former champion sprinter. However, in most cases, unless we’re specifically interested in conducting a detailed analysis, we have good reasons to approximate these situations with a normal distribution.
Normality is everywhere, it seems. Except when it isn’t.
Consider book sales. Estimates vary as there’s no firm grasp of exactly how many books are published[1], but from what I can tell the average number of copies a book sells over its lifetime is about 100, while the 90th percentile is about 200 copies. Assuming sales are normally distributed, those figures imply a standard deviation of roughly 78 copies, putting a 1,000-copy book more than 11 standard deviations above the mean: an event we’d expect roughly once every 10^25 years at plausible publication rates.
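Redoing that arithmetic explicitly (a sketch: the mean and 90th percentile are the rough estimates above, and the 200,000-books-per-year figure is my own illustrative assumption):

```python
from math import erfc, sqrt
from statistics import NormalDist

mean_sales, p90_sales = 100, 200

# Back out the standard deviation implied by the 90th percentile.
z90 = NormalDist().inv_cdf(0.90)        # ~1.28
sigma = (p90_sales - mean_sales) / z90  # ~78 copies

# How many standard deviations out is a 1,000-copy book?
z = (1000 - mean_sales) / sigma         # ~11.5

# Upper-tail probability P(Z > z); erfc stays accurate far into the tail,
# where the naive 1 - cdf(z) would underflow to zero.
tail = 0.5 * erfc(z / sqrt(2))

books_per_year = 200_000                # assumption, for illustration only
wait_years = 1 / (tail * books_per_year)
print(f"{z:.1f} sigma; expect one such book every {wait_years:.1e} years")
```

Whatever publication rate you plug in, the waiting time dwarfs the age of the universe, which is the point: the normal model declares the merely-unusual impossible.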
Funnily enough, because these ‘outliers’ are so rare, if we used our model to estimate book sales, the overwhelming majority of the time the results would be within expectations. We could even diligently update our model to account for the thousand-copies-sold book by adjusting our estimate of the mean, and perhaps increasing our uncertainty about the distribution’s variance. It’s easy to fool ourselves into thinking we’ve appropriately adjusted our models. Eventually, though, we’d come across a copy of Harry Potter (~500 million copies sold) that simply cannot be accounted for. Hopefully we weren’t basing critical decisions on our sales model! Many financial collapses have been caused by reliance on highly calibrated, sophisticated models built on the assumption of normality in a non-normal world.
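To see how little the diligent updating helps, here is a sketch (the individual sales figures are invented to match the rough averages above; only the Harry Potter total comes from the text): even after folding the 1,000-copy outlier into the sample, a 500-million seller remains absurdly many standard deviations out.

```python
from statistics import mean, stdev

# A hypothetical sample of lifetime sales, roughly matching the text's
# mean of ~100 copies, plus the surprising 1,000-copy book.
sales = [40, 60, 80, 90, 100, 110, 120, 150, 180, 1000]

mu, sigma = mean(sales), stdev(sales)   # the dutifully "updated" normal model
z_potter = (500_000_000 - mu) / sigma
print(f"updated model: mu={mu:.0f}, sigma={sigma:.0f}")
print(f"Harry Potter sits {z_potter:,.0f} standard deviations out")
```

No amount of tweaking the mean and variance of a normal model can make room for an observation millions of standard deviations away; the distributional assumption itself is what has to go.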
What we’ve come across is an example of a power law distribution.
Power law distributions arise where an outcome results from the multiplicative interaction of contributing factors. Let’s imagine the number of copies a book sells is some combination of the writing ability of the author, how well the topic is aligned with the zeitgeist, the reach of the publicist, and so on. Each of these factors individually might be normally distributed, but the result of their multiplication means that 0.5% of the ~6.5 billion books sold last decade were copies of Fifty Shades of Grey.
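A simulation makes the contrast concrete (a sketch with arbitrary parameters: ten factors, each drawn uniformly between 0.5 and 2, and 100,000 simulated books). Adding the factors gives the egalitarian bell curve; multiplying the very same factors concentrates the total in a small minority.

```python
import random
from math import prod

random.seed(0)

N_FACTORS, N_BOOKS = 10, 100_000

def top_20_share(outcomes):
    """Fraction of the total held by the top 20% of outcomes."""
    ranked = sorted(outcomes, reverse=True)
    cutoff = len(ranked) // 5
    return sum(ranked[:cutoff]) / sum(ranked)

additive, multiplicative = [], []
for _ in range(N_BOOKS):
    factors = [random.uniform(0.5, 2.0) for _ in range(N_FACTORS)]
    additive.append(sum(factors))         # CLT regime: approximately normal
    multiplicative.append(prod(factors))  # multiplicative regime: heavy right tail

print(f"additive, top 20% share:       {top_20_share(additive):.0%}")
print(f"multiplicative, top 20% share: {top_20_share(multiplicative):.0%}")
```

In the additive world the top 20% of books hold only slightly more than 20% of total sales; in the multiplicative world they hold well over half, and the very largest outcomes are many times the median.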
The exact distribution of power laws varies, but a surprisingly robust rule of thumb here is the ‘Pareto Principle’: 80% of the outcomes are due to 20% of the causes. The principle applies wherever outcomes result from such multiplicative interactions. 80% of storm damage comes from 20% of storms. 80% of software complaints come from 20% of codebase errors. 80% of cost overruns result from the 20% most complex projects.
Note that the distribution is also self-similar. Roughly 80% × 80% = 64% of software sales come from 20% × 20% = 4% of customers. If there were a million software developers in the world, 17% of all productivity would come from the top three[2].
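The self-similar arithmetic has a closed form worth writing down (a sketch that treats the 80/20 ratio as exact): applying ‘80% from 20%’ k times gives 0.8^k of outcomes from 0.2^k of causes, so the share from any top fraction f is f^(log 0.8 / log 0.2).

```python
from math import log

def pareto_share(top_fraction: float) -> float:
    """Share of outcomes produced by the top `top_fraction` of causes,
    under an exactly self-similar 80/20 rule."""
    return top_fraction ** (log(0.8) / log(0.2))

print(f"{pareto_share(0.20):.0%}")           # 80% -- the base rule
print(f"{pareto_share(0.04):.0%}")           # ~64%, i.e. 80% of 80%
print(f"{pareto_share(3 / 1_000_000):.0%}")  # ~17% from the top 3 in a million
```

Plugging in 3 out of a million reproduces the figure in the text: about 17% of all productivity from the top three.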
Mediocristan is the linear, limited difference, expectable realm of human height and running speeds. Extremistan is the unexpectable, Black Swan world of financial markets, book sales, and death by terrorism.
Mediocristan is where we must endure the tyranny of the collective, the routine, the obvious, and the predicted; Extremistan is where we are subjected to the tyranny of the singular, the accidental, the unseen, and the unpredicted.
-Excerpts from ‘The Black Swan’, by Nassim Nicholas Taleb
Whether you’re an analyst making models or a manager making qualitative decisions, determining whether what you’re working on is normal or extreme turns out to be critically important.
A ‘normal’ project faces diminishing returns for extra effort and encourages a ‘near enough is good enough’ attitude. Conversely, the value of a project governed by the Pareto principle is all out at the extremes, where even 1% additional care and attention will be disproportionately rewarded.
Normal failures are routine, manageable, and mitigable by process. Extreme failures are not necessarily any more likely, but they are prone to be catastrophic in some unique and unforeseen way. Rigid techniques adapted to normal failure modes often lack the flexibility to respond effectively to the extreme.
Humans evolved to live in and navigate a world of the mostly-normal. Everything from the size of antelopes to the abilities of our fellow humans conformed to these expectations, which inform much of our thinking about the world, from our approach to risk to our intuitions of fairness[3].
With a few significant exceptions[4], the upsides of our ancestral environment were normally distributed. The downsides mostly were too, but with the occasional unpredictable extreme consequence. When going out on that branch might mean 10% more fruit but a non-zero chance of death, it’s perhaps no surprise that we developed systematic loss aversion and a preference for predictability.
As society has become more interconnected, capable, and complex, we’re moving increasingly into a world where many ‘normal’ everyday outcomes are governed not by the Gaussian, but rather by the extreme distributions of power laws. We need to update our thinking accordingly if we’re to navigate it effectively.
[1] Or indeed, of what exactly counts as ‘published’ and ‘book’; the data also varies considerably depending on whether we’re looking at self-published titles or not. I’m going here from Amazon’s data, and assuming roughly half of all book sales are via that platform.
[2] Consider Linus Torvalds, the creator of Linux. The Linux kernel he wrote is the foundation of Android, which is installed on roughly 70% of all mobile devices globally.
[3] This misapprehension of power laws can lead many to assume that extreme financial success can only be achieved ‘unfairly’, for example, when in fact it should be the expected outcome of delivering value in extreme fields (e.g. publishing, software, finance, competitive sports). Public policy should respond accordingly, rather than attempting to prevent such outcomes.
[4] Normal vs extreme returns are one reason why, particularly in K-selected species like ours, males tend to have a much higher tolerance for risk than females: genetic ‘returns’ for the former follow power laws, while being more normally distributed for the latter.