The Average of the Public Web

A few weeks ago I asked an AI a question I already knew the answer to. The response came back cleanly formatted, well organized, and confident. It was technically correct. It was also not what I would have told someone else working on the same thing.

Nothing the model said was wrong. The answer was the median of what the internet says about the thing I was asking about, retrieved and reshaped at impressive speed. I happened to know more than the median. The model could not have known that, because nothing in its training data told it where my knowledge ended and its answer began.

That moment sat with me longer than it should have. Not because the answer was bad. It was fine. It stayed because I recognized the shape of the gap between the answer I got and the answer I would have given.

This is not new

The gap between what the model answered and what I knew is not a new phenomenon. It is the newest instance of the oldest pattern in how knowledge creates value. Private knowledge, what some people know and others do not, has always been where value lives. The public web is not an exception to that rule. The public web is downstream of it.

The same shape at every scale

The shape is visible everywhere you look for it. A competitive company’s moat is always some form of private knowledge: customer data nobody else sees, processes refined by years of quiet failure, relationships that compound. Publish the playbook, leak the data, teach the techniques to the industry, and the company dissolves back into the market average. The moat is, definitionally, whatever the public does not have access to.

The same logic holds at the individual scale. An expert is an expert not because of what they have in common with the rest of their field, but because of what they know that others do not. Expertise is the distance above the baseline. An irreplaceable technician is irreplaceable not for the skill itself, which is teachable, but for the specific combination of things they know that would be too costly to extract and transfer. Different subjects, same rule: value lives in what the public does not have.

AI sits inside the same pattern, but it plays a different role in it than people usually assume. The common framing is that AI is a new source of knowledge. That is not quite right. AI is trained largely on the public internet, and the public internet is, by construction, the knowledge that people judged worth publicly sharing. The knowledge that was not worth publicly sharing, because it was sensitive, or competitive, or contextual, or simply not yet articulated, is still worth having. It is just not in the training set. Which means AI is not a new source of private knowledge. It is a commoditizer of public knowledge: pooling, synthesizing, and returning it at a scale no individual person could.

That synthesis is not nothing. Cross-referencing a million sources in seconds is a real capability, and in domains where the public web contains most of what matters, AI approaches or exceeds what a working human could do unaided. But the substrate has not moved. It is still the public web. It is still the average, weighted by volume, of what people have chosen to make public.

AI raises the floor, not the ceiling

That reframing tells you precisely what AI changes and what it does not.

What it changes is access to the floor. General public knowledge used to take years of study to synthesize. You had to read the textbook, then the five papers the textbook cited, then the argument between those papers. You had to find someone in the field and ask them. That synthesis was expensive. Now it is a prompt away. The floor of what any motivated person can do is much higher than it was two years ago, and that is not a minor thing. It is the actual shift.

What it does not change is the ceiling. The ceiling was never on the internet to begin with. The doctor who recognizes that this patient is off before the vitals confirm it, the engineer who glances at a stack trace and knows instantly which config is wrong, the trader whose gut has priced in something the charts have not: none of that is in a training set, because none of it was ever written down in a form a model could learn from.

So the gap between public-knowledge value and private-knowledge value is the only meaningful one left. Below the ceiling, AI eats the work. Above it, people remain necessary, not because AI is bad at their jobs, but because AI cannot see what they see.

The ceiling is being bought out in hourly increments

The ceiling is not as static as the last section implied, though. The AI industry is aware of it. It knows, precisely, that public knowledge is not enough. And its response has been to go buy the private knowledge directly, by the hour.

That is what the data labeling economy actually is. When a company pays radiologists to annotate scans, or lawyers to mark up contracts, or software engineers to rank model outputs on specific coding tasks, it is not really buying labels. It is extracting expert judgment. The expert is paid once to transfer a bit of what they know into a dataset. The dataset then trains the model. What was private knowledge becomes part of the floor.

This is, in some ways, the most candid position the industry takes on the ceiling: the ceiling is a wall, and you can chip through it with money.

It is also a strange new market: a market for your own obsolescence, if you happen to be the expert. The first wave of specialists in any labeled domain gets paid per hour. The model then absorbs their labeled judgment and becomes cheaper than them for that specific class of work. The domain does not disappear. But the relative value of the expert’s knowledge, in that domain, compresses toward the new floor.

How much of expert knowledge is truly labelable is an open question. My guess is that a surprising amount of any given specialty is labelable: anything that can be framed as “show this expert ten thousand examples and watch them decide.” The labelable parts are probably not the parts experts themselves most value, because the parts experts most value are the parts that resist being reduced to case decisions.

What labeling cannot reach

Not everything is labelable. The ceiling moves in some domains and holds in others, and the durable moats are in the domains where it holds.

Real-time context cannot be extracted. What to say in this specific meeting with this specific person right now has never happened before and will not happen again in the same shape. Extraction needs repetition, and real-time context is the opposite of repetition. The same barrier applies at longer timescales too: by the time a dataset on last year’s best practices is cleaned, trained on, and deployed, the field has moved. Fast-moving domains outrun their own extraction pipelines.

Confidential data never reaches the labeling pipeline in the first place. Companies are not going to hand their most sensitive customer records to a labeling contractor so an expert can annotate them for a model. Competitive and legal barriers keep the data inside the walls, and the expertise built on top of that data stays stuck inside too.

Relational knowledge does not generalize out of the specific humans involved. The trust you have built with a specific customer, the history you have with a specific coworker, the shape of a team that has worked together for years: none of it transfers, because none of it exists outside the relationship.

And some of what experts know resists articulation even by the experts themselves. A senior engineer who looks at a stack trace and knows instantly what is wrong may not be able to tell you why, only that they know. Labeling can force-articulate some of what lives below articulation. But only some.

This list is a guess, not settled truth. Parts of it could become labelable tomorrow, and in several years the picture will look different. But the structural point holds: labeling is an extraction technology, and extraction technologies always have domains they cannot reach. Those domains are where the durable value lives.

Where does my own work live?

When I asked the AI that question a few weeks ago and got the median answer, I was not annoyed. The model was not failing. It was doing exactly what it is built to do: return the average of the public web.

What I was noticing, without having language for it yet, was the edge of a very old market. The line between the answer I got and the answer I would have given is the same line that makes companies valuable and experts valuable and technicians irreplaceable. It separates what the public knows from what I happened to know. For a second I was standing directly on that line.

Everything I do now sits on one side of that line or the other. Most of my work, honestly, is on the public side. I read the same docs everyone reads. I copy-paste patterns the community has settled on. I ask AI for starter code, and the starter code is almost always fine. The floor has come up, and I am benefiting from it.

But the parts of my work that will matter most over time will have to live above the line: the instincts I am still building about which product decisions are right, the tacit knowledge of a codebase, a team, and a company I have only just joined, and the judgment that will only exist after I have shipped enough things to learn what hurts. They are private by construction. They are not going to be on the public web, because I am the one living through the experience that would generate them. There is nowhere else for them to come from.

I do not know yet how much of my work is going to land above the line and how much below it. I suspect most new knowledge workers do not know, and will not know for a long time. The line itself is moving as the labeling economy extracts more of the ceiling. The most honest thing I can say right now is that the question of where my work lives is the real question, and I am only starting to ask it.