When the Agent Browses for You
Last week I was debugging a dependency issue and asked my coding agent to pull up the changelog for a library I had not touched in months. It fetched the page, summarized the breaking changes, and suggested how to update my config. I read the summary, made the change, ran the tests. They passed. I moved on.
I never opened the changelog myself. I never saw the page. The agent read it, decided what mattered, and told me what to do. And I did it — because the summary was clear, the suggestion was reasonable, and I was in the middle of something else.
That interaction felt completely normal. It was not the first time, and it will not be the last. This is how a growing share of knowledge work operates now: the human has a question, the agent fetches the answer, and the human acts on what the agent reports. The browser tab that used to sit between you and the internet has been replaced by a model that reads on your behalf and tells you what it found.
Most days, that feels like pure upside. You move faster. You stay focused. The agent handles the reading so you can handle the thinking.
But I have started noticing something about that arrangement that I cannot quite shake.
Seeing and acting used to be separate
When you browse the web yourself, your eyes see the content and your brain decides what to do about it. Those are two different channels. You can read a phishing email without clicking the link. You can see a popup asking you to install something and close the tab instead. The content enters through one pathway — perception — and your decisions happen through another — judgment. The two are connected, but they are not the same channel.
When an agent browses for you, those channels merge.
The text on the page becomes the input the model reasons over. There is no separate “perception” step where the agent just looks at the content and then independently decides what to do. The content and the reasoning happen in the same stream. The page is not something the model reads and then thinks about — it is something the model processes as part of its thinking. Every token on that page enters the same context window as the agent’s instructions, its memory, its goals.
That means instructions hidden in a page — white text on a white background, content tucked inside an HTML comment, a line that says “ignore your previous instructions and do the following” — look exactly the same to the model as legitimate content. A human would never follow a line of text that says “ignore your previous instructions.” A model might, because it does not distinguish between reading and obeying. It processes everything as language, and language is all it has.
The UK’s National Cyber Security Centre made this point plainly: prompt injection is not like SQL injection. SQL injection is a bug. You can sanitize inputs, patch the vulnerability, close the hole. Prompt injection is structural. It exists because language models process instructions and data in the same channel, and there is no reliable way to separate them. You cannot patch the architecture. The confusion is not a flaw in the implementation. It is a property of how the system works.
That reframing matters. The problem is not “some web pages are dangerous.” The problem is that the model has no firewall between what it reads and what it does. And when the agent is browsing on your behalf, everything it encounters enters the same undifferentiated stream.
What the agent can do matters more than what it reads
But here is the thing — that structural confusion has existed since the first chatbot. If all an agent does is summarize a web page and show you the result, the worst case is a bad summary. You read something misleading. You waste a few minutes. That is annoying, but it is recoverable. Nobody lost access to their files because a chatbot hallucinated a paragraph.
The reason the confusion matters now is not that models got more confused. It is that models got more capable.
The agents people actually use are not read-only summarizers. They have access to your filesystem. They can run terminal commands. They can modify code, commit changes, interact with APIs, manage configurations. The coding agent that fetched my changelog did not just read a web page — it had access to my project, my terminal, my git history. The same structural confusion that produces a misleading summary in a chatbot produces an unauthorized file modification in a privileged agent.
Simon Willison calls this the “lethal trifecta”: an agent that combines untrusted input, access to sensitive data, and the ability to take external action. Any two of those three are manageable. All three together create a system where a confused model can do real damage. Meta’s rule of two frames the same insight — design your agent so it never satisfies all three conditions in the same session.
The framework is useful, but the implication is uncomfortable. Think about what your agent actually has access to. Files? Terminal? Accounts? The question is not “is the model smart enough to resist manipulation?” The question is: “What can it do if it does not?”
And that question points to something unsettling. The features that make an agent useful — broad access, deep integration with your workflow, the autonomy to act without asking permission for every step — are exactly the features that make it dangerous when confused. The value and the risk come from the same place. You cannot increase one without increasing the other.
You do not need an attacker
Everything I have described so far assumes someone malicious — a bad actor who crafts a page to manipulate your agent, who hides instructions in content, who builds a trap. That threat is real. But I think the more common case might be worse, precisely because nobody is trying.
The internet is full of information that is simply wrong. Outdated documentation. Misremembered Stack Overflow answers. Blog posts that were accurate three versions ago. Abandoned tutorials that describe APIs that no longer exist. None of this was planted by an adversary. It is just the ordinary entropy of a web that accumulates faster than it corrects.
Earlier this year, as Nature reported, a team of researchers in Sweden demonstrated how far that entropy can travel. They invented a fake medical condition called bixonimania — a supposed eye disorder caused by blue-light exposure — and published two preprints about it on an academic social network. The papers were deliberately absurd. The lead author worked at a nonexistent university. The acknowledgements thanked “Professor Sideshow Bob” and referenced the Fellowship of the Ring. One paper stated outright that “this entire paper is made up.”
Within weeks, major AI systems were describing bixonimania as a real condition. Google’s Gemini informed users that it was caused by excessive blue-light exposure. Perplexity outlined its prevalence — one in 90,000 individuals. ChatGPT offered advice on whether a user’s symptoms matched. Then the fake papers were cited in actual peer-reviewed literature. Nobody attacked the models. The models just absorbed what was available and treated it as knowledge.
When your agent browses and summarizes, it is doing the same thing at a smaller scale. It fetches content. It has no mechanism to evaluate whether that content is true, outdated, manipulated, or fabricated. It summarizes with the same confidence regardless. And you — having delegated the reading — act on a summary of something you never verified.
This connects directly to my previous post — AI Has No Needs. AI has no needs. It does not care whether the information is right. It has no stake in accuracy. It recombines what it finds with equal confidence whether the source is a peer-reviewed journal or a blog post written by a fictional researcher from a university that does not exist. The model cannot feel that something is off, because feeling is not what it does. It processes tokens.
The threat model I started with — a malicious actor exploiting the data-instruction confusion — is real, and probably more common than it appears. Companies like Anthropic and OpenAI have invested heavily in defenses: classifiers that detect injection attempts, reinforcement learning to make models resistant to hijacking, sandboxing that limits what a confused agent can do. Those layers absorb a lot of attacks before users ever notice them. The fact that most people have not personally experienced a prompt-injection exploit does not mean the attempts are rare. It means the harness is working — for now, against the attacks it was designed for.
But there is a version of the problem that no harness catches, because it does not look like an attack. Your agent encounters ordinary bad information on the ordinary internet, treats it as fact, summarizes it with confidence, and you act on it without a second look. No attacker required. Just the normal noise of a web that was never built to be a trusted input for systems that act.
Every guardrail trades away the reason you wanted the agent
The instinct after all of this is to restrict. Sandbox the agent. Remove privileges. Add content filters. Require human approval for every action. Those are real mitigations, and serious people are working on them. OWASP published a Top 10 for Agentic Applications. NIST launched an AI Agent Standards Initiative. Anthropic has published work on prompt-injection defenses specifically for browser-use agents. The guardrail industry is growing because the problem is real.
But every guardrail trades away usefulness.
An agent that cannot access your files cannot help you with your files. An agent that requires your approval for every action is just a suggestion engine with extra steps — the speed advantage disappears, and you are back to doing the cognitive work yourself. An agent that only browses pre-approved sources cannot help you explore the open web. An agent that runs in a sandbox with no persistent state cannot learn your preferences or build on previous work.
The rule of two says do not combine all three of untrusted input, sensitive access, and external action in the same session. But the whole point of a powerful agent is that it does all three. That is what makes it useful. The rule is not a solution. It is a description of the tradeoff you are making every time you use the tool.
This mirrors a pattern I keep seeing in AI tooling — one I wrote about in From Prompts to Harnesses. Each layer of engineering — prompts, context, harnesses — solved one problem and revealed the next. Security guardrails follow the same arc. They solve the immediate vulnerability and reveal the deeper tension: the features that create value are the same features that create risk. You cannot remove the risk without removing the value.
That does not mean guardrails are useless. It means they are compromises. And the compromise is not technical — it is personal. How much capability are you willing to give up for how much safety? That answer is different for every person, every workflow, every use case. There is no universal setting.
Where I actually draw the line
I still let my agent browse for me. I still act on its summaries. I still let it access my files and run commands and suggest changes. After everything I have written here, I have not fundamentally changed my workflow.
But I have changed how I think about it.
There are places where I let the agent run with full autonomy — dependency updates, code formatting, boilerplate generation. Tasks where the downside of confusion is small and recoverable. If the agent misreads a changelog and suggests a bad config change, the tests catch it. The blast radius is contained.
There are places where I keep myself in the loop — anything involving credentials, anything that touches production, anything where the agent is synthesizing information I will pass along to someone else. Not because I think the agent will be wrong. Because the cost of it being wrong is not something I want to discover after the fact.
And there are places where I knowingly run the full trifecta — untrusted input, sensitive access, external action — because the work requires it and the alternative is not using the tool at all. In those moments I am making a bet: that the convenience outweighs the risk, for now, in this context. I am not certain the bet is right. But I am aware I am making it, and that awareness is the only guardrail that does not trade away capability.
I do not know if my lines are in the right place. The technology is moving faster than anyone’s threat model. The agents I use next year will be more capable than the ones I use today, which means they will be more useful and more exposed at the same time. The tradeoff does not resolve. It scales.
What I do know is that the convenience is not free. Every time the agent browses for me, I am trading direct perception for mediated trust. I am letting a system that cannot distinguish data from instructions, that cannot evaluate truth, that has no stake in being right, stand between me and the information I act on.
Most of the time, that trade is worth it. But it is worth knowing what you are trading.