404: Human Not Found
I often get asked about how I expect AI to reshape our lives in the coming decades. While I don’t know anything with certainty, I’m seeing small breadcrumbs of the massive transformation ahead.
One of the most fascinating shifts is happening in the window you’re staring at right now: who — or what — the internet is built for, and how we interact with it (via the browser). The internet was originally created to connect computers and let humans share information in standardized ways (namely HTTP). These connections turned out to be revolutionary — they created new business models and took down old ones. Now we’re witnessing something equally profound: the transition from a human-first to a bot-first internet. The full implications are not immediately clear, but they could prove just as consequential for society.
The story starts with web crawlers, the internet’s first bots, which mapped out every corner of the web to make information searchable. Website admins quickly realized they needed some control over this and came up with robots.txt — a simple way to tell bots which parts of their sites were off-limits. Adherence was voluntary: while reputable players like search engines mostly played by these rules, others didn’t bother.
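For a sense of just how lightweight this convention is, here is a minimal sketch of a well-behaved crawler checking robots.txt before fetching a page, using only Python’s standard library. The domain, path, and user agent are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt (placeholder domain).
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A polite crawler checks before fetching; nothing actually enforces this.
url = "https://example.com/private/reports"
if rp.can_fetch("ExampleCrawler/1.0", url):
    print("Allowed to crawl", url)
else:
    print("robots.txt asks us to skip", url)
```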
As search made the web more accessible and PageRank morphed into SEO, optimizing engagement through UIs took center stage, and web testing frameworks grew up alongside. A pivotal moment came in 2004, when Jason Huggins created JavaScriptTestRunner, which was open-sourced as Selenium Core and injected JavaScript into pages to simulate user actions. Soon after, Simon Stewart introduced WebDriver, which controlled the browser natively and provided a higher-level API. The two projects later merged into Selenium WebDriver, which remained the de facto standard for browser automation testing until 2020, when Microsoft open-sourced Playwright.
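To ground what these frameworks actually do, here is a minimal sketch of a traditional hand-written automation script using Playwright’s Python API. The URL, selectors, and credentials are placeholders; what matters is how tightly every step is bound to the page’s current structure.

```python
from playwright.sync_api import sync_playwright

# A classic scripted test: every step is spelled out against fixed selectors.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/login")   # placeholder URL
    page.fill("#username", "test-user")      # breaks if the element id changes
    page.fill("#password", "test-password")
    page.click("button[type=submit]")
    assert "dashboard" in page.url           # brittle success check
    browser.close()
```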
As web automation tooling matured across testing use cases and an expanding landscape of websites and browsers, it also fueled scale in another market — malicious bots spamming forums, flooding inboxes, and harvesting data. These days, bots are behind 77% of online security incidents. We’ve fought back with reCAPTCHAs, behavior tracking, and rate limits. Most websites now treat non-human traffic as guilty until proven innocent — and if you’ve seen spaces like X lately, you might agree that they’ve had good reason to.
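As one illustration of these defenses, here is a minimal token-bucket rate limiter of the kind sites use to throttle suspected bot traffic. The capacities, refill rates, and client keys are made up for the example.

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-client token bucket: refill_rate tokens/sec, capped at capacity."""

    def __init__(self, capacity: float = 10, refill_rate: float = 1.0):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = defaultdict(lambda: capacity)
        self.last_seen = defaultdict(time.monotonic)

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_seen[client_id]
        self.last_seen[client_id] = now
        # Refill proportionally to elapsed time, never exceeding capacity.
        self.tokens[client_id] = min(
            self.capacity, self.tokens[client_id] + elapsed * self.refill_rate
        )
        if self.tokens[client_id] >= 1:
            self.tokens[client_id] -= 1
            return True
        return False

limiter = TokenBucket(capacity=5, refill_rate=0.5)
print(limiter.allow("203.0.113.7"))  # True until this client's bucket runs dry
```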
But LLMs are eating away at this regime. These new AI-powered bots (okay fine, agents!) can navigate the web with flexible goals, almost like humans. I see three avenues LLMs can take for web-based automation: multimodal point-and-click automation, navigating the DOM with LLMs, and making direct API calls.
Multimodal point-and-click. The most intuitive approach to web automation is simply to replicate how humans interact with the web (look at a page, find an action that matches my objective, and click!). Systems like Operator and Computer Use capture screenshots and send them to a model, which instructs the next action. In its current form, I don’t have much faith that this will be the winning solution, for a few reasons. First, processing images is compute-intensive, a cost compounded by the fact that we need a headful browser to render pages. Second, we’re not guaranteed to see the full set of available actions when the page loads. How many of the actions sent to an LLM will just be scroll down… okay now… scroll up? This is of course great for generating tokens for your foundation model business, but not so great for scalable application-layer use cases! The bright spot is that it implies existing UI patterns will suffice for AI, and there is hope if we can train models to read and interact in a compressed image space to save compute.
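To make the loop concrete, here is a rough sketch of the screenshot-to-action cycle. The propose_action helper is hypothetical — a stand-in for a multimodal model call and its assumed response format — while the browser side uses Playwright’s real screenshot and mouse APIs; the URL and goal are placeholders.

```python
from playwright.sync_api import sync_playwright

def propose_action(screenshot_png: bytes, goal: str) -> dict:
    """Hypothetical wrapper around a vision-language model call.

    Assumed to return something like {"type": "click", "x": 512, "y": 300},
    {"type": "scroll", "dy": 400}, or {"type": "done"}.
    """
    raise NotImplementedError("stand-in for a multimodal model call")

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # run headful so the page renders as a user would see it
    page = browser.new_page()
    page.goto("https://example.com")  # placeholder URL
    goal = "find the pricing page and open it"

    for _ in range(20):  # cap the number of model round-trips
        action = propose_action(page.screenshot(), goal)
        if action["type"] == "click":
            page.mouse.click(action["x"], action["y"])
        elif action["type"] == "scroll":
            page.mouse.wheel(0, action["dy"])
        else:
            break  # model reports it is done (or we don't understand it)
    browser.close()
```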
Navigate the DOM with LLMs. Another option is to use LLM-based code generation to write Puppeteer or Playwright scripts that accomplish tasks. With services like Browserbase, Steel, and Browser Use, I can supply a flexible goal to an LLM, which will look at the site’s DOM and generate the script on the fly. I like this approach for two reasons. First, brittleness is why most automations fail — the button moved or the URL path changed. Even the best scripting frameworks demand too precise a specification and result in hours of maintenance when things break. Natural language offers richer flexibility and orients automation around goals rather than process (the what rather than the how). Second, these scripts can run in headless browser instances suitable for larger-scale cloud deployments, and they generally work well with existing automation infrastructure.
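Here is a rough sketch of one simplified variant: instead of generating a whole script, the model is given the page’s DOM and a natural-language goal and returns just a CSS selector to act on. The generate_selector function is a hypothetical LLM call; the Playwright calls are real, and the URL and goal are placeholders.

```python
from playwright.sync_api import sync_playwright

def generate_selector(dom_html: str, goal: str) -> str:
    """Hypothetical LLM call: given the page's DOM and a goal in natural
    language, return a CSS selector for the element to act on."""
    raise NotImplementedError("stand-in for an LLM code-generation call")

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # headless works fine here
    page = browser.new_page()
    page.goto("https://example.com/store")  # placeholder URL

    # The goal is the "what"; the selector (the "how") is derived at run time,
    # so a moved button just means a different selector on the next run.
    selector = generate_selector(page.content(), "add the cheapest item to the cart")
    page.click(selector)
    browser.close()
```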
Make direct API calls. A final way to automate the web is to skip the browser altogether and directly call the APIs that generate the information on a page. I’m sure someone somewhere has already said “APIs are eating the world”, but AI is enabling API integrations at an accelerating pace. Sites that have APIs but don’t want to spend time developing and maintaining SDKs can use AI tools like Stainless to auto-generate them from an OpenAPI spec. And when sites don’t have public APIs, services like Candle and Integuru trace requests and use AI to reverse-engineer internal APIs into SDKs.
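In sketch form, skipping the browser looks like calling the JSON endpoint that actually backs the page rather than scraping its HTML. The endpoint, parameters, and response fields below are invented for illustration.

```python
import requests

# Hypothetical endpoint, the kind you might find by tracing the site's own
# network requests or reading its OpenAPI spec.
resp = requests.get(
    "https://api.example.com/v1/products",
    params={"category": "laptops", "sort": "price"},
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
    timeout=10,
)
resp.raise_for_status()

for product in resp.json()["items"]:  # assumed response shape
    print(product["name"], product["price"])
```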
As LLMs further influence how we interact with the internet, there are two interesting implications.
The litmus test for unwanted traffic will change. It seems likely that many of us will be okay with a bot taking actions on our behalf on the web. But for that to happen, we will need a way to both authenticate and authorize these bots, especially as their traffic becomes indistinguishable from human traffic. I don’t expect this to be a passive protocol like robots.txt, but it could be a sort of accreditation for legitimate traffic — like a license plate that interacts with some combination of cars (browser automation platforms), toll booths (web auth like Okta), and roads (network services like Cloudflare).
Websites will look different as they become dually optimized for human and bot use. One of the reasons I struggle with the multimodal point-and-click approach is that it forces models to conform to how humans traverse the internet. It feels like the “Waymo stage” of web automation, where our human-readable UIs are the steering wheels that will surely be replaced by more information-dense, bot-first interfaces. If I take each web automation approach above to its extreme, they all lead to roughly the same place — LLMs that parse a large block of dynamic information and serve it back to users through a UI. In other words, the models themselves eventually become the browsers we interact through.
What makes this moment so thrilling is that web automation — once the domain of QA testers and spammers — is evolving into something magical. With AI at the helm, we’re not just scripting clicks; we’re teaching software to problem-solve, navigate, and act on our behalf. The browser, once a pane of glass between us and the internet, may soon become a conduit for intelligent agents that understand our goals and reshape how we experience the web entirely. It’s a small slice of a much bigger shift — one where AI doesn’t just augment our lives, but rewires the very interfaces we use to live them.