Web Traffic
AI Crawlers — HTML Requests
4.2%
Source: Cloudflare 2025 Year in Review
As of: 2025

AI crawlers: 4.2% of HTML requests, 300% YoY growth (Cloudflare 2025).

What it measures

AI crawlers are bots operated by AI companies to collect training data, build knowledge bases, or power RAG (Retrieval-Augmented Generation) systems. Cloudflare measures this at the network layer across the roughly 20% of web traffic that passes through its network, so the figure reflects a large cross-site sample rather than any single site's logs.

The 4.2% figure refers specifically to HTML document requests; JavaScript bundles, images, and API calls are excluded. This makes it a cleaner proxy for content harvesting than general bot traffic figures.
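To make the metric concrete, here is a minimal sketch of how a site owner might compute a comparable share from their own request logs. It is not Cloudflare's methodology: the `Request` record, the token list, and the choice to detect HTML by the response Content-Type are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

# User-Agent substrings for crawlers named in this article. Illustrative, not
# exhaustive. Google-Extended is left out because it is a robots.txt control
# token, not a distinct User-Agent string.
AI_CRAWLER_TOKENS = (
    "GPTBot", "ClaudeBot", "PerplexityBot",
    "meta-externalagent", "CCBot", "Bytespider",
)


@dataclass
class Request:
    """Hypothetical log record; real log formats will differ."""
    user_agent: str
    content_type: str  # Content-Type of the response


def is_html(req: Request) -> bool:
    # Assumption: an "HTML document request" is one answered with text/html.
    return req.content_type.lower().startswith("text/html")


def is_ai_crawler(req: Request) -> bool:
    ua = req.user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def ai_share_of_html(requests: Iterable[Request]) -> float:
    """Fraction of HTML requests made by known AI crawlers (0.0 if no HTML)."""
    html = [r for r in requests if is_html(r)]
    if not html:
        return 0.0
    return sum(is_ai_crawler(r) for r in html) / len(html)
```

Non-HTML requests are dropped from both numerator and denominator, which mirrors the "HTML requests only" framing of the 4.2% figure. User-Agent matching only catches crawlers that identify themselves.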

Notable AI crawlers

Notable examples include GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Google's robots.txt control token rather than a separate crawler), PerplexityBot, Meta-ExternalAgent, CCBot (Common Crawl), Bytespider (ByteDance), and dozens of smaller AI lab crawlers. See Known AI Crawlers for the full reference list.

Why humans should care

At 4.2% and growing 300% YoY, AI crawlers are becoming a material bandwidth cost for publishers. More critically, your content is actively training AI systems that may compete with your business — a fundamental shift in the economics of publishing on the open web.

Revenue without reciprocity

Search engines index to refer traffic back to you. AI systems crawl to train models or power AI answers that may replace the click rather than drive it. At 4.2% of HTML requests, this is no longer a hypothetical — it's a current bandwidth line item.
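One way to put a number on reciprocity is a crawl-to-referral ratio: crawler requests received per visit referred back. The sketch below is an illustration under stated assumptions. The referrer hostnames (`chatgpt.com`, `perplexity.ai`) and the helper names are hypothetical choices for this example, not a definitive list or an established API.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed referrer hostnames for the AI platforms mentioned in this article.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
}


def referrer_platform(referrer: str) -> Optional[str]:
    """Return the AI platform a Referer header points to, or None."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, name in AI_REFERRER_HOSTS.items():
        if host == domain or host.endswith("." + domain):
            return name
    return None


def crawl_to_referral_ratio(crawls: int, referrals: int) -> float:
    """Crawler requests per referred visit; higher means less reciprocity."""
    if referrals == 0:
        return float("inf") if crawls else 0.0
    return crawls / referrals
```

For example, 5,000 crawler requests against 10 referred visits gives a ratio of 500. A ratio that keeps rising over time suggests content is being collected without clicks coming back.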

What happens next

AI crawlers are the fastest-growing category of web traffic, up 300% YoY. Every new AI product that needs a knowledge base ships a persistent crawler, and agentic AI systems that browse on a user's behalf multiply requests per human session. The inflection point comes when crawler traffic drives meaningful referral, or when it clearly doesn't, triggering industry-wide blocking.

What to watch for

Most critical tipping point

Conservative: 10% (~2028). Regulatory frameworks require compensation; growth slows.
Baseline: 15% (~2027). Every AI product ships persistent RAG crawlers.
Aggressive: 25% (~2026). Agentic AI systems browse continuously on users' behalf.
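To compare the three scenarios on a common scale, the sketch below computes the compound annual growth in share that each one implies. It assumes the percentages are shares of HTML requests and that the 4.2% figure is the 2025 starting point; neither assumption is stated explicitly in the source.

```python
BASE_SHARE = 4.2   # percent of HTML requests (Cloudflare 2025)
BASE_YEAR = 2025   # assumption: "As of: 2025" is the starting year

# Scenario name -> (target share in percent, approximate year)
SCENARIOS = {
    "Conservative": (10.0, 2028),
    "Baseline": (15.0, 2027),
    "Aggressive": (25.0, 2026),
}


def implied_annual_growth(start_share: float, target_share: float, years: int) -> float:
    """Compound annual growth rate needed to move from start to target."""
    return (target_share / start_share) ** (1 / years) - 1


for name, (target, year) in SCENARIOS.items():
    rate = implied_annual_growth(BASE_SHARE, target, year - BASE_YEAR)
    print(f"{name}: {rate:.0%} per year")
```

Under these assumptions the implied rates are roughly 34%, 89%, and 495% per year. These are growth rates in share, so they are not directly comparable to the 300% YoY figure if that figure refers to request volume.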

What you can do

  • Disallow GPTBot, ClaudeBot, and Bytespider in robots.txt if you want to opt out of AI training crawls
  • Check Cloudflare Radar to see how many AI crawlers visit your domain
  • Monitor server bandwidth monthly for AI crawler cost spikes
  • Audit robots.txt AI crawler directives — be explicit, not implicit
  • Enable Cloudflare AI Bot Management (free tier) to block or challenge crawlers
  • Track referral traffic from AI platforms (Perplexity, ChatGPT) to measure reciprocity
  • Consider content licensing programs if your content is high-value training data
  • Support mandatory robots.txt compliance legislation for AI companies
  • Advocate for a web content compensation fund tied to AI training revenue
  • Fund standards work on AI crawler identification and disclosure (W3C, IETF)
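As a minimal sketch of the robots.txt items above, the following generates explicit per-crawler rules and checks them with Python's standard `urllib.robotparser`. The helper names and the `example.com` URL are hypothetical, and the agent list is only the three crawlers named in the checklist.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser

# Crawlers named in the checklist above; extend as needed.
BLOCKED_AGENTS = ("GPTBot", "ClaudeBot", "Bytespider")


def build_robots_txt(blocked_agents: Iterable[str]) -> str:
    """One explicit Disallow group per crawler, then allow everyone else."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Check whether the given rules permit an agent to fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(BLOCKED_AGENTS)
print(robots_txt)
print(can_fetch(robots_txt, "GPTBot", "https://example.com/article"))     # False
print(can_fetch(robots_txt, "Googlebot", "https://example.com/article"))  # True
```

Writing one group per crawler keeps the directives explicit rather than relying on a wildcard. robots.txt is advisory, so this only affects crawlers that choose to honor it.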

Data & methodology

Source: Cloudflare 2025 Year in Review
Coverage: ~20% of web traffic processed by Cloudflare network
Metric: AI crawler share of HTML document requests (not total HTTP requests)
Update cadence: Annual, in the Cloudflare Year in Review (December)