Bot Traffic Is a Business Decision, Not Just an Infrastructure Problem
A competitor recently published an interesting analysis of their platform's bot traffic. We'd encourage you to read it. Their data is real, their methodology is transparent, and their pushback on bot-traffic hysteria is genuinely useful. They're right that "99% of traffic is malicious bots" is FUD. They're right that most bot traffic is identifiable and largely legitimate. We don't disagree with any of that.
Where we part ways is in the conclusion they draw from it. Their answer to the bot traffic challenge is a pricing model: discount all crawlers, pre-exclude identified bots from billing, and frame that as alignment between price and customer value. It's a sensible response to one real problem: surprise overage charges from unpredictable crawler traffic. Nobody likes those.
But here's what a pricing model can't do. It can't tell you which bots are scraping your product catalog for a competitor's AI tool. It can't stop a training crawler from ingesting your proprietary content. It can't protect your search endpoints from crawl loops that exhaust your database and degrade the experience for real users. Discounting all bot traffic treats every crawler the same: the security scanner you want, the content scraper you don't, and the AI training crawler you may never have agreed to feed.
We've also seen vendors attribute customer-impacting outages to "bot traffic" at request volumes that wouldn't strain a development environment, when the actual root cause was infrastructure failure on the platform side. That's the risk of treating bot traffic as a billing line item rather than an operational signal: when something breaks, you lose the visibility to know what actually caused it.
The question worth asking isn't "how do we make bot traffic cheaper?" It's "do we actually have control over which bots access our site, and what they do when they get there?" Those are fundamentally different problems, and only one of them requires a strategy.
The bot landscape has genuinely changed
The measured view is right on one thing: the hysteria around bot traffic is often counterproductive. The web is not under siege. Most automated traffic comes from identifiable, legitimate sources: search crawlers, uptime monitors, security scanners, and increasingly, AI assistants making requests as part of real human workflows. That last category is genuinely new and genuinely interesting.
But "most bot traffic is legitimate" and "all bot traffic deserves equal access" are not the same claim. The landscape has changed in ways that matter strategically, even if the raw numbers are less alarming than some headlines suggest.
It's also worth noting that bot traffic figures vary significantly depending on how and where they're measured. Platforms that filter traffic through a WAF before measurement will report materially lower bot rates than raw server-side data reflects. Industry estimates from infrastructure providers handling the majority of global web traffic put automated traffic at roughly 42% of all requests over the 12 months ending August 2025 (Needham and Co.). Your own figure will depend on your traffic profile, industry, and whether your current tooling measures before or after mitigation.
Current metrics and implications:
- ~42% of web traffic is automated (Needham and Co., 12 months ending Aug 2025). What this means: an industry-wide figure from infrastructure providers handling the majority of global traffic; pre-WAF figures at the origin are typically higher.
- A growing pack of named crawlers is now active around the clock: Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, Bytespider, Common Crawl, and more. What this means: where there used to be one dominant search crawler, there is now a competitive field of crawlers, each with different intentions, different disclosure practices, and different impacts on your business.
- ~60% of searches end without a click (SparkToro / Datos, 2024). What this means: AI-generated answers and on-SERP features are suppressing click-through rates. Being indexed by the right crawlers matters more than ever; being scraped by the wrong ones costs more than ever.
What's new isn't the volume. It's the intent. Google crawling your site to surface it in search results is categorically different from an AI lab crawling it to train a model that may compete with you, or a scraper harvesting your pricing data for a competitor's intelligence tool. When vendors offer blanket billing discounts across all these crawlers equally, they are, in effect, asking you to subsidize your own content being ingested by future competitive AI products. That's not a pricing decision. It's a strategic one, and it deserves to be treated as such.
"Identifying a bot is the beginning of the analysis, not the end of it. The real question is whether that bot's access serves your business, and whether you have the tools to act on that judgment."
What a pricing model solves, and what it doesn't
To be fair: surprise overage charges from crawler traffic are a legitimate and frustrating problem. Being charged for your own security scanner's requests is genuinely absurd. Vendors who pre-exclude identified bot traffic from billing are solving a real pain point, and customers caught in those situations deserve better.
But a billing policy is not a security posture. Excluding bots from your invoice doesn't exclude them from your infrastructure. It doesn't prevent content scraping. It doesn't protect your search endpoints from volumetric abuse. It doesn't give your team visibility into which new crawlers appeared this week or let you make a considered decision about whether to allow them.
What it does is remove the financial sting, which makes the problem easier to ignore. And for enterprise organizations managing proprietary content, competitive data, and complex regulatory environments, ignoring the problem is not actually an option.
It's also worth acknowledging that not every organization needs the same level of bot management sophistication. Smaller sites with lower traffic complexity have different risk profiles than enterprise properties managing large content catalogs, sensitive user data, or proprietary pricing. The right approach scales with your exposure. But the strategic question of which bots should have access to your site, and why, is worth asking at any size. And on the question of investment: for most organizations, the cost of one avoided security incident, one prevented scraping event, or one recovered engineering sprint will comfortably outweigh the annual cost of the tooling that made it possible.
A framework for strategic bot management
The goal isn't to block all bots. That would cut off Googlebot, which would directly harm SEO, along with the uptime monitors that keep your SLAs honest. The goal is differentiated access based on business value.
Here are four steps and what they mean in practice:
Identify: Visibility comes first. Which bots are accessing your site? Are they declaring themselves honestly, or rotating identities to evade detection? A modern WAF surfaces this data continuously, not just when something breaks.
Classify: Classification goes beyond the User-Agent string to IP reputation, behavioral patterns, geolocation, and robots.txt compliance. Does this crawler's purpose benefit your business? Honest identification is necessary but not sufficient; intent matters too. Ask specifically: is this a bad actor scraping content, or a beneficial AI discovery crawler you want indexing your brand? (A rough sketch of this logic follows the list.)
Act: Action should match risk. Verified beneficial crawlers get full access. Unknown bots get challenged or rate-limited while you gather more signal. Bad actors get blocked at the edge, before they consume origin resources or extract content.
Adapt: The landscape is not static. New crawlers appear weekly, and the intentions behind them evolve. Effective bot management requires ongoing monitoring and rule updates, not a one-time configuration.
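To make the Identify, Classify, and Act steps concrete, here is a minimal sketch in Python of the kind of decision a WAF makes at the edge for traffic it already suspects is automated. This is not Acquia Edge's implementation: the token lists are hypothetical policy choices, the reverse-DNS check is only one of the signals (IP reputation, behavior, geolocation, robots.txt compliance) a production system would combine, and challenges and rate limits are enforced by the platform rather than application code.

```python
import socket

# Hypothetical policy lists: the crawler tokens and decisions below are
# illustrative examples, not recommendations.
VERIFIED_SEARCH_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")
BLOCKED_TOKENS = ("Bytespider",)                    # crawlers you have chosen to block
AI_DISCOVERY_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def classify_request(user_agent: str, client_ip: str) -> str:
    """Coarse verdict ('allow', 'block', 'challenge') for a request that upstream
    signals have already flagged as automated rather than a human browser."""
    # Identify: does the request declare an identity you already have a policy for?
    if any(token in user_agent for token in BLOCKED_TOKENS):
        return "block"

    # Classify: verify self-declared search crawlers instead of trusting the
    # User-Agent string. Reverse-DNS the client IP, then forward-resolve the
    # hostname and confirm it maps back to the same address.
    if "Googlebot" in user_agent or "bingbot" in user_agent.lower():
        try:
            host, _, _ = socket.gethostbyaddr(client_ip)
            if host.endswith(VERIFIED_SEARCH_SUFFIXES) and \
               client_ip in socket.gethostbyname_ex(host)[2]:
                return "allow"                       # verified search crawler
        except OSError:
            pass
        return "block"                               # claims to be a search bot, but isn't

    # Declared AI discovery crawlers: a business decision, not a technical one.
    if any(token in user_agent for token in AI_DISCOVERY_TOKENS):
        return "allow"        # or "block" / rate-limit, depending on your strategy

    # Act: unknown automation gets challenged or rate-limited while you
    # gather more signal.
    return "challenge"
```

The point of the sketch is the shape of the decision: verify identity claims instead of trusting them, treat declared AI crawlers as a policy question, and default unknown automation to a challenge rather than a free pass or a blanket block.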
What happens without bot management, and what changes when you have it
Strategic bot management is a measurable financial decision, not just a security posture. To make the case concrete, it helps to be specific about what the "before" state actually looks like, and what changes after.
Without strategic bot management
- Infrastructure overhead: Unfiltered bad bot traffic (malicious scrapers, credential stuffers, crawl loops on search endpoints) hits your origin server directly, consuming compute, bandwidth, and CDN capacity you're paying for but not benefiting from. Industry estimates put bad bot traffic at 25-40% of total requests on enterprise sites.
- Security incidents: Without edge-level detection, bot-driven attacks require reactive human response. Security teams spend significant hours per month identifying, triaging, and remediating incidents that a WAF would have stopped before they started.
- Operational drag: Engineering time spent on bot remediation is engineering time not spent on product, performance, or customer experience. At enterprise scale, this compounds significantly over a 12-36 month horizon.
- Strategic exposure: AI training crawlers and competitive scrapers access your content unchallenged, ingesting your catalog, pricing, and proprietary data into systems you have no visibility into and no control over.
With Acquia Edge bot management
- Bad bot traffic is identified and blocked at the edge, before it reaches your origin. Infrastructure costs associated with bot-driven load drop materially.
- Security incidents driven by automated attacks are intercepted before they require human response, reducing remediation hours and incident risk.
- Engineering teams reclaim time previously spent on bot firefighting.
- You have an active, configurable policy governing which crawlers access your site, including the ability to welcome beneficial AI discovery crawlers while blocking training scrapers or competitive intelligence tools.
Where the value shows up: a 3-year view
Infrastructure load from bad bots (Stakeholder: VP Infrastructure)
- Before: 25-40% of origin requests are low-value or malicious traffic you're paying to serve.
- After Acquia Edge: Bad bots blocked at the edge; origin load drops; bandwidth and CDN costs reduced materially.
- Business impact: Significant reduction in infrastructure spend over a 3-year horizon.
Security incident remediation (Stakeholder: CISO)
- Before: Security team spending significant hours per month on bot-driven incidents, with ongoing risk of a major breach or outage.
- After Acquia Edge: Edge detection intercepts attacks before human response is required; incident rate and severity drop.
- Business impact: Largest single value driver — one avoided major incident can outweigh years of tooling cost.
Engineering time on bot firefighting (Stakeholder: Engineering / CMO)
- Before: Developer cycles diverted to reactive bot remediation rather than product and experience work.
- After Acquia Edge: Cycles reclaimed; teams focus on product, performance, and customer experience.
- Business impact: Compounding productivity gain across a 12-36 month horizon.
The relative weight of each category will vary by organization. For most enterprise sites, the security and risk category tends to be the largest value driver, while infrastructure savings are the most immediately visible. Talk to your Acquia team to model the impact against your specific environment and traffic profile.
The other side of the equation: good bots are your opportunity
Strategic bot management isn't only about keeping bad actors out. It's equally about ensuring the right crawlers get in.
Roughly 60% of searches now end without a click (SparkToro / Datos, 2024), driven by AI Overviews and on-SERP answers. The implication isn't that discovery is dead; it's that the discovery surface has expanded. Googlebot still matters. But so do GPTBot, ClaudeBot, PerplexityBot, and the growing ecosystem of AI assistants that now answer questions by drawing on indexed web content.
If your content is well-structured, authoritative, and accessible to these crawlers, you have a real opportunity to appear in AI-generated answers rather than just being the source behind a click that never comes. That's the case for actively welcoming beneficial AI discovery crawlers, while maintaining the controls to block the ones whose purposes don't align with your interests.
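For crawlers that honor it, robots.txt remains the simplest way to publish that policy. Here is a minimal sketch, assuming, purely for illustration, that you have decided to welcome GPTBot and disallow Bytespider; the tokens and choices are yours to make, and because robots.txt is a voluntary signal, a WAF-level rule is what actually enforces the decision against crawlers that ignore it.

```
# Illustrative robots.txt policy; crawler names and decisions are examples only.
User-agent: GPTBot
Allow: /

User-agent: Bytespider
Disallow: /

# Default for everything else
User-agent: *
Allow: /
```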
This is the strategic middle ground that a WAF enables and that a blunt "discount everything" approach cannot: differentiated access, by crawler, by intent, by business value.
Talk to your Acquia AM, CSM, or TAM about your WAF and bot management strategy. We'll help you understand your actual bot traffic profile, which crawlers are accessing your site today, and how to build a policy that protects your content while positioning you for AI-driven discovery.