Managing Automated Bots and Search URLs on Acquia Cloud

November 6, 2025 | 3 minute read
AI crawlers are hungry for data, but website owners are barring them from internal /search pages. Learn the critical reasons why—from protecting server resources and preventing SEO dilution to securing private user data and blocking competitive scraping.

We are seeing a major influx of LLM assistants, AI agents, data scrapers, and other sources of bot traffic across the web. Whether the traffic is considered valuable or malicious, these bots will try to access your web applications. There are several practical reasons for blocking them from crawling your sites' search URLs (e.g., /search?q=…). Here are the most common:

Protecting Server Resources

  • High resource cost: Every AI bot request can trigger a database query or full-text index lookup. Unchecked, these crawls can spike CPU and database load, slowing the site for real users.
  • Crawl loops: Search URLs can generate infinite variations (?q=a, ?q=b, etc.). A bot might follow and index all of them, creating an endless crawl that burns bandwidth and database cycles.
  • Search service exhaustion: Some search services have query limits. Bots can quickly exhaust these, leading to outages or overage charges.

SEO and Content Quality

  • Duplicate content: Search result pages typically contain snippets of content already available on product or article pages. Indexing them can dilute SEO signals and clutter search-engine indexes.
  • Low-quality pages: Many search result pages aren’t meaningful landing pages. Blocking them helps keep only valuable, well-curated URLs in public indexes.
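
If you would rather let search result pages be fetched but keep them out of public indexes, a noindex response header is a common alternative to blocking. Below is a minimal sketch for the Apache .htaccess that ships with Drupal, assuming a /search path and that mod_setenvif and mod_headers are enabled; the SEARCH_RESULTS variable name is illustrative.

# Mark search result pages as noindex so crawlers that fetch them
# do not keep them in public search indexes.
SetEnvIf Request_URI "^/search" SEARCH_RESULTS
Header set X-Robots-Tag "noindex, follow" env=SEARCH_RESULTS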

Privacy, Security, and Compliance

  • Sensitive queries: Site searches sometimes reveal patterns of user interest or internal keywords that shouldn’t be exposed to third parties.
  • Data ownership: Some owners prefer not to let AI training sets include their internal search results or user-generated queries.

Reducing Scraping and Competitor Insight

  • Competitive intelligence: Search URLs can expose product availability, pricing, or trends that competitors might mine.
  • Content theft: Blocking bots makes it harder for automated tools to copy a site’s full content catalog through search listings.

How It’s Done

Start by distinguishing good bots from questionable or outright bad bots, then make a plan. For starters, you might add rules like the following to robots.txt to manage GPTBot by specifying its user agent, or all bots by using the * wildcard:

# Don't allow GPTBot to access your search URLs.
User-agent: GPTBot
Disallow: /search

# Don't allow any bots to access your search URLs.
User-agent: *
Disallow: /search

 

This approach only reaches good bots that respect robots.txt. Your team can augment the strategy with WAF/bot-management tools (Acquia Edge, Cloudflare, or Akamai) for a more robust way to manage, block, or rate-limit AI bots.
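
Because robots.txt is purely advisory, some teams also add a web-server rule as a stopgap that refuses known AI user agents on search URLs until a WAF or bot-management layer is in place. Here is a minimal sketch for Drupal's Apache .htaccess, assuming mod_rewrite is enabled; the user-agent list is illustrative and easily spoofed, so treat it as a complement to tools like Acquia Edge, not a replacement.

# Return 403 to selected AI crawlers requesting search URLs.
# The list below is illustrative, not exhaustive.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|CCBot) [NC]
RewriteRule ^search - [F,L]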

Bottom line: search URLs are often dynamically generated, resource-intensive, and low-value for indexing, so many site owners proactively block AI and other non-human crawlers from them. With that handled, focus your optimization effort on the landing pages that showcase your organization's highest-quality content.
