Managing Automated Bots and Search URLs on Acquia Cloud

November 6, 2025 | 3 minute read
AI crawlers are hungry for data, but website owners are barring them from internal /search pages. Learn the critical reasons why—from protecting server resources and preventing SEO dilution to securing private user data and blocking competitive scraping.

We are seeing a major influx of LLM assistants, AI agents, data scrapers, and other sources of bot traffic across the web. Whether the traffic is considered valuable or malicious, these bots will try to access your web applications. There are several practical reasons for blocking them from crawling your sites' search URLs (e.g., /search?q=…). Here are the most common:

Protecting Server Resources

  • High resource cost: Every AI bot request can trigger a database query or full-text index lookup. Unchecked, these crawls can spike CPU and database load, slowing the site for real users.
  • Crawl loops: Search URLs can generate infinite variations (?q=a, ?q=b, etc.). A bot might follow and index all of them, creating an endless crawl that burns bandwidth and database cycles.
  • Search service exhaustion: Some search services have query limits. Bots can quickly exhaust these, leading to outages or overage charges.

SEO and Content Quality

  • Duplicate content: Search result pages typically contain snippets of content already available on product or article pages. Indexing them can dilute SEO signals and clutter search-engine indexes.
  • Low-quality pages: Many search result pages aren’t meaningful landing pages. Blocking them helps keep only valuable, well-curated URLs in public indexes.
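
If you would rather let search result pages be fetched but keep them out of public indexes, a noindex response header is a common alternative to blocking. Below is a minimal sketch for the Apache .htaccess that ships with Drupal, assuming a /search path and that mod_setenvif and mod_headers are enabled; the SEARCH_RESULTS variable name is illustrative.

# Mark search result pages as noindex so crawlers that fetch them
# do not keep them in public search indexes.
SetEnvIf Request_URI "^/search" SEARCH_RESULTS
Header set X-Robots-Tag "noindex, follow" env=SEARCH_RESULTS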

Privacy, Security, and Compliance

  • Sensitive queries: Site searches sometimes reveal patterns of user interest or internal keywords that shouldn’t be exposed to third parties.
  • Data ownership: Some owners prefer not to let AI training sets include their internal search results or user-generated queries.

Reducing Scraping and Competitor Insight

  • Competitive intelligence: Search URLs can expose product availability, pricing, or trends that competitors might mine.
  • Content theft: Blocking bots makes it harder for automated tools to copy a site’s full content catalog through search listings.

How It’s Done

Start by distinguishing good bots from questionable or outright bad bots, then make a plan. For starters, you might add rules like the following to robots.txt to manage GPTBot by specifying its user agent, or all bots by using the * wildcard:

# Don't allow GPTBot to access your search URLs.
User-agent: GPTBot
Disallow: /search

# Don't allow any bots to access your search URLs.
User-agent: *
Disallow: /search

 

This approach only reaches good bots that respect robots.txt. Your team can augment the strategy with WAF/bot-management tools (Acquia Edge, Cloudflare, or Akamai) for a more robust way to manage, block, or rate-limit AI bots.
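
Because robots.txt is purely advisory, some teams also add a web-server rule as a stopgap that refuses known AI user agents on search URLs until a WAF or bot-management layer is in place. Here is a minimal sketch for Drupal's Apache .htaccess, assuming mod_rewrite is enabled; the user-agent list is illustrative and easily spoofed, so treat it as a complement to tools like Acquia Edge, not a replacement.

# Return 403 to selected AI crawlers requesting search URLs.
# The list below is illustrative, not exhaustive.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|CCBot) [NC]
RewriteRule ^search - [F,L]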

Bottom line: search URLs are often dynamically generated, resource-intensive, and low-value for indexing, so many site owners proactively block AI and other non-human crawlers from them. With that handled, focus your optimization effort on the landing pages that showcase your organization's highest-quality content.
