Hosted Solr site search for Drupal is on the way
by Jay Batson
Search matters a great deal to anyone who runs a website, so I've spent serious time looking at it. Several things have come from that time:
- The important thing: We'll soon be adding "hosted site search" capabilities to the Acquia Network for our subscribers. More about this below.
- The unimportant thing: Search was modestly influential in the selection of our company name. "Ah-kwe-eh" is the (Native American) Navajo World War II code talker word for "Locate." I reasoned that this made sense because most websites are built to help site visitors locate what they're looking for - information, people, products, or information about people or products. (Oh, and it means we get listed early in alphabetical listings, which really clinched it....)
Why bother with offering hosted site search with Google around? Google solves one aspect of search: making it easy to find a site you want. But it isn't as good at helping visitors find information within a site. Companies like Endeca and Fast have built large, successful businesses supplying technology that provides auto-complete / auto-suggest, content spotlighting, faceted search, and relevance ranking. You can see this kind of stuff in action at places like cnet; when shopping for digital cameras, you can select things like price, manufacturer, and other categories to improve your search results. This is faceted search at work.
Much of this value comes simply from using what Drupal already knows about its content as facets: when a page (node) was created, by whom, what its taxonomy / folksonomy tags are (and that these are tags), and so on. Google can't discover these things during a crawl of a site; but Drupal knows them, and can use this metadata in ways Google can't.
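To make the idea of facets concrete, here is a minimal Python sketch of counting and narrowing content by metadata. The node records and field names (author, created, tags) are made up for illustration and are not Drupal's actual schema; a real search engine does this inside its index rather than over a list in memory.

```python
from collections import Counter

# Hypothetical node records. The field names are illustrative only and are
# not Drupal's actual schema.
NODES = [
    {"title": "Compact camera roundup", "author": "alice",
     "created": "2008-05", "tags": ["cameras", "reviews"]},
    {"title": "Tripod buying guide", "author": "bob",
     "created": "2008-05", "tags": ["accessories"]},
    {"title": "DSLR lens basics", "author": "alice",
     "created": "2008-06", "tags": ["cameras", "lenses"]},
]


def _as_list(value):
    """Treat single-valued and multi-valued fields the same way."""
    return value if isinstance(value, list) else [value]


def facet_counts(nodes, field):
    """Count how many nodes carry each value of a metadata field."""
    counts = Counter()
    for node in nodes:
        counts.update(_as_list(node[field]))
    return counts


def narrow(nodes, field, wanted):
    """Keep only the nodes whose field contains the wanted value."""
    return [n for n in nodes if wanted in _as_list(n[field])]


# A visitor picks the "cameras" tag, then sees which authors remain.
camera_nodes = narrow(NODES, "tags", "cameras")
print(facet_counts(NODES, "tags"))
print(facet_counts(camera_nodes, "author"))
```

Each count is what a visitor would see next to a facet link, and each click narrows the result set the same way `narrow` does here.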
Why does this matter? Better site search helps visitors find what they came for and encourages them to explore and discover new things on a site. That in turn directly increases page views, time on site, click-through rates, and many other metrics that relate to either the usefulness or profitability of the site. It also matters because, frankly, most Drupal sites need better ranked search results than are provided with default search. (Heaven knows ours does....) And finally, it matters because big Drupal sites need a faster, more tunable, more scalable search engine than Drupal's built-in site search.
Some forward-thinking Drupal site owners have improved their site search by installing another high-quality open source project called Solr, which is a very cool companion project to the mature, fast, and wonderful Lucene search engine. Lucene is a search library written in Java, and Solr is a server layer around it that implements faceted search and exposes itself (and Lucene) via a web services interface. (In fact, the cnet site linked above uses Lucene/Solr, and one of the key Solr contributors is the main programmer implementing this at cnet.) Drupal sites use a Lucene/Solr server via the Drupal Solr module, maintained by our own Robert Douglass, who has implemented Lucene/Solr in a number of large Drupal sites.
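For a feel of what that web services interface looks like, here is a hedged Python sketch that builds a faceted query URL for a Solr server. The parameters `q`, `wt`, `facet`, `facet.field`, and `fq` are standard Solr query parameters; the base URL, the field names, and the helper function itself are assumptions for illustration, and this is not how the Drupal Solr module is implemented.

```python
from urllib.parse import urlencode

# Assumed address of a locally running Solr server; adjust for a real setup.
SOLR_BASE_URL = "http://localhost:8983/solr"


def build_facet_query(base_url, text, facet_fields, filters=None):
    """Build a Solr select URL that asks for results plus facet counts.

    facet_fields: index fields to count values for (hypothetical names here).
    filters: field -> value pairs sent as fq filter queries. Values with
    spaces or special characters would need Solr query escaping, which this
    sketch does not do.
    """
    params = [("q", text), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", f"{field}:{value}")
               for field, value in (filters or {}).items()]
    return f"{base_url}/select?{urlencode(params)}"


url = build_facet_query(
    SOLR_BASE_URL,
    "digital camera",
    facet_fields=["author", "tags"],  # hypothetical field names
    filters={"tags": "cameras"},
)
print(url)
```

Fetching a URL like this from a running Solr server returns both the matching documents and, for each requested facet field, the counts used to render the facet links.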
But while many other Drupal sites could benefit from the capabilities offered by Solr, it's often not practical for them to use it. They may lack the Java expertise to deploy and manage a Java-based application - or their hosting environment may not accommodate it. Or they may not know how to attend to the care and feeding of search indexes over the long term. The bar is just high enough that many people either don't even try to add these capabilities or give up after a few troubles.
Acquia can solve that problem. In the same way we can provide spam blocking from the Acquia Network, we can also provide a hosted version of site search. Acquia Drupal sites can connect to it by using the Drupal Solr module. This becomes a staggeringly simple way for Acquia Network subscribers to get the value of this enhanced site search.
So, we will do this. How? Knowing my interest in search, a couple of months ago Robert D. pointed me in the direction of Jacob Singh, Robert's partner in the care and feeding of the Drupal Solr module. Jacob was in the middle of building just such a hosted search service. When he and I mind-melded, it became clear to both of us that we both would win if we pulled his budding service in under Acquia. Some negotiations and weeks later, Jacob has joined us, and we're folding his service into the Acquia Network as I write this.
Various people inside Acquia will be blogging about this over the next few weeks as we get ready to release this service. We want to be very user-driven as we iterate towards releasing this. So if improving time on site, etc., is something you'd like for your site:
- Either register at acquia.com, or update your profile on your existing registration, and tell us you'd like to be considered for pre-release software;
- Keep an eye on this work, participate in the discussions on our Acquia Network Forums (free subscription required), and tell us what you'd like from this.
And all this is really just our first step on the road to bringing the semantic web and semantic search to the Drupalsphere. While they are different topics and technologies, they're both useful and important. Good site search can be done today; the semantic web and semantic search still need some maturity before we (as a vendor) can act. (Though the Drupal community can, and should, be experimenting with this, as Dries suggests.)
If we succeed at this, we can be the leading content management vendor that really understands how search can improve our customers' sites in new and significant ways.