Hosted Solr site search for Drupal is on the way

Search technology matters a great deal to anyone who runs a website. As a result, I've spent serious time looking at it. Two things have come from that time:

  • The important thing: We'll soon be adding "hosted site search" capabilities to the Acquia Network for our subscribers. More about this below.
  • The unimportant thing: Search was modestly influential in the selection of our company name. ah-kwe-eh is the Navajo (Native American) WW II code talker word for "Locate." I reasoned that this made sense because most websites are built to help visitors locate what they're looking for - information, people, products, or information about people or products. (Oh, and it means we get listed early in alphabetical listings, which really clinched it....)

Why bother with offering hosted site search with Google around? Google solves one aspect of search: making it easy to find a site you want. But it isn't as good at helping visitors find information within a site. Companies like Endeca and Fast have built large, successful businesses supplying technology that provides auto-complete / auto-suggest, content spotlighting, faceted search, and relevance ranking. You can see this kind of stuff in action at places like cnet; when shopping for digital cameras, you can select things like price, manufacturer, and other categories to improve your search results. This is faceted search at work.

A ton of this value can be obtained simply by using what Drupal already knows about its content as facets -- e.g. when a page (node) was created, by whom, what its taxonomy / folksonomy tags are (and that these are tags), etc. Google can't discover these things during a crawl of a site; but Drupal knows them, and can utilize this metadata in ways Google can't.
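To make the idea of "metadata as facets" concrete, here is a minimal Python sketch. It is not how Drupal or Solr implement faceting; the node records, field names, and helper functions are all hypothetical and exist only to show what counting and filtering by facet looks like.

```python
from collections import Counter

# Hypothetical node records: the kind of metadata Drupal stores for each page.
nodes = [
    {"title": "Install guide", "author": "alice", "created": 2008, "tags": ["howto", "install"]},
    {"title": "Solr tips", "author": "bob", "created": 2008, "tags": ["search", "howto"]},
    {"title": "Release notes", "author": "alice", "created": 2007, "tags": ["release"]},
]


def values_of(node, field):
    """Return a node's values for `field` as a list, whether it is single- or multi-valued."""
    value = node[field]
    return value if isinstance(value, list) else [value]


def facet_counts(nodes, field):
    """Count how many nodes carry each value of `field`."""
    counts = Counter()
    for node in nodes:
        counts.update(values_of(node, field))
    return counts


def filter_by(nodes, field, wanted):
    """Keep only the nodes whose `field` equals or contains `wanted`."""
    return [node for node in nodes if wanted in values_of(node, field)]


# Narrow the result set to one tag, then show the author facet for what remains.
howtos = filter_by(nodes, "tags", "howto")
print(facet_counts(howtos, "author"))
```

Each click on a facet value is just another filter followed by a recount, which is why the facets always reflect the current result set.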

Why does this matter? Better site search helps visitors find what they want and encourages them to explore and discover new things on a site. That in turn directly increases page views and time on site, and helps with click-through rates and many other metrics that relate to either the usefulness or profitability of the site. It also matters because, frankly, most Drupal sites need better-ranked search results than the default search provides. (Heaven knows ours does....) And finally, it matters because big Drupal sites need a faster, more tunable, more scalable search engine than Drupal's built-in site search.

Some forward-thinking Drupal site owners have taken steps to improve their site search by installing another high-quality open source project called Solr, which is a very cool companion project to the mature, fast, and wonderful Lucene search engine. Lucene is a Java library, and Solr is a server layer around it that implements faceted search and exposes itself (and Lucene) via a web services interface. (In fact, the cnet site linked above uses Lucene/Solr, and one of the key Solr contributors is the main programmer implementing this at cnet.) Drupal sites utilize a Lucene/Solr server via the Drupal Solr module, maintained by our own Robert Douglass, who has implemented Lucene/Solr in a number of large Drupal sites.
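Because Solr is reached over HTTP, a faceted search is just a URL with the right query parameters. The Python sketch below only builds such a URL and sends nothing. The parameters `q`, `wt`, `facet`, `facet.field`, and `fq` are standard Solr request parameters; the server address, the field names `author` and `tags`, and the helper `build_facet_query` are assumptions made for illustration and are not taken from the Drupal Solr module.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, keywords, facet_fields, filters=None):
    """Build a Solr select URL that asks for results plus facet counts.

    base_url     -- address of the Solr select handler (assumed here)
    keywords     -- the user's search terms
    facet_fields -- index fields to return facet counts for (hypothetical names)
    filters      -- optional {field: value} pairs the user has already clicked
    """
    params = [("q", keywords), ("wt", "json"), ("facet", "true")]
    # One facet.field parameter per field we want counts for.
    for field in facet_fields:
        params.append(("facet.field", field))
    # One fq (filter query) parameter per facet value already selected.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return base_url + "?" + urlencode(params)


# Assumed local Solr address; adjust to wherever the server actually runs.
url = build_facet_query(
    "http://localhost:8983/solr/select",
    "digital camera",
    ["author", "tags"],
    {"tags": "howto"},
)
print(url)
```

A client such as a Drupal module would issue a GET request to a URL like this and render the returned facet counts as clickable filters.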

But while many other Drupal sites could benefit from the capabilities offered by Solr, it's often not practical for them to use it. They may lack the Java expertise to deploy and manage a Java-based application - or their hosting environment may not accommodate it. Or they may not know how to attend to the care and feeding of search indexes over the long term. The bar is just high enough that many people either don't even try to add these capabilities or give up after a few troubles.

Acquia can solve that problem. In the same way we can provide spam blocking from the Acquia Network, we can also provide a hosted version of site search. Acquia Drupal sites can connect their sites to it by using the Drupal Solr module. This becomes a staggeringly simple and easy way for Acquia Network subscribers to get the value of this enhanced site search.

So, we will do this. How? Knowing my interest in search, a couple of months ago Robert D. pointed me in the direction of Jacob Singh, Robert's partner in the care and feeding of the Drupal Solr module. Jacob was in the middle of building just such a hosted search service. When he and I mind-melded, it became clear to both of us that we both would win if we pulled his budding service in under Acquia. Some negotiations and weeks later, Jacob has joined us, and we're folding his service into the Acquia Network as I write this.

Various people inside Acquia will be blogging about this over the next few weeks as we get ready to release this service. We want to be very user-driven as we iterate towards releasing this. So if improving time on site, etc., is something you'd like for your site:

  • Either register at acquia.com, or update your profile on your existing registration, and tell us you'd like to be considered for pre-release software;
  • Keep an eye on this work and participate in the discussions on our Acquia Network Forums (subscription (free) required) and tell us what you'd like from this.

And all this is really just the first step on our road to bringing the semantic web and semantic search to the Drupalsphere. While they are different topics and technologies, they're both useful and important. Good site search can be done today; the semantic web and semantic search still need some maturity before we (as a vendor) can act. (Though the Drupal community can, and should, be experimenting with this, as Dries suggests.)

If we succeed at this, we can be the leading content management vendor that really understands how search can improve our customers' sites in new and significant ways.

Comments

Posted on by Robert Douglass.

Yay! I can finally talk about this in public :D

For over a year I've been telling people at conferences how cool Solr is - with faceted search, content recommendation, multisite search, file search and more - and soon, thanks to Acquia, it will be easy for people to set up and experience it for themselves.

Jacob Singh has been a great partner in advancing the ApacheSolr module, and I'm thrilled to have him on board at Acquia.

Awesome!

Robert Douglass
Senior Drupal Advisor, Acquia

Posted on by Ryan (not verified).

Awesome news... we've been using faceted search on the Ubercart community site for some time now thanks to Rob's module, a spare server we had in office, and some help setting it all up from Mike O'Connor. (Hooray for communities!) It's been helping me to find old content tucked away on our site just as much as it's been helping our visitors and reducing the number of duplicate posts in our forums. Simple things impress me, like the magical faceted search blocks that I can enable on the search pages. : )

We've had something like 25,000 searches on it since August and no complaints so far. ; )

Posted on by Hans Henderson (not verified).

Could this please be enabled ASAP for Drupal.org?

Google's all fine and good but. . .

If d.o. were used as a real showcase for Drupal's capabilities and all that Acquia had to offer its top clients I'm sure many people would be impressed.

Not to mention the benefit of making d.o. more usable in trying to find the golden needles in that haystack of documentation :)

PS OT I went to Phillips there in Andover, nice town, must be better than working downtown. . .

Posted on by wmostrey (not verified).

I wonder how you feel ApacheSolr compares to the faceted_search module? Especially faceted_search's concept of environments is a huge plus. I worked with both and I found that Faceted Search is a valid alternative to ApacheSolr.

So in short: what are your reasons for selecting a subscription-based implementation of ApacheSolr instead of integrating the faceted_search module in the Acquia Drupal installation?

Posted on by Robert Douglass.

Wim,

David Lesieur's Faceted Search module is excellent, there's no argument there. The great innovations that it offers come especially in the configuration and the user interface. I'm a big fan of both. The downside to his module is that it is MySQL based, and doing facets in MySQL is a real drain on database resources (i.e., not suitable for huge datasets or high-traffic sites). The REST API aspect of ApacheSolr makes it a great candidate for being offered as a service, so if our goal is to give people great search while not killing their database, it is the perfect technology.

Robert Douglass
Senior Drupal Advisor, Acquia

Posted on by Lev Tsypin.

Any idea of what the costs associated with this service will be when released?

Posted on by Linea Rowe.

The Search service will be in beta (free) for several months. We're still researching the cost to provide the service, but we expect to be able to offer it as part of an Acquia subscription at a certain level. People who have very large sites will be able to purchase higher levels and index more content.

Let us know if you'd like to be a beta tester for the Search service.

Thanks,
Linea

Posted on by DwightAspinwall (not verified).

We are in the process of converting a large non-profit website to Drupal. The site maintains tens of thousands of documents, and improving search is a critical requirement. We are in the process of testing Google Site Search (GSS), and so far the results are encouraging.

GSS provides a means for filtering results, which they call Labels, which are tied to the URL hierarchy. For example, if you put all product-centric pages under the path /products and the library under /library, and give each path a label, Google will include the labels with links to the filtered pages in the result set.

This is good, but limited functionality. What I would like to see from you is a detailed treatise on how Solr's facets go beyond what GSS provides. For us, Google is your main competitor, and I think you need to address the differences directly.

Posted on by Linea Rowe.

Hi Dwight,
As soon as we have a competitive doc that is in shape to be shared, I'll send you a copy. In the meantime, let us know if you want to be a beta tester for Acquia Search and I'll add you to the list. Thanks,

Linea

Posted on by diablopolis (not verified).

Interesting.. People who have very large sites will be able to purchase higher levels and index more content. Regards