
Acquia Search just got a lot better

My name is Nick Veenhof and I've been a member of the Drupal community for quite some time now (more than 5 years, it seems!). I have been focusing on search, more specifically on the Apache Solr Search Integration module. Since September 2011 I have been working on Acquia Search, the Apache Solr module and all of its derivatives. In this blog post I would like to explain what that means from a technical perspective and take you on a ride through time, and possibly even into the future!

Statistics

Let's start with some juicy statistics. I want to show you how much effort we have put into revising this module and making sure it is dynamic and robust enough for a long-lasting future in contrib.

Apache Solr Commit Stats

As you can see, since the start of the Drupal 7 branch there has been a huge effort: around 15,000 lines were removed and more than 22,000 lines were added. The complete project is around 17,000 lines, which means that practically the whole project has been modified and revised since the start of November 2010!
There were 492 commits, averaging about 45 lines added and 30 lines removed per commit. Not bad!
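The per-commit averages follow directly from the rounded totals above, so they are approximate as well:

```latex
\frac{22000\ \text{lines added}}{492\ \text{commits}} \approx 44.7 \approx 45
\qquad
\frac{15000\ \text{lines removed}}{492\ \text{commits}} \approx 30.5 \approx 30
```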

Thanks to everyone who helped us get to this point!

Note: These numbers are rough estimates, because I'm not sure how precise the added/removed line counts are. I used gitstats to generate these numbers and graphs.

Full entity support

There are already contrib modules, such as Apache Solr User, Apache Solr Commerce and Apache Solr Term, that show how other entity types can be indexed and customized to suit your application. The main module does not support all entity types out of the box for performance and compatibility reasons. In my Apache Solr Multisite blog post I explain in more detail why we deliberately chose not to depend on Entity API and/or CTools.

See this great blog post from Phase 2 that explains all about entity support in Apache Solr. Although the custom table used in that post is not necessary (see the modules listed above for examples without one), it is a very good example of how Apache Solr lets you customize your website to fit your technical needs.
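To make per-entity indexing a bit more concrete, here is a minimal, hypothetical sketch in plain PHP (no Drupal APIs) of what a generic entity-to-document mapping could look like. The function name build_solr_document(), the "type/id" document id and the field prefixes (ss_, sm_, is_, im_) are illustrative assumptions modeled on Solr dynamic-field naming, not the module's actual API; a real contrib module would plug into the hooks documented in apachesolr.api.php instead.

```php
<?php

// Hypothetical sketch of generic entity indexing: every field of an entity
// becomes a Solr-style dynamic field. The prefix scheme ("i"/"s" for integer
// or string, then "s"/"m" for single or multiple values) is an assumption.
function build_solr_document(string $entityType, int $entityId, array $fields): array
{
    $doc = [
        'id' => $entityType . '/' . $entityId, // assumed id format
        'entity_type' => $entityType,
        'entity_id' => $entityId,
    ];
    foreach ($fields as $name => $value) {
        $isMulti = is_array($value);
        $values = $isMulti ? array_values($value) : [$value];
        if (count($values) === 0) {
            continue; // Skip empty fields: nothing to index.
        }
        // Use the integer prefix only when every value is an int; otherwise index strings.
        $allInts = count(array_filter($values, 'is_int')) === count($values);
        if (!$allInts) {
            $values = array_map('strval', $values);
        }
        $key = ($allInts ? 'i' : 's') . ($isMulti ? 'm' : 's') . '_' . $name;
        $doc[$key] = $isMulti ? $values : $values[0];
    }
    return $doc;
}

// Example: a user entity with a name, several roles and a status flag.
$doc = build_solr_document('user', 7, [
    'name' => 'nick',
    'roles' => ['editor', 'admin'],
    'status' => 1,
]);
print_r($doc);
```

Encoding type and cardinality in the field name lets a single Solr schema accept fields from any entity type without per-field configuration, which is the usual reason for relying on dynamic fields.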

Complete Facet API support

Since the start of the port to Drupal 7, cpliakas has put a lot of effort into Facet API, and a stable version has now been released. By backporting this work, we even made sure that the Drupal 6 version is on par with the Drupal 7 version in functionality. Why? I suggest you read the multisite blog post again ;-)

More debug support!

If you enable the Devel module, you can look through the eyes of Solr: it shows you what Solr sees during the indexing phase.
Devel Support
Another neat trick is the following snippet. Note that Apache Solr Search Integration indexes content as the anonymous user, to prevent content with restricted access from being indexed. A downside is that dpm() calls made during indexing won't show up, since those messages are only displayed to the user who is active at that moment. You can change the indexing user, but do so with care: run the following snippet in the Devel PHP box at http://mysite.dev/devel/php, or set the variable to a user ID of your choosing. Make sure you revert or unset this variable when you're done debugging!

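// Debugging only: index as the current user instead of anonymous.
global $user;
variable_set('apachesolr_index_user', $user->uid);
// When done, revert with: variable_del('apachesolr_index_user');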

More food for contrib modules

Modules like Apachesolr Attachments had some very specific needs. For example, attachments needed a way to index files and relate them to other entities: whenever an entity is deleted from your system, the attached file, which is indexed as a separate document linking to that entity, is removed from the index as well. This special behavior had to be generalized so that other contrib modules could benefit from the same API.
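The cascading delete described above can be sketched in plain PHP. This is a hypothetical, in-memory stand-in for a Solr index, not the module's API: the ToyIndex class, its methods and the 'parent' field that links an attachment document to its entity are all assumptions made for illustration.

```php
<?php

// Hypothetical in-memory stand-in for a Solr index, keyed by document id.
// Attachment documents carry a 'parent' key pointing at the entity they
// belong to (an assumed field name, for illustration only).
final class ToyIndex
{
    /** @var array<string, array> */
    private array $docs = [];

    public function add(string $id, array $doc): void
    {
        $this->docs[$id] = $doc;
    }

    // Delete an entity's own document and every separate document linked to it.
    public function deleteEntity(string $entityId): void
    {
        unset($this->docs[$entityId]);
        // foreach by value iterates over a copy, so unsetting inside is safe.
        foreach ($this->docs as $id => $doc) {
            if (($doc['parent'] ?? null) === $entityId) {
                unset($this->docs[$id]);
            }
        }
    }

    public function ids(): array
    {
        return array_keys($this->docs);
    }
}

$index = new ToyIndex();
$index->add('node/1', ['title' => 'Release notes']);
$index->add('file/10', ['filename' => 'notes.pdf', 'parent' => 'node/1']);
$index->add('node/2', ['title' => 'Roadmap']);

// Deleting node/1 also drops the attachment document file/10.
$index->deleteEntity('node/1');
print_r($index->ids());
```

Against a real Solr server, the cleanup would more naturally be a single delete-by-query on the linking field rather than a loop over every document; the loop only shows the bookkeeping.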

Several new and existing contrib modules that work with Apache Solr Search Integration were created or updated in the last couple of months as part of this initiative. They all profit from the renewed API, so whatever you want to do with them can be customized using hooks and/or a custom module.

Performance audit and many, many improvements

During this period we also analyzed performance and looked for ways to make Apache Solr faster, whether you are using Acquia Search or hosting Solr yourself. I could go on and on about the changes we've made, but you can read all about them on my personal blog. In short, depending on your server configuration, performance may improve significantly! We will discuss some of the bits and bytes at DrupalCon Munich during our session "Bootstrapping Solr Search Clusters and maintain them using puppet".

Aside from that, a graph of search page timings shows that a regular search page load is blazingly fast. I recommend looking at that graph if you want to understand how a complete search workflow is executed.

Detailed documentation

Most of the API functionality is documented in the apachesolr.api.php file. If you really want to go into detail, you can read my thesis, which was written as part of this upgrade process. It covers the UI redesign, support for connecting to multiple Solr servers, and much more.

Search API and Apache Solr are friends, not foes

To be clear: Search API and Apache Solr are friends, not foes. Acquia is currently working on creating and improving Solr support for Search API. Even though Search API and its Solr backend work with a single Solr core, the combination is not yet fully optimized for the Acquia Search service. However, an Acquia Search connector for Search API is already in development. At the Drupal Dev Days in Barcelona, Matthias Hutterer and I gave an in-depth presentation on Solr and search, with more details about the ins and outs of, and the "fights" between, these two modules.
The conclusion: we are cooperating, because the modules serve different use cases and open source is all about choice ;-)

Conclusion

We hope you appreciate the effort Acquia is putting into making Acquia Search faster, more stable and, most importantly, a great fit for websites where search is critical to the business.

Let's continue this effort together; we could really still use your help in making these modules even faster and more stable!

Comments

Posted by Thomas Bonte.

Hey Nick, great write-up! I missed an important contrib module in the list: http://drupal.org/project/apachesolr_views

Posted by Nick Veenhof.

Naturally, apachesolr_views had a big push as well. However, I'm not sure of its current state, so please respond here if you know more :-)

Posted by Dave Reid.

Could you explain more about your comment that the module isn't able to index all possible entity types because of performance? The multisite blog post only gave reasons related to dependencies.

It seems that, for the sake of keeping the D6 and D7 branches in sync, we're not able to fully take advantage of the new APIs in Drupal 7 for more entity types. Your comment about dependencies is mildly ironic, considering that to support new entity types we need to add more and more contrib modules. I guess my question is: will there ever be a "reboot" of the module for Drupal 7 and beyond?

Posted by Nick Veenhof.

The big problem here is that entity types are very distinct from each other and can serve very different purposes. Supporting file entities, for example, works very differently and needs a contrib module for that case. The same is true for nodes, for comments attached to those nodes, and for users.

Also, the core entity API wasn't flexible enough yet (no criticism intended) to support automatic indexing of any entity type. The dependency story is only one factor in the decision. As a site builder, you choose which modules to install and depend on to fit your use case.

It is still based on assumptions, and I agree this might not fit every use case.
This will probably change in Drupal 8, where entities are truly first-class citizens.

Until then, it would be a good idea to start a single contrib module for apachesolr that depends on entity_api and provides a generic indexing method for arbitrary entities and fields. That way, the best of both worlds would be combined.

Does this answer your question?
