Finding All the Things at the Engineering Hackathon
by Peter Wolanin
Last week the Acquia Engineering team assembled in the office for a week of brainstorming, coding, and fun. One full day (24 hours) was devoted to a hackathon, and we vetted ideas and jockeyed to recruit teammates right up to the 9 a.m. starting time.
A problem often encountered in larger technical teams centers on one’s ability to find the relevant documentation. At Acquia, we have a wide range of systems that house documentation targeted at developers, technical account managers, and the support and sales departments. Our systems include technical documentation in the RDoc format, Doxygen, GitHub Flavored Markdown, and team-specific blogs or wikis. We wanted a single, effective interface for searching documentation in all of these formats.
With 24 hours on the clock, we started hacking toward these lofty goals:
- Build a Drupal 7 site
- Integrate with LDAP over SSL so everyone can log in securely
- Serve generated API docs (like RDoc) to logged in users
- Index both the generated docs and the docs from our GitHub repos for searching
- Enable an effective faceted search
Halfway through, Richard Burford (psynaptic) joined the team and helped us polish the look of the site with a custom theme. Kevin and Richard also contributed to this post.
By utilizing a host of Drupal contributed modules and hosted Apache Solr via Acquia Search we got a big head start. Here’s an overview of everything we have in our contrib folder:
These contributed modules were complemented by a custom module that is now at http://drupal.org/sandbox/pwolanin/1801674. We’ve modified it slightly since the hackathon, as even the most inspired code needs improvement once the coffee wears off. Oh, and yes, that is a Ruby file in the module. We needed the Redcarpet and GitHub Markup gems to render Markdown for indexing in Solr.
The broad outline of what we did to accomplish those goals:
- Create a Jenkins job to check out a series of git repos, copy their docs, build RDoc, and commit the results to directories named per repo in the “allthethings” repo. How you implement this step may vary, but we used this bash script (now enhanced and added to the module sandbox) to make Jenkins rock.
- Build the Drupal codebase using drush make and copy into the docroot of the “allthethings” repo.
- Enable, configure, debug, curse at, and finally get LDAP integration working.
- Write the custom Drupal module, apidocs_search, with a lot of creative adaptation of existing Drupal core and contrib code. This was the biggest part of the work, and includes:
  - A simple private files mechanism to serve generated API docs from outside the docroot to logged-in users.
  - Custom stream wrappers to simplify access to files that were generated or copied from our git repos.
  - Scanning the generated and git files to look for additions, changes, or deletions.
  - A custom indexing loop to analyze HTML and send the content to Solr.
  - Shelling out to a Ruby script to render GitHub Flavored Markdown to HTML.
  - Drush commands to test and run things from the command line.
  - A custom search facet for ‘api source’.
- Configure blocks and facets and build a custom theme to have a clean search portal.
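To illustrate the scanning step in the outline above, here is a minimal Python sketch of one way to detect additions, changes, and deletions between two scans of a docs directory. It is not the module's code (apidocs_search is written in PHP), and the function names `build_manifest` and `diff_manifests`, along with the choice of SHA-1 content hashes, are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to a SHA-1 hash of its contents."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(old, new):
    """Return (added, changed, deleted) as sorted lists of relative paths."""
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, deleted
```

An indexer built along these lines would keep the manifest from its previous run, build a fresh one after each Jenkins commit, and then send only the added and changed files to Solr while removing the deleted ones from the index.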
The theme was based on the Stark theme in Drupal core, which provided a minimal foundation on which to build. Twitter Bootstrap’s base CSS file was added to give us good default typography and UI styles. Then it was just a matter of creating the right layout and styling the UI so it looked polished and well structured. The pagers needed some specific attention to make them work well within the limited space, and the search results needed some adjustments so they were easy to scan on the results page.
This screenshot shows results filtered down to the generated docs for Acquia’s Cloud API, which you can see at https://cloudapi.acquia.com/. The search results link back to the appropriate source in order to provide relevant context for developers who may use the system. This includes the rendered RDoc or the specific piece of Markdown in a repo on GitHub.
The layout of top-level directories in the “allthethings” repo includes these essential ones as described in the module README.txt:
We created generated:// and github:// stream wrappers to allow easy access and customized URLs for files in those two directories outside the docroot. While these wrappers sound complex, it’s a testament to the power of the Drupal 7 file API that they took just a few lines of code. Take a look in the sandbox at the apidocs_search module to see how it’s done.
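The core idea behind such a wrapper is a mapping from a URI scheme to a base directory. The sketch below shows that idea in Python rather than Drupal's PHP stream wrapper classes, so it is only an analogy to what the module does. The base directories in `SCHEME_ROOTS` and the function name `resolve_uri` are hypothetical and chosen for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical base directories outside the docroot; the real layout is
# described in the module's README.txt.
SCHEME_ROOTS = {
    "generated": PurePosixPath("/srv/allthethings/generated"),
    "github": PurePosixPath("/srv/allthethings/github"),
}


def resolve_uri(uri):
    """Translate a URI such as generated://repo/index.html to a file path."""
    scheme, sep, target = uri.partition("://")
    if not sep or scheme not in SCHEME_ROOTS:
        raise ValueError(f"unsupported URI: {uri!r}")
    # Drop empty and "." segments, and refuse ".." so that a URI can never
    # point outside its base directory.
    parts = [p for p in target.split("/") if p not in ("", ".")]
    if ".." in parts:
        raise ValueError(f"path traversal is not allowed: {uri!r}")
    return SCHEME_ROOTS[scheme].joinpath(*parts)
```

Keeping the scheme-to-directory table in one place means the rest of the code can refer to files by URI without knowing where they live on disk.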
Yes, it’s a little bit of a hack to have the source files for indexing committed to the deployed branch of the Drupal site’s repo, but using the Acquia Cloud infrastructure, that was a quick way to get them onto the same webserver.
While not perfectly polished, we hope this module's code can serve as a point of reference for how to index non-Drupal static content into Solr so that it can be searched alongside Drupal content. We also enabled the Apache Solr Multisite Search module, though it’s not being used yet, in anticipation of adding Drupal content from some of our other sites, like https://library.acquia.com/, to broaden the scope of this site as a single portal.
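For readers who want a feel for the general approach before opening the sandbox, here is a minimal Python sketch of turning a static HTML page into a flat document that could be sent to a search index. It is not the module's PHP indexing loop. The field names (`id`, `source`, `title`, `content`) are assumptions for this example and do not reflect the actual Solr schema, and the step of posting the document to Solr is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the <title> text and visible text, skipping script and style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def html_to_document(doc_id, source, html):
    """Build a flat dict for indexing; the field names are hypothetical."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "id": doc_id,
        "source": source,
        # Collapse runs of whitespace so the indexed text stays compact.
        "title": " ".join("".join(parser.title_parts).split()),
        "content": " ".join(" ".join(parser.text_parts).split()),
    }
```

A `source` value such as the repo name is the kind of field that a facet like ‘api source’ could filter on.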
In the end, we got a lot of votes from our fellow engineers, and were quite impressed with what we accomplished in a day. We were able to create a production-ready system using Acquia’s services and Drupal that aggregated content from different languages and sources while maintaining appropriate context. We used Solr and facets to make the content relevant and easy to find, while combining PHP, bash scripting, and Ruby in a coherent solution. Many great ideas were implemented by the Engineering Team during the Hackathon, and we look forward to sharing more.