
The Drupal taxonomy sprint with the Encyclopedia of Life at the Field Museum in Chicago

The history of the taxonomy sprint
Back in March 2008, Scott Mattoon, from the Open Architecture Network and Sun, contacted me about a TED Prize alumnus, the Encyclopedia of Life (EOL), which wanted to attend Drupalcon Boston. I was introduced to David Shorthouse from EOL and was eager to help the Encyclopedia of Life adopt Drupal. The TED Prize wish video for the EOL project is below.

David Shorthouse and Peter Mangiafico attended Drupalcon Boston to evaluate the Drupal community and quickly embraced its potential to meet EOL's needs. A few months later Chuck D'Antonio, head of Acquia's professional services, began talking with David about how Drupal could integrate with their Ruby on Rails system, along with some ideas they had for a multi-site configuration. Among the big technical challenges they faced were the scale of the taxonomies they needed to use and the biology field's need for richer metadata on taxonomy terms.

Chuck, Robert, and I put forward some recommendations on how to run a taxonomy sprint, including goals and a recommended list of attendees. Around the same time, Nathaniel Catchpole and Peter Wolanin were discussing ideas for improving the taxonomy module in Drupal 7. Nathaniel started a "taxonomy improvements for Drupal 7" thread in the taxonomy group.

In July the Encyclopedia of Life team put up a website at http://sprint.eol.org and began recruiting attendees from the Drupal community. They assembled a great list of contributors, including:

  • Matthias Hutterer: Taxonomy manager from the 2007 Google Summer of Code project, term relation types, CCK taxonomy
  • Ben Melancon: Community Managed Taxonomy, Node Relativity Access Control, Edit term, Term message
  • Dan Morrison: Relationship, Taxonomy import/export via XML, Edit term, Term message
  • Simon Rycroft: Tinytax taxonomy block, Leftandright - Nested Set Taxonomy, Taxonomy Autotagger, Big Autocomplete TAXonomy
  • Chacha Sikes
  • Vince Smith: leads the Scratchpad project at the Natural History Museum, London, which makes heavy use of the taxonomy module. He made a series of initial recommendations for the Drupal taxonomy sprint and provided some of the biological use cases during the event. You can find more info in his initial presentation given during the sprint.

A semantic future
Back in March, Dries revealed a vision of a semantic future for the web and hinted that Drupal could support it. In the biology domain, a new semantic standard called the Taxon Concept Schema (TCS) has been developed. It is a key standard for the Encyclopedia of Life project and it is critical that Drupal be able to support this kind of semantic data for relationships between terms. A good example of capturing this term relationship information had been implemented by Dan Morrison in his taxonomy import/export via XML project.
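
To give a rough idea of what "semantic data for relationships between terms" could mean in practice, here is a minimal Python sketch of terms linked by typed relations. It is not the TCS schema and not Drupal code: the class names, the relation labels ("child of", "synonym of"), and the placeholder term names are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A taxonomy term with optional free-form metadata (hypothetical model)."""
    tid: int
    name: str
    metadata: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    """A directed, typed link between two terms, e.g. 'synonym of'."""
    source_tid: int
    relation_type: str
    target_tid: int


class Vocabulary:
    """Holds terms plus the typed relations between them."""

    def __init__(self):
        self.terms = {}
        self.relations = []

    def add_term(self, term):
        self.terms[term.tid] = term

    def relate(self, source_tid, relation_type, target_tid):
        self.relations.append(Relation(source_tid, relation_type, target_tid))

    def related(self, tid, relation_type):
        """Return the terms that `tid` points to via `relation_type`."""
        return [
            self.terms[r.target_tid]
            for r in self.relations
            if r.source_tid == tid and r.relation_type == relation_type
        ]


# Placeholder names, not real taxa.
vocab = Vocabulary()
vocab.add_term(Term(1, "Genus A"))
vocab.add_term(Term(2, "Species A1", {"rank": "species"}))
vocab.add_term(Term(3, "Older name for A1", {"rank": "species"}))
vocab.relate(2, "child of", 1)
vocab.relate(3, "synonym of", 2)
```

The point of the sketch is that the relation type is stored as data, so a plain parent-child hierarchy becomes just one kind of relationship among several.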

Progress at the sprint
I spoke with David Shorthouse, Nathaniel Catchpole, and Benjamin Melancon this week at the sprint. Nathaniel indicated they have been making good progress on a rewrite of the taxonomy module for Drupal 7 to support term loading with relationship and vocabulary metadata. They hope to add some new load and save hooks to support this semantic metadata. While rewriting the module they have discovered several core bugs that are now in the issue queue, and they have done some productive clean-up of the existing module. They hope to add several new features and to include some performance improvements they have discovered as well. The sprinters have also been working to get this new taxonomy module to make use of the new Schema API and the next-generation database layer in Drupal 7. This has resulted in several new tests for the taxonomy module in Drupal 7, which help ensure that the rewritten module preserves existing functionality.
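
The idea behind load and save hooks can be sketched as follows. This is only an analogy in Python: Drupal implements hooks in PHP, and the registry, the hook names `term_load` and `term_save`, the `rank` field, and the in-memory "tables" below are all hypothetical, not the API the sprinters were writing.

```python
from collections import defaultdict

# Hypothetical hook registry: maps a hook name to the callbacks registered for it.
_hooks = defaultdict(list)


def register(hook_name):
    """Decorator that registers a function as an implementation of `hook_name`."""
    def decorator(func):
        _hooks[hook_name].append(func)
        return func
    return decorator


def invoke_all(hook_name, term):
    """Call every implementation of `hook_name`, letting each one modify the term."""
    for func in _hooks[hook_name]:
        func(term)


# Stand-ins for database tables: core term fields and one module's extra metadata.
TERM_TABLE = {}
RANK_TABLE = {}


def term_save(term):
    """Save the core fields, then let other modules persist their own metadata."""
    TERM_TABLE[term["tid"]] = {"tid": term["tid"], "name": term["name"]}
    invoke_all("term_save", term)


def term_load(tid):
    """Load the core fields, then let other modules attach their metadata."""
    term = dict(TERM_TABLE[tid])  # copy so callers cannot mutate storage
    invoke_all("term_load", term)
    return term


@register("term_save")
def rank_save(term):
    # A contributed module stores its field in its own table.
    if "rank" in term:
        RANK_TABLE[term["tid"]] = term["rank"]


@register("term_load")
def rank_load(term):
    # The same module re-attaches its field whenever a term is loaded.
    if term["tid"] in RANK_TABLE:
        term["rank"] = RANK_TABLE[term["tid"]]


term_save({"tid": 7, "name": "Example term", "rank": "species"})
loaded = term_load(7)
```

With this pattern the core module never needs to know which metadata exists; each extension stores and restores its own fields when the hooks fire.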

Left to right: Simon Rycroft, Nathaniel Catchpole, Anthony Goddard, Lisa Walley, Roger Espinosa, Matthias Hutterer, Cyndy Parr, Dan Morrison, Chacha Sikes, David Shorthouse, Benjamin Doherty, Vitthal Kudal, Alexey Shipunov, Ben Melancon

How does that help now?
The taxonomy module is one of the core modules most often extended by contributed Drupal modules. The sprinters think that 90% of the new work they are doing for Drupal 7 can be backported to Drupal 6, and that it will help consolidate the contributed taxonomy modules. For those looking for more metadata around their terms, the sprinters hope that this will be at least partially available through a Drupal 6 version of the taxonomy enhancer module.

If you want to follow improvements to taxonomy core more closely, check out this list of taxonomy core issues being addressed at the sprint. If you are interested in helping with taxonomy improvements, Benjamin Melancon and I will be at Drupalcamp NYC this Sunday, and we would be happy to work with you to get these improvements done.

If you have an idea for a Drupal sprint and think there's a good, diverse group of contributors who would also be interested, please get in touch, and we will try to connect you with some sprint sponsors.

Comments

Posted on by David Shorthouse (not verified).

Thanks Kieran for a great synopsis of the history leading up to the sprint and the sprint itself. I'm extremely happy to see that work continues with the taxonomy module. In fact, there are now more members of the Drupal Taxonomy group than there were prior to the sprint. I neglected to add Vince Smith to the list of participants. Vince was a key player in getting us thinking about the possibility of using synonyms as full terms and then constructing a flagging mechanism (i.e. metadata) to differentiate terms. What we desperately have to do now is examine the implications this has for performance. So, Simon Rycroft has been working on both a materialized path and a nested set algorithm to get us beyond using a simple, performance-draining parent-child table. That works for small vocabularies, but falls to its knees with large ones. It is not uncommon for biologists to deal with several hundred thousand terms in the course of their own projects.
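
A minimal sketch of the trade-off described in the comment above, written in Python rather than Drupal's PHP and SQL. The tiny example tree, the function names, and the in-memory dictionaries are assumptions for illustration only; this is not Simon Rycroft's implementation, and it covers only the nested set approach, not the materialized path.

```python
# Parent-child storage: one row per term, pointing at its parent (None = root).
PARENT = {1: None, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3}


def descendants_parent_child(root):
    """Walk the tree level by level; needs one lookup pass per level of depth."""
    found, frontier = [], [root]
    while frontier:
        children = [tid for tid, parent in PARENT.items() if parent in frontier]
        found.extend(children)
        frontier = children
    return sorted(found)


def build_nested_set(parent):
    """Assign (left, right) numbers to every term with a depth-first walk."""
    children = {}
    for tid, p in parent.items():
        children.setdefault(p, []).append(tid)
    bounds, counter = {}, 0

    def visit(tid):
        nonlocal counter
        counter += 1
        left = counter
        for child in sorted(children.get(tid, [])):
            visit(child)
        counter += 1
        bounds[tid] = (left, counter)

    for root in sorted(children.get(None, [])):
        visit(root)
    return bounds


BOUNDS = build_nested_set(PARENT)


def descendants_nested_set(root):
    """One range comparison finds every descendant, whatever its depth."""
    left, right = BOUNDS[root]
    return sorted(t for t, (l, r) in BOUNDS.items() if left < l and r < right)
```

In a database, the nested set lookup becomes a single range query on the left and right columns, whereas the parent-child table needs a query per level or a recursive join. The cost moves to writes: inserting a term means renumbering part of the tree.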

Posted on by kaw3939 (not verified).

Good deal, this is looking promising for my work on adapting WordNet to Drupal. I need a way of extending the taxonomy module to provide much richer descriptions for each word. I was thinking about a way to extend the taxonomy module, or to use a combination of nodes and taxonomy terms, to provide extended information. Something like attaching the NID for a word to the term description.

Posted on by Kiory (not verified).

Nice article and very good photo! Thanks for sharing.