Using apachesolr to index custom data

Imagine you have a custom database table that stores imported content for your nodes. This post explains how to expose that content to Solr for indexing via the apachesolr module. The example module provided was tested with apachesolr 6.x-1.6. It assumes that the custom content is associated with nodes (e.g. a custom imported text value).

Hooks on the table

  • hook_apachesolr_modify_query - to ask Solr to return our new field with the search results
  • hook_apachesolr_update_index - to pass our custom data alongside (not appended to) the node object
  • hook_apachesolr_process_results - to show the custom data field in the result list

Exposing the data

First of all we need to let Solr know about our data: how to hash it, store it and do whatever black magic it does when people are not watching. hook_apachesolr_update_index is triggered when the apachesolr module deems it's time to update the search index. This gives us a chance to step in and change the document sent to Solr, adding our own data field.

As an example, the following code loads some summary text from the database when story nodes are being indexed and tells Solr to store it:

<?php
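// Replace HOOK with the name of your module.
function HOOK_apachesolr_update_index(&$document, $node, $namespace) {
  if ($node->type == 'story' && $document->entity == 'node') {
    // Read the summary straight from the custom table.
    $summary = db_result(db_query('SELECT summary FROM {story_summaries} WHERE nid = %d', $node->nid));
    // The ts_ prefix maps to a dynamic text field in schema.xml.
    $document->ts_summary = $summary;
  }
}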
?>

But how does Solr know what the data is? Is it an integer? Or a double precision number? Perhaps a nice trout? The schema.xml supplied by the apachesolr module contains these field definitions, so let's take a look. The dynamic field definitions begin around line 377, where we find lines like the following:

<dynamicField name="ts_*"  type="text" indexed="true"  stored="true" multiValued="false" termVectors="true"/>

This tells Solr that any field whose name starts with ts_ should be indexed and stored as a single-valued text field. This is perfect for our use case, which is why $document->ts_summary is given the value from the database.
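To make the prefix matching concrete, here is a minimal sketch of how a field name could be checked against a dynamicField pattern. It is an illustration only, not Solr's own implementation: the function name example_matches_dynamic_field is hypothetical, and the sketch assumes a pattern carries a single * wildcard at its start or end, as in the schema line above.

```php
<?php
/**
 * Hypothetical helper: does $name match a dynamicField pattern such as "ts_*"?
 * Assumes the pattern has one "*" wildcard, either leading or trailing.
 */
function example_matches_dynamic_field($pattern, $name) {
  if (substr($pattern, -1) === '*') {
    // Trailing wildcard: the name must start with the fixed prefix.
    $prefix = substr($pattern, 0, -1);
    return strncmp($name, $prefix, strlen($prefix)) === 0;
  }
  if (substr($pattern, 0, 1) === '*') {
    // Leading wildcard: the name must end with the fixed suffix.
    $suffix = substr($pattern, 1);
    return strlen($name) >= strlen($suffix) && substr($name, -strlen($suffix)) === $suffix;
  }
  // No wildcard at all: require an exact match.
  return $pattern === $name;
}
```

With this helper, 'ts_summary' matches the 'ts_*' pattern, while a name with a different prefix does not.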

You may have noticed that instead of relying on the $node object we pulled our information directly from the database. This is because of the way the module loads the node information: only the 'view' node operation is available to us, and any data loaded in the 'update index' operation is simply appended to the $node body (see lines 63, 69 and 92 of apachesolr.index.inc).
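One detail the hook above glosses over: in Drupal 6, db_result() returns FALSE when the query matches no row, so for a story node without a summary the hook would assign FALSE to the field. The following is a minimal sketch of a guard, not the module's own code. The module name mymodule and the helper mymodule_clean_summary() are hypothetical names chosen for illustration.

```php
<?php
/**
 * Hypothetical helper: normalise a raw summary value.
 * Returns the trimmed string, or NULL when there is nothing worth indexing.
 */
function mymodule_clean_summary($raw) {
  if ($raw === FALSE || $raw === NULL) {
    return NULL;
  }
  $summary = trim((string) $raw);
  return $summary === '' ? NULL : $summary;
}

/**
 * Implementation of hook_apachesolr_update_index(), with the guard applied.
 * Only callable inside Drupal 6, where db_query() and db_result() exist.
 */
function mymodule_apachesolr_update_index(&$document, $node, $namespace) {
  if ($node->type == 'story' && $document->entity == 'node') {
    $raw = db_result(db_query('SELECT summary FROM {story_summaries} WHERE nid = %d', $node->nid));
    $summary = mymodule_clean_summary($raw);
    // Leave the document untouched when no usable summary was found.
    if ($summary !== NULL) {
      $document->ts_summary = $summary;
    }
  }
}
```

Keeping the clean-up in its own function means it can be tried out without a Drupal bootstrap.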

Know where to look

By now our data is passed to Solr and indexed. We also need to make sure the field comes back with the search results, and for this we implement hook_apachesolr_modify_query. This hook is triggered whenever a module performs a search, for instance when the user submits the search form.

$query can be used to change the - gasp - query, while $params is an array of parameters. In this example we want Solr to return our new field with each result, so we add it to the 'fl' (field list) element (http://wiki.apache.org/solr/CommonQueryParameters#fl). Note that 'fl' only controls which stored fields are returned; it does not change which fields are searched.

<?php
function HOOK_apachesolr_modify_query(&$query, &$params, $caller) {
  $params['fl'] .= ',ts_summary';
}
?>
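
Appending with .= assumes $params['fl'] is already a non-empty, comma-separated string. As a sketch of a slightly more defensive variant, the hypothetical helper below copes with an empty list and avoids adding the same field twice. The names mymodule and mymodule_add_fl_field are invented for illustration, and the comma-separated format is an assumption.

```php
<?php
/**
 * Hypothetical helper: add $field to a comma-separated 'fl' string, once.
 */
function mymodule_add_fl_field($fl, $field) {
  // Split on commas, trim whitespace and drop empty entries.
  $fields = array_filter(array_map('trim', explode(',', (string) $fl)), 'strlen');
  if (!in_array($field, $fields, TRUE)) {
    $fields[] = $field;
  }
  return implode(',', $fields);
}

/**
 * Implementation of hook_apachesolr_modify_query() using the helper.
 */
function mymodule_apachesolr_modify_query(&$query, &$params, $caller) {
  $current = isset($params['fl']) ? $params['fl'] : '';
  $params['fl'] = mymodule_add_fl_field($current, 'ts_summary');
}
```

The helper also normalises stray spaces around the commas, so 'id, nid' and 'id,nid' end up the same.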

Displaying the results

The last piece of the puzzle is to add our field to the search result display:

<?php
function HOOK_apachesolr_process_results(&$results) {
  foreach ($results as $index => $item) {
    if ($item['node']->type == 'story' && !empty($item['node']->ts_summary)) {
      // The @ placeholder makes t() escape the summary before output.
      $results[$index]['snippet'] .= t('Summary: @summary', array('@summary' => $item['node']->ts_summary));
    }
  }
}
?>

This is a simple loop over the results that appends a label and the field data to the snippet of each matching story.
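
To see what the loop does without a running Drupal site, the sketch below feeds it two hand-built results. The shape of $results is inferred from the hook code above, the sample data is made up, and t() is replaced by a small stand-in that only mimics the escaping of @ placeholders; mymodule is a hypothetical module name.

```php
<?php
// Stand-in for Drupal's t() so the sketch runs outside Drupal. It HTML-escapes
// every placeholder value (what t() does for "@" placeholders) and does not
// translate anything.
if (!function_exists('t')) {
  function t($string, $args = array()) {
    foreach ($args as $key => $value) {
      $args[$key] = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    }
    return strtr($string, $args);
  }
}

function mymodule_apachesolr_process_results(&$results) {
  foreach ($results as $index => $item) {
    if ($item['node']->type == 'story' && !empty($item['node']->ts_summary)) {
      $results[$index]['snippet'] .= t('Summary: @summary', array('@summary' => $item['node']->ts_summary));
    }
  }
}

// Hand-built sample data, shaped the way the hook reads it.
$story = new stdClass();
$story->type = 'story';
$story->ts_summary = 'Cats & dogs';
$page = new stdClass();
$page->type = 'page';
$results = array(
  array('node' => $story, 'snippet' => 'First hit.'),
  array('node' => $page, 'snippet' => 'Second hit.'),
);
mymodule_apachesolr_process_results($results);
```

Only the story result changes; the page result keeps its original snippet, and the ampersand in the summary is escaped.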

It's worth noting that there is another way to alter the document before this hook runs: hook_apachesolr_search_result_alter is exposed via drupal_alter and is given each $document variable.
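
As a rough sketch of that alternative, assuming the hook receives a single document by reference (the usual drupal_alter() convention) and that fields can be read and written as properties. The module name mymodule is hypothetical, and collapsing whitespace is just an example of something one might do here.

```php
<?php
/**
 * Implementation of hook_apachesolr_search_result_alter() (sketch only).
 */
function mymodule_apachesolr_search_result_alter(&$document) {
  if (isset($document->ts_summary)) {
    // Collapse runs of whitespace so the summary renders on one line.
    $document->ts_summary = trim(preg_replace('/\s+/', ' ', $document->ts_summary));
  }
}
```

Documents without a ts_summary field pass through unchanged.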

Comments

Posted by Steve N (not verified).

Is there a trick to getting the update_index hook to work? I created a custom module hook but it's not getting called and I can't figure out why...

function mysearch_apachesolr_update_index(&$document, $node) {
  if (! $document) {
    return;
  }

  switch ($node->type) {
    case 'author':
      $document->addField('sm_field_author_first_name', 'xxx');
      $document->addField('sm_field_author_last_name', 'yyy');
      break;

    case 'book':
      $document->setMultiValue('sm_field_book_id', 'zzz');
      break;
  }
}

Any ideas? Thanks!

Steve

Posted by Steve N (not verified).

Found the solution -- see the release notes @ http://drupal.org/node/204268/release?api_version[]=103

Posted by Balazs Dianiska.

Yeah, this example is for Drupal 6 only, and for the 1.6 version. I am preparing an article for the 6.x-3.x and 7.x-3.x series too, this pointer is definitely helpful for me too.
