Drupal's Search Framework: The execution of a search

Drupal's ambitious search module provides a framework for building searches of all kinds. By isolating the tasks involved in searching, and allowing the actual search implementations to be handled by other modules, the search framework sets the stage for all sorts of creative search applications. This article, which applies to Drupal 6, explores the structure of the search framework by following the steps needed to execute a search.

## Structure of a search

Here are the basic steps involved in searching:

1. Build a search index.
2. Build a search form.
3. Accept a POST request from the form.
4. Redirect POST to GET with search query values expressed in the URL.
5. Parse search query values.
6. Construct search based on query values.
7. Return formatted results.


### Build a search index

The search module's API for indexing HTML content is very simple.

<?php
search_index($sid, $type, $text);
?>

Example 1: search\_index is the way you put stuff into the search index.

$sid is the unique id for a piece of content, $type corresponds to the name of the search implementation (see 'name' $op for hook_search), and $text is the HTML that is to be indexed.
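
To make the call concrete, here is a minimal sketch of how a module might feed the index from its hook\_update\_index implementation, which Drupal runs on cron. The module name mymodule and the hard-coded items are hypothetical, and the small stand-in for search\_index exists only so the sketch also runs outside Drupal; on a real site search.module provides the function.

```php
<?php
// Stand-in so the sketch runs outside Drupal. Inside Drupal 6 the real
// search_index() from search.module is already defined and is used instead.
if (!function_exists('search_index')) {
  function search_index($sid, $type, $text) {
    $GLOBALS['sketch_indexed'][$type][$sid] = $text;
  }
}

/**
 * Implementation of hook_update_index() for a hypothetical "mymodule".
 */
function mymodule_update_index() {
  // Hypothetical content keyed by id. A real module would load the items
  // that still need indexing from its own tables.
  $items = array(
    1 => '<h1>First item</h1><p>Body of the first item.</p>',
    2 => '<h1>Second item</h1><p>Body of the second item.</p>',
  );
  foreach ($items as $sid => $html) {
    // $type matches the name of this module's search implementation.
    search_index($sid, 'mymodule', $html);
  }
}
```

Using the module's own name as $type keeps its entries separate from those indexed by node.module under 'node'.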


### Build a search form

The basic search form is simply a text field with a submit button. This form is available by default to every module that implements search using hook\_search. The main method for extending this form is through hook\_form\_alter (see node.module, node\_form\_alter). It is also possible to build search functionality using other tools that don't rely totally on the search framework. See views\_fastsearch for one such hybrid approach.


### Accept a POST request from the form

The search_menu and the search\_view functions cooperate to make sure that any incoming POST requests for search get redirected to the same path using GET, only with the POST information expressed as part of the GET request.

<?php
// In search.module
function search_menu() {

  // [...snip...]

  foreach (module_implements('search') as $name) {
    $items['search/'. $name .'/%menu_tail'] = array(
      'page callback' => 'search_view',
      'page arguments' => array($name),
      'type' => MENU_LOCAL_TASK,
      'parent' => 'search',
    );
  }
  return $items;
}

// In search.pages.inc

/**
 * Menu callback; presents the search form and/or search results.
 */
function search_view($type = 'node') {
  // Search form submits with POST but redirects to GET. This way we can keep
  // the search query URL clean as a whistle:
  // search/type/keyword+keyword
  if (!isset($_POST['form_id'])) {
    if ($type == '') {
      // Note: search/node can not be a default tab because it would take on the
      // path of its parent (search). It would prevent remembering keywords when
      // switching tabs. This is why we drupal_goto to it from the parent instead.
      drupal_goto('search/node');
    }

    // [...snip...]

    // Do the search and build the form, expressed as $output

    return $output;
  }

  return drupal_get_form('search_form', NULL, empty($keys) ? '' : $keys, $type);
}
?>

Example 2: search\_menu and search\_view.

In search\_menu you can see how a path is built for every module that implements a search. You can see this in action on any Drupal installation with search.module enabled at the path http://example.com/search. The Content and the Users tabs come from the node and user modules' search implementations, respectively. An interesting and important detail is the path definition: $items['search/'. $name .'/%menu_tail']. The %menu\_tail bit passes everything it matches as a single parameter, verbatim, without splitting it into further segments. This is important if you want to be able to search for the string "foo/bar", for example. Normally that would be split into path segments at the forward slash, but %menu\_tail prevents the splitting.


Figure 1: A basic search form with Content and Users tabs.
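
The difference between an ordinary path argument and %menu\_tail can be sketched in plain PHP. This is not the menu system's code; sketch\_menu\_tail is a hypothetical helper that only mimics the behaviour described above for a path such as search/node/foo/bar.

```php
<?php
/**
 * Hypothetical helper mimicking %menu_tail: return everything from path
 * segment $index onward as one string.
 */
function sketch_menu_tail($path, $index) {
  $segments = explode('/', $path);
  return implode('/', array_slice($segments, $index));
}

$path = 'search/node/foo/bar';
$segments = explode('/', $path);

// An ordinary per-segment argument sees only the third segment.
$single_segment = $segments[2];

// The tail keeps the slash, so the whole query survives.
$tail = sketch_menu_tail($path, 2);

print $single_segment . "\n"; // prints "foo"
print $tail . "\n";           // prints "foo/bar"
```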


### Redirect POST to GET with search query values expressed in the URL

The first lines of search\_view promise to redirect to GET, but the mechanism for doing this isn't visible in the code:

<?php
// Search form submits with POST but redirects to GET. This way we can keep
// the search query URL clean as a whistle:
// search/type/keyword+keyword
if (!isset($_POST['form_id'])) {
?>

Example 2 (detail): search\_view only does its work on GET.

Clearly the search itself doesn't happen unless there is no POST form\_id value (i.e. it is a GET request), but how does the redirect happen? The answer lies deep within the handling of the search form:

<?php

// From search.module
/**
 * As the search form collates keys from other modules hooked in via
 * hook_form_alter, the validation takes place in _submit.
 * search_form_validate() is used solely to set the 'processed_keys' form
 * value for the basic search form.
 */
function search_form_validate($form, &$form_state) {
  form_set_value($form['basic']['inline']['processed_keys'], trim($form_state['values']['keys']), $form_state);
}

/**
 * Process a search form submission.
 */
function search_form_submit($form, &$form_state) {
  $keys = $form_state['values']['processed_keys'];
  if ($keys == '') {
    form_set_error('keys', t('Please enter some keywords.'));
    // Fall through to the drupal_goto() call.
  }

  $type = $form_state['values']['module'] ? $form_state['values']['module'] : 'node';
  $form_state['redirect'] = 'search/'. $type .'/'. $keys;
  return;
}
?>

Example 3: Search form validation and submission.

When the search form is submitted, the values first go to search\_form\_validate(). The sole purpose of the validation is to make sure the processed search keys (the values of the form submission) are passed on to the submit handler. The submit handler, search\_form\_submit(), takes on the unusual task of validating the form (checking whether there are actually keys, or whether an empty form was submitted). It can be debated whether that validation actually belongs in the search\_form\_validate function. More interesting to us, however, is the setting of $form\_state['redirect']. This is how POSTed search forms get redirected via GET with the search query in the URL. The Forms API performs the redirect after the submit handler has finished.

This process is one of the first mysteries of the search module that often confuses people when they attempt to understand its inner workings. Despite being somewhat mystical in its behavior, the POST -> GET redirect has a very practical advantage: search result pages can be bookmarked.
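
Stripped of the Forms API, the redirect target is just a path assembled from the submitted values. The following sketch assumes a plain array of form values and a hypothetical helper name, sketch\_search\_redirect; in Drupal the same string is assigned to $form\_state['redirect'] as shown in Example 3.

```php
<?php
/**
 * Hypothetical stand-in for the path logic of the search form handlers:
 * trim the keys, fall back to the 'node' search, and build the GET path.
 */
function sketch_search_redirect(array $values) {
  $keys = trim($values['keys']);
  $type = !empty($values['module']) ? $values['module'] : 'node';
  return 'search/' . $type . '/' . $keys;
}

// A submission from the Users tab.
print sketch_search_redirect(array('keys' => ' robert ', 'module' => 'user')) . "\n";

// No module given, so the node search is assumed.
print sketch_search_redirect(array('keys' => 'foo bar')) . "\n";
```

Because the resulting path carries the whole query, it is the part of the process that makes result pages bookmarkable.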


### Parse search query values

The one thing that virtually every function needs in the process of doing a search is the $keys variable that contains the search query. In Drupal 6, the entire search query is represented as a string. The function search\_get\_keys() can be used to fetch this string, and it is a simple function that looks first to the path, and then to the submitted form values in order to find a keyword query. Whatever is found is stored statically in the function and cannot be changed during the lifetime of the request.
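
The lookup order and the static caching can be illustrated with a small stand-in. The function sketch\_get\_keys is hypothetical and takes the path and the request values as arguments instead of reading the request globals, but it follows the behaviour described above: the remainder of the path wins, the submitted keys value is the fallback, and the first answer sticks for the rest of the request.

```php
<?php
/**
 * Hypothetical stand-in for the keyword lookup: path first, then form values,
 * cached statically after the first call.
 */
function sketch_get_keys($path, array $request) {
  static $keys;
  if (!isset($keys)) {
    // Split into at most three parts so the remainder keeps its slashes.
    $parts = explode('/', $path, 3);
    $fallback = empty($request['keys']) ? '' : $request['keys'];
    $keys = count($parts) == 3 ? $parts[2] : $fallback;
  }
  return $keys;
}

$first = sketch_get_keys('search/node/foo bar', array());
// A later call with different input still returns the cached value.
$second = sketch_get_keys('search/node/something else', array());
```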

Management of this keyword query string is an interesting issue, especially in the context of the advanced search form. The search module offers two functions, search\_query\_insert($keys, $option, $value = '') and search\_query\_extract($keys, $option), which aid in the manipulation of the query string. If you call search\_query\_extract("foo nid:4711", "nid"), you get the value 4711 in return. If you call search\_query\_insert("bar", "uid", 42), you get "bar uid:42" in return. Neither of these functions actually interacts with search\_get\_keys, however, so they cannot be used to fetch or manipulate the statically cached keys. See node\_form\_alter and the $op = 'search' part of node\_search for usage examples of these functions. Note in particular how the form is always used as the storage mechanism for the search query string.
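
For readers who want to experiment with the option:value convention outside Drupal, here is a minimal sketch of the two operations. The sketch\_ functions are hypothetical stand-ins written for this illustration, not the code from search.module; they only aim to reproduce the two calls quoted above.

```php
<?php
/**
 * Hypothetical stand-in: return the value of "option:value" in $keys, or NULL.
 */
function sketch_query_extract($keys, $option) {
  $pattern = '/(^| )' . preg_quote($option, '/') . ':([^ ]*)( |$)/i';
  if (preg_match($pattern, $keys, $matches)) {
    return $matches[2];
  }
  return NULL;
}

/**
 * Hypothetical stand-in: set "option:value" in $keys, replacing an old value.
 */
function sketch_query_insert($keys, $option, $value = '') {
  // Drop an existing pair first so the option is never duplicated.
  if (sketch_query_extract($keys, $option) !== NULL) {
    $pattern = '/(^| )' . preg_quote($option, '/') . ':[^ ]*/i';
    $keys = trim(preg_replace($pattern, '', $keys));
  }
  if ($value != '') {
    $keys .= ' ' . $option . ':' . $value;
  }
  return $keys;
}

print sketch_query_extract('foo nid:4711', 'nid') . "\n"; // prints "4711"
print sketch_query_insert('bar', 'uid', 42) . "\n";       // prints "bar uid:42"
```

Both functions work purely on the string they are given, which is why the form has to carry the query string from one step to the next.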


### Construct search based on query values

The search framework expects modules to use the parsed search query string to do a search for values and return a structured array of results. This process gets triggered in search\_view, which calls search\_data. search\_data is first and foremost a wrapper for the call $results = module\_invoke($type, 'search', 'search', $keys); in other words, it initiates the $op = 'search' phase of hook\_search. The other responsibility of search\_data is to theme the results page, either by invoking the hook\_search\_page implementation for the module doing the search, or by defaulting to theme('search\_results').

Despite the fact that the search framework expects modules to do their own searches, it also provides a mechanism for searching the search index (see step #1). The function do\_search, in its simplest form, is a breeze to use. Take a list of keywords and specify a type ('node' for searching node content), and get search results like this:

<?php
$results = do_search('foo bar baz', 'node');
?>

Example 4: Using do_search to find content.

The $results will be an array of result objects for the top-ranking nodes matching the keywords; each object carries the node id in its sid property along with its score. As this function is one of the core API functions of the search module, you can feel free to call it for your own purposes any time you want. For example, call it from within a block, taking keywords from taxonomy terms or a user's profile interests, and use the returned results as a form of content recommendation.
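
The recommendation idea could be sketched roughly as follows. The function mymodule\_recommended\_nids is hypothetical, and the sketch assumes that each result is an object carrying the node id in a sid property, which is how node.module's own search reads them. The stand-in for do\_search returns fixed rows and exists only so the sketch runs outside Drupal.

```php
<?php
// Stand-in for running outside Drupal: returns fixed rows shaped like the
// assumed result objects. On a real site search.module defines do_search().
if (!function_exists('do_search')) {
  function do_search($keywords, $type) {
    return array(
      (object) array('sid' => 12, 'type' => $type, 'score' => 3.2),
      (object) array('sid' => 7, 'type' => $type, 'score' => 1.1),
    );
  }
}

/**
 * Hypothetical helper: turn keywords into a list of recommended node ids.
 */
function mymodule_recommended_nids($keywords) {
  $nids = array();
  foreach (do_search($keywords, 'node') as $item) {
    $nids[] = (int) $item->sid;
  }
  return $nids;
}

// Keywords could come from taxonomy terms or a user's profile interests.
$nids = mymodule_recommended_nids('drupal search');
```

A block could then load and theme those nodes however it likes.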

The full function signature for do\_search, however, is quite intimidating:

<?php
function do_search($keywords, $type, $join1 = '', $where1 = '1', $arguments1 = array(),
  $columns2 = 'i.relevance AS score', $join2 = '', $arguments2 = array(),
  $sort_parameters = 'ORDER BY score DESC') {
  // [...snip...]
}
?>

Example 5: do\_search, Search's API function for finding content.

Discussing all the possible values for the parameters is outside the scope of this article, but the plethora of options is there so that calling code can interact with two distinct queries by injecting JOIN and WHERE clauses into each of them. Sorts can be specified as well, although I don't recall ever seeing this feature used.


### Return formatted results

If you want to utilize the search module's standard formatting for search results, your hook\_search('search') has to build a structured array of results where each result follows the format:

#### Required keys:
- link: The URL of the item.
- type: The translated type, e.g. "Blog entry".

#### Optional keys:
- title: The title of the result.
- user: The themed username of the user who created the search result (i.e. the node author).
- date: The timestamp associated with the search result.
- snippet: An excerpt of text that gives the context of the keywords that were found in the search result. The search module provides a function, search_excerpt(), which can be used to highlight the keywords within this snippet, but you must call it yourself while building the search result.
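
A single result in this format might look like the sketch below. All values are made up for illustration, and sketch\_missing\_result\_keys is a hypothetical helper, not part of the search module; it simply reports which of the two required keys a result lacks.

```php
<?php
/**
 * Hypothetical helper: list the required keys that a result array lacks.
 */
function sketch_missing_result_keys(array $result) {
  $required = array('link', 'type');
  return array_values(array_diff($required, array_keys($result)));
}

// One entry of the array a hook_search('search') implementation returns.
// Every value here is invented for the example.
$result = array(
  'link' => 'http://example.com/node/1',
  'type' => 'Blog entry',
  'title' => 'Hello search',
  'user' => 'admin',
  'date' => 1230768000,
  'snippet' => 'An excerpt that mentions <strong>search</strong> ...',
);

$missing = sketch_missing_result_keys($result);
```

An implementation would return an array of such entries, one per hit.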

## Conclusion

There are potentially many steps that go into doing a search and displaying the results. The search module provides a framework for managing all of these steps, and an API for accessing the various bits and pieces even outside of the context of a traditional search page. The functions search\_excerpt, search\_index and do\_search, in particular, can be called by modules outside of the traditional hook\_search context.

Comments

Posted on by Mark (not verified).

Excellent article. One aspect of the search API that is a bit limiting is that it creates a separate tab in the search results for other modules' invocation of hook_search. It would be nice if module developers could override that behavior and integrate the results from their module's search into the same tab as the results from search.module.

The reason I am pointing this out is that I am the maintainer of the search_attachments module, and the most requested feature from its users is the ability to put the hits on regular node content (i.e., those found by search.module) and hits on files (i.e., those found by search_attachments) in the same tab. In responding to a user request at http://drupal.org/node/242748 I've started to think about ways of doing this but haven't dug too far in so far. Any suggestions?

Posted on by Robert Douglass.

Mark, the first suggestion is to open up a feature request against Drupal 7 and add it to the list of issues here: http://groups.drupal.org/node/10569

There is a lot of activity going on with search at the moment, and every bit of help counts. Thanks for your awesome contribution (the search_attachments mod). Look forward to discussing "Unified search across implementations" with you in the search group.

Posted on by Mark (not verified).

Thanks a lot, will do.

Posted on by Bit Santos (not verified).

I'm trying to hack in search-by-date fields (published before and published after fields) into the advanced search of nodes on my D6 site but I'm running into trouble. I've found how to add the necessary fields in node_form_alter() and the code to add these parameters to the search query in node_search() but then the submitted data from those fields don't make it to the generated $keys in search_form_submit(). I've been trying for the past few hours to figure out what's going on when search_form_validate(), form_set_value(), and finally _form_set_value() are called, but at this point in the day I'm getting totally lost.

I've pretty much confirmed that search_form_validate() is the point at which it breaks. If I manually type the URL with the appropriate GET query, the search works just fine.

Can I get a little help? :-)

Posted on by Robert Douglass.

You're on the right track. If you're doing this in a module you need to add a validation function. If you're hacking to make a core patch, you need to update node\_search\_validate. Whichever option you choose, you need to study node\_search\_validate to see how it rebuilds the string with the $keys using search\_query\_insert and then packs that string into the form like this:

<?php
if (!empty($keys)) {
  form_set_value($form['basic']['inline']['processed_keys'], trim($keys), $form_state);
}
?>

This is awkward and I hope that we will soon come up with a nicer paradigm for building this (and other) advanced search forms.

Posted on by Bit Santos (not verified).

I didn't think of looking at node_search_validate()! Thanks. Now I'm confident that my new search parameters are getting through, but now I'm having a problem with the query results. I'm getting zero results with search parameters that I'm sure should yield at least one result.

I have the following code in node_search():
if ($start = search_query_extract($keys, 'after')) {
  $conditions1 .= ' AND nz.created >= %d';
  $arguments1[] = intval(date('U', strtotime($start)));
}
if ($end = search_query_extract($keys, 'before')) {
  $conditions1 .= ' AND n.created <= %d';
  $arguments1[] = intval(date('U', strtotime($end)));
}

Is there anything I missed?

Posted on by Bit Santos (not verified).

Woops, the third line should have n.create. I intentionally made it "nz" so I could see the generated query in the error. :-P

Posted on by Robert Douglass.

I'd have to see the actual query being generated before I could say. Make sure to use the devel module and turn query logging on so that you can see all of the queries getting executed, and analyze the query being built, comparing it to the query you expected.

Posted on by Anonymous (not verified).

Hi, simple question. After the post back, the keys query string is set textually in the "Enter your keywords:" text box. so, if my keys value is "somesearch xvalue:something", then my textbox has this entire string instead of just "somesearch".

how can we ensure the proper value is set at postback?

thanks!

Posted on by Robert Douglass.

This depends on what you mean by "proper value". In the ApacheSolr module I decided that no matter what comes in as the URL or POST, any field queries (like nid:5) would not be displayed in the form. This is because the module relies heavily on faceted searching and if you click 5 facets to drill down, with their somewhat long, non-human friendly names, the form will become overpopulated with all sorts of trash. So in the ApacheSolr module all of this extra information is stored in a special singleton object. Look here at the apachesolr_form_alter function, and look here at the get_query_basic function to see how it is done.

In Drupal's core search, the field values are passed on in the $form. Look at node_form_alter, node_search_validate, and node_search (in that order) to see how the values of the field queries are persisted.

Posted on by flexer (not verified).

Hi Robert,

I'm using your great ApacheSolr module in a pre-production site (looking forward for the 1.0 and reading all the opens issues so far, particularly about the DISMAX query).

One thing that puzzles me is how to change the default behaviour of the search form *block*. I'm on D6, and I'd like it to take the search directly to Solr and not to the default Drupal one (which, in production, I'll hide to the users), where I need to click the "Search" tab.

Thank you!

Posted on by Robert Douglass.

Hi Claudio,

look under /admin/settings/apachesolr/settings for the Advanced Settings fieldset. In there you can find "Make Apache Solr Search the default". This should solve your problems.

Robert Douglass
Senior Drupal Advisor, Acquia

Posted on by eddy147 (not verified).

Hi Robert

I'm trying to create a custom search but getting stuck.
What I want is to have a drop-downbox so the user can choose where to search in.
These options can mean 1 or more content types.

So if he chooses options A, then the search will look in node-type P,Q,R.
But he may not give those results, but only the uid's which will be then themed to gather specific data for that user.

To make it a little bit clearer, Suppose I want to look for people, then the search will look the keywords in 2 content profile types (nodes), giving back the user (from $node->uid).

I started with creating a form with a text field and the drop-down box.
Then, in the submit handler, i created the keys and redirected to another pages with those keys as a tail. This page has been defined in the menu hook, just like how search does it.

After that I want to call hook_view to do the actual search by calling node_search, and give back the results.

I really would like to know if I am on the right track.
Is this the way to create a custom search?

Thx for your help.

Here's the code for some clarity:

<?php
// $Id$

/*
* @file
* Searches on Project, Person, Portfolio or Group.
*/

/**
* returns an array of menu items
* @return array of menu items
*/
function vm_search_menu() {

  $subjects = _vm_search_get_subjects();
  foreach ($subjects as $name => $description) {
    $items['zoek/'. $name .'/%menu_tail'] = array(
      'page callback' => 'vm_search_view',
      'page arguments' => array($name),
      'type' => MENU_LOCAL_TASK,
    );
  }
  return $items;
}

/**
* create a block to put the form into.
* @param $op
* @param $delta
* @param $edit
* @return mixed
*/
function vm_search_block($op = 'list', $delta = 0, $edit = array()) {
  switch ($op) {
    case 'list':
      $blocks[0]['info'] = t('Algemene zoek');
      return $blocks;
    case 'view':
      if (0 == $delta) {
        $block['subject'] = t('');
        $block['content'] = drupal_get_form('vm_search_general_form');
      }
      return $block;
  }
}

/**
  * Define the form.
  */
function vm_search_general_form() {
  $subjects = _vm_search_get_subjects();
  foreach ($subjects as $key => $subject) {
    $options[$key] = $subject['desc'];
  }

  $form['subjects'] = array(
    '#type' => 'select',
    '#options' => $options,
    '#required' => TRUE,
  );
  $form['keys'] = array(
    '#type' => 'textfield',
    '#required' => TRUE,
  );
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Zoek'),
  );
  return $form;
}

function vm_search_general_form_submit($form, &$form_state) {
  $subjects = _vm_search_get_subjects();
  $keys = $form_state['values']['keys']; //the search keys
  //the content types to search in
  $keys .= ' type:' . implode(',', $subjects[$form_state['values']['subjects']]['types']);

 

  //redirect to the page, where vm_search_view will handle the actual search
  $form_state['redirect'] = 'zoek/'. $form_state['values']['subjects'] .'/'. $keys;
}

/**
* Menu callback; presents the search results.
*/
function vm_search_view($type = 'node') {
  // Search form submits with POST but redirects to GET. This way we can keep
  // the search query URL clean as a whistle:
  // search/type/keyword+keyword
  if (!isset($_POST['form_id'])) {
    if ($type == '') {
      // Note: search/node can not be a default tab because it would take on the
      // path of its parent (search). It would prevent remembering keywords when
      // switching tabs. This is why we drupal_goto to it from the parent instead.
      drupal_goto($front_page);
    }

    $keys = search_get_keys();
    // Only perform search if there is non-whitespace search term:
    $results = '';
    if (trim($keys)) {
      // Log the search keys:
      watchdog('vm_search', '%keys (@type).', array('%keys' => $keys, '@type' => $type));

      // Collect the search results:
      $results = node_search('search', $type);

      if ($results) {
        $results = theme('box', t('Zoek resultaten'), $results);
      }
      else {
        $results = theme('box', t('Je zoek heeft geen resultaten opgeleverd.'));
      }
    }
  }
  return $results;
}

/**
* returns array where to look for
* @return array
*/
function _vm_search_get_subjects() {
  $subjects['opdracht'] = array(
    'desc' => t('Zoek opdracht'),
    'types' => array('project'),
  );
  $subjects['persoon'] = array(
    'desc' => t('Zoek persoon'),
    'types' => array('types_specialisatie', 'smaak_en_interesses'),
  );
  $subjects['groep'] = array(
    'desc' => t('Zoek groep'),
    'types' => array('Villamedia_groep'),
  );
  $subjects['portfolio'] = array(
    'desc' => t('Zoek portfolio'),
    'types' => array('artikel'),
  );
  return $subjects;
}

Posted on by InternetPro (not verified).

How to change the number of search results per page?

I've been looking everywhere, and it seems that the number of search results per page are specifically set at 10 in the do_search() function.

Can I override this value without hacking core? How?

Posted on by Robert Douglass.

It's fairly senselessly hardcoded into the core search module. It is, of course, a big shortcoming. It's one more thing the Apache Solr module and Acquia Search give you control over.

Robert Douglass
Senior Drupal Advisor, Acquia