Advanced Apache Solr Example: IP-based Access

In the run-up to our talk "Apache Solr Search Mastery" at Drupalcon San Francisco, we realized that we would not have time to really cover all the advanced topics in the session. So we're going to put up a couple of blog posts beforehand to invite some discussion and encourage people to dig into the code ahead of time. We can then take questions at the end of the session or during a BoF.

This first post describes the elements of a module that implements a customized, IP-address-based access control scheme for Solr searches. It's a simplified version of the access controls that some universities and companies use on their library websites to show licensed content (for example, journal articles) only to the on-site students or employees that the license covers. The attached module demonstrates how such a scheme can control which nodes appear in search results. Its code is worth contrasting with the code in the apachesolr_nodeaccess module.

This module uses a netmask to determine whether the current request comes from an "off campus" or an "on campus" IP address. I have enjoyed (and suffered) these sorts of "campus" access rules myself, and this application to Solr came out of my discussions at a local Drupal meet-up about the content access controls the IAS has in place for its sites (many of them Drupal sites) and a potential migration to Solr for multi-site search.

On a test system (this is not intended to be production code) where the Apache Solr Search Integration module is already installed and working with Solr, unpack the attached tarball in your sites/all/modules directory, install the Apache Solr IP Access module, and configure it at ?q=admin/settings/apachesolr_ipaccess. On this form you can mark certain content types as public and define the netmask. Content of any type not marked as public will only be accessible to users who are "on campus" (for example, on your home LAN). You will then need to re-index your site content: until you do, no document in the index carries the public flag, so the access controls will keep all content out of search results for "off campus" users. For completeness, I have also included alterations to the menu access callbacks and an implementation of hook_db_rewrite_sql().

The key interactions with the Apache Solr Search Integration modules take place in these two little hook implementations:

<?php
/**
 * Implementation of hook_apachesolr_update_index().
 */
function apachesolr_ipaccess_apachesolr_update_index(&$document, $node, $namespace) {
  // Content types marked as public on the module's settings form.
  $public_content_types = variable_get('apachesolr_ipaccess_public_types', array());
  if (!empty($public_content_types[$node->type])) {
    // Add a boolean flag to the document.
    $document->bs_ipaccess_public = TRUE;
  }
}

/**
 * Implementation of hook_apachesolr_modify_query().
 */
function apachesolr_ipaccess_apachesolr_modify_query(&$query, &$params) {
  // Off-campus users (except authenticated administrators) only see
  // public content.
  if (apachesolr_ipaccess_off_campus()) {
    // Restrict results to documents flagged as public at index time.
    $query->add_filter('bs_ipaccess_public', '1');
  }
}
?>

 
The logic embedded in these functions is pretty simple. Step 1: when a document (built from the data in a node) is about to be indexed, add a boolean flag if the node's content type is public. Here we use bs_*, one of the dynamic fields defined in the schema.xml that comes with the Apache Solr Search Integration module. You can store data in a new field without editing the schema, simply by giving the field a name that matches the glob pattern of a dynamic field when the document is indexed. Step 2: when a search is executed, if the user is off campus, add a filter restricting the results to content flagged as public.
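To make the two steps concrete outside of Drupal, here is a minimal standalone sketch in plain PHP. It assumes nothing from the apachesolr API: the function names ipaccess_sketch_extra_fields() and ipaccess_sketch_query_params() are hypothetical, documents are plain arrays, and the filter is written directly as a Solr fq parameter rather than going through $query->add_filter().

```php
<?php
// Hypothetical stand-ins for the two hooks, using plain arrays instead of
// Drupal/apachesolr objects. Not part of the apachesolr_ipaccess module.

/**
 * Step 1: returns the extra fields to index for a node of the given type.
 * $public_types mimics a Drupal checkboxes value: checked types map to
 * their own name, unchecked types map to 0.
 */
function ipaccess_sketch_extra_fields($type, array $public_types) {
  $fields = array();
  if (!empty($public_types[$type])) {
    // The name matches the bs_* dynamic field, so no schema change is needed.
    $fields['bs_ipaccess_public'] = 'true';
  }
  return $fields;
}

/**
 * Step 2: returns Solr request parameters, adding a filter query
 * (fq) only when the user is off campus.
 */
function ipaccess_sketch_query_params($keys, $off_campus) {
  $params = array('q' => $keys);
  if ($off_campus) {
    $params['fq'] = 'bs_ipaccess_public:true';
  }
  return $params;
}

echo http_build_query(ipaccess_sketch_query_params('solr', TRUE)), "\n";
// q=solr&fq=bs_ipaccess_public%3Atrue
```

The off-campus case differs from the on-campus case only by a single fq clause. Because Solr caches filter queries separately from the main query, that extra clause stays cheap no matter what keywords users type.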

This pair of functions determines whether the current user is on or off campus. It leaves a back door for authenticated administrators (strictly, users with the "administer nodes" permission), since I'm sure you don't want to have to go "on campus" just to check that the site is functioning:

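<?php
/**
 * Returns TRUE if the user is off campus, unless they are an admin.
 *
 * @param $account
 *   (optional) The user account to check; defaults to the current user.
 */
function apachesolr_ipaccess_off_campus($account = NULL) {
  // Off-campus users (except authenticated administrators) only see
  // public content.
  return !user_access('administer nodes', $account) && apachesolr_ipaccess_ip_is_remote();
}

/**
 * Returns TRUE if the current request's IP address is a remote address.
 */
function apachesolr_ipaccess_ip_is_remote() {
  $remote_addr = ip2long(ip_address());
  $server_addr = isset($_SERVER['SERVER_ADDR']) ? ip2long($_SERVER['SERVER_ADDR']) : FALSE;
  $netmask = ip2long(variable_get('apachesolr_ipaccess_netmask', '255.255.255.0'));
  // ip2long() returns FALSE for addresses it cannot parse (e.g. IPv6), and an
  // empty $netmask indicates a problem, so in all of these cases we treat the
  // request as remote rather than letting it through.
  if ($remote_addr === FALSE || $server_addr === FALSE || !$netmask) {
    return TRUE;
  }
  // "On campus" means sharing the server's network prefix under the netmask.
  return ($remote_addr & $netmask) != ($server_addr & $netmask);
}
?>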
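The subnet test in apachesolr_ipaccess_ip_is_remote() can be tried outside Drupal. The following is a hypothetical standalone helper (ipaccess_sketch_same_subnet() is not part of the module) that applies the same mask-and-compare idea to explicit addresses, assuming IPv4 only:

```php
<?php
/**
 * Hypothetical standalone version of the subnet test: returns TRUE only
 * when both IPv4 addresses fall in the same network under $netmask.
 */
function ipaccess_sketch_same_subnet($client_ip, $server_ip, $netmask) {
  $client = ip2long($client_ip);
  $server = ip2long($server_ip);
  $mask = ip2long($netmask);
  // ip2long() returns FALSE for anything that is not a dotted IPv4 address
  // (including IPv6); a zero or invalid mask is treated as misconfiguration.
  if ($client === FALSE || $server === FALSE || !$mask) {
    return FALSE;
  }
  // Keep only the network bits of each address and compare them.
  return ($client & $mask) === ($server & $mask);
}

var_dump(ipaccess_sketch_same_subnet('192.168.1.20', '192.168.1.5', '255.255.255.0')); // bool(true)
var_dump(ipaccess_sketch_same_subnet('192.168.2.20', '192.168.1.5', '255.255.255.0')); // bool(false)
var_dump(ipaccess_sketch_same_subnet('192.168.2.20', '192.168.1.5', '255.255.0.0'));   // bool(true)
```

With the default 255.255.255.0 mask, only clients whose first three octets match the server's count as on campus; widening the mask to 255.255.0.0 admits the whole 192.168.x.x range in this example.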

 
Questions? Come to "Apache Solr Search Mastery" at Drupalcon and ask during the Q&A, or leave a comment. Note that ip_address() is a Drupal API function that returns the client's address even when a reverse proxy sits in the request path (provided reverse-proxy support is enabled in settings.php). For more about defining a Solr schema, see the Solr wiki.
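Reverse-proxy handling matters for this scheme because the whole access decision trusts the client address: if an X-Forwarded-For header were believed from any client, an off-campus user could claim an on-campus address just by sending one. The sketch below shows the general technique under that assumption. ipaccess_sketch_client_ip() is hypothetical and is not Drupal's implementation of ip_address(); it only consults the header when the request arrives from a listed proxy:

```php
<?php
/**
 * Hypothetical sketch of reverse-proxy-aware client address detection.
 * $server mimics $_SERVER; $trusted_proxies lists the proxy addresses
 * whose X-Forwarded-For header we are willing to believe.
 */
function ipaccess_sketch_client_ip(array $server, array $trusted_proxies) {
  $ip = isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '';
  if ($ip !== '' && in_array($ip, $trusted_proxies, TRUE)
      && !empty($server['HTTP_X_FORWARDED_FOR'])) {
    // Each proxy appends the address it received the request from, so the
    // rightmost entry is the one our trusted proxy vouches for.
    $parts = explode(',', $server['HTTP_X_FORWARDED_FOR']);
    $ip = trim(array_pop($parts));
  }
  return $ip;
}

$server = array('REMOTE_ADDR' => '10.0.0.2', 'HTTP_X_FORWARDED_FOR' => '203.0.113.7, 192.168.1.20');
echo ipaccess_sketch_client_ip($server, array('10.0.0.2')), "\n"; // 192.168.1.20
echo ipaccess_sketch_client_ip($server, array()), "\n";           // 10.0.0.2
```

Entries further to the left were supplied by the client or by proxies we do not control, so they are ignored here.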

Comments

Posted by Bram_8 (not verified).

Thanks, Peter, for sharing insights on IP-based access. The key interactions you highlighted in the Solr search integration modules were also quite helpful. Recently, I came across an interesting article (http://www.lucidimagination.com/search/document/86bcf1ecf5a3855b/redirec...) describing IP-based access with a slightly different approach.