Creating Solid Search Experiences with Drupal [November 13, 2012]
Want to learn more about Acquia’s products, services, and happenings in the Drupal Community? Visit our site: http://bit.ly/yLaHO5.
While eCommerce sites have become search conversion centers of excellence, non-transactional sites have struggled to create superior search experiences.
For Drupalists, understanding how to navigate the landscape of search modules and what fits your site best can be daunting. Once you've selected a module, choosing which functionality to implement and how adds further complexity. This webinar will focus on strategies and best practices for building a superior search experience and walk through the modules and configurations that are required to achieve it.
Although the tools and techniques can be leveraged by any Drupal site that is integrated with the Apache Solr search engine, the demonstrations in this webinar will be conducted on Acquia Search. Acquia Search allows you to leverage the full power of Solr without any expertise required.
A few highlights for best practice implementation include:
• Faceted navigation
• Implementing rich search result snippets
• Relevancy tuning
Speaker 1: Hi everyone. Thanks for joining the webinar today. Today’s webinar is Creating Solid Search Experiences with Drupal, with Chris Pliakas, who is the product owner of Acquia Search.
Chris Pliakas: Today’s webinar is on creating a search experience with Drupal. I think we’ve done a lot of webinars in the past where we focused on Acquia Search and on some of the basics. Today, I really wanted to focus on Drupal in general, without an Acquia Search focus. Of course, all these techniques can be used with Acquia Search, but I just wanted to highlight some of the things that are in the community.
Also, based on our experiences hosting over 1,600 indexes, with 1,600 subscriptions and people who are experimenting with search pages and various UX, we’ll talk about some of the trends that we’re seeing. In order to get the best experience possible, we also wanted to touch on some content strategy items that you can employ to make sure that your search is set up for success.
Then we’ll focus on the search page user interface, so we’ll do a live demo exploring some of the tools that are available to Drupal right now that can be used to create modern-day search user interfaces so that your users get the best experience possible out of the application and could find content that they’re looking for.
Also, we’re going to demo some things that are coming down the pike. I think it’s important to recognize that right now, enterprise search is at a crossroads, and I just want to distinguish for a minute what enterprise search means. When we talk about enterprise search, we’re talking about internal site search, and enterprise doesn’t necessarily mean large corporations. Enterprise simply means that that search is important to your business and important to you, so this isn’t just a big business thing. This is for searches of any size.
But we see some trends that are emerging with external searches, searches like Google, Bing, Yahoo!, that are now going to be expected by users of your internal site search. Trends that are emerging in the search community at large, really, there is going to be an expectation that your search experience matches what’s out there currently. It’s pretty advanced things, so we’ll talk about those trends and we’ll talk about what’s changing in this space specifically.
We talked about how search is really evolving. Over the past 10 years or so, which is quite a long period of time, your internal site search really hasn’t been much more than the user entering keywords and then displaying results that are pretty basic. You have a title, you have a snippet, that sort of thing. But really, right now, search is starting to move into a different space where we have to identify what the user is actually looking for and then display relevant results. Relevant results don’t just mean keyword matching. They mean knowing things about your content and knowing things about your user, so that you can make some assumptions and present them with relevant results.
As we create more and more content on the Web today, it’s getting harder and harder to sift through that data and display meaningful data. One thing that we’ll start out with is just a simple example.
What I want to start out with is talking about Apple. How many people know Apple? All right, so I see some hands in the webinar. I guess I want to ask “How well do you know Apple?,” so a first question that I want to ask you is, “Is Apple growing?” I’ll let you answer in your heads. It’s not really a good forum for answering in public.
The second question that you should think about is does Apple have money? Then the third question, is Apple multilingual? Does Apple support multiple, does Apple have knowledge of multiple languages? Does Apple speak more than just English? Those are the three questions that I want you to answer in your head.
I’m just going to assume that you guys did a good job and you were able to answer that. Based on those three questions, I think there is no doubt that we’re talking about Apple Martin, who is the daughter of Gwyneth Paltrow and Coldplay lead singer Chris Martin. Apple Martin, like all kids, she is growing. Does she have money? Absolutely. I think her parents are doing pretty well; one is a rockstar; one is an actress. One useful tidbit is that she cannot watch TV in English, so she is getting raised as a multilingual speaker.
Is that the wrong Apple that you were thinking of? I’m assuming that it is. Tech audiences, when I say Apple, usually think of the company Apple, and the problem really is about context. The first trend that I want to talk about is contextual computing. Right now, we start to see how Apple could mean different things. It could mean the fruit. It could mean the company. It could mean Apple Martin. It could mean Fiona Apple. It could mean a lot of different things.
When I talk about context, I mean the things surrounding it that expose the content for what it is. For example, if we are talking about Apple being a Fortune 5 company or a Fortune 1 company, whatever it is right now, then that context would expose Apple as being a company. If we were on a pop culture website, then it would be more likely that Apple is the daughter of Gwyneth Paltrow, like we mentioned.
Context and how it relates to your content is getting to be really important as we get more and more data. Sites aren’t just displaying one thing now. Sites are starting to display lots of different pieces of content, and we need to start recognizing that simple keyword searches aren’t going to serve our users. We really want our results to be relevant towards what people are actually looking for.
One way that we can do this is by search statistics. Search is a really unique tool in that it is a window to what your users are expecting on your site. By entering keywords and by clicking on various pieces of content, your users are actually telling you what they want from your website, and they are telling you what content they think is relevant.
There are things out there like voting or reputation metrics, but search is really the best tool to be able to extrapolate what people are trying to do with your website.
That also leads into structured data, which is another trend that we’re going to talk about. Structured data is a way to actually denote what type of content you have on your site. Whether, again, we’ll go back to the Apple example, is Apple the organization or Apple that’s something else? These are the three trends that search is really rallying around.
I want to talk about what Drupal is doing right now to address this and some of the things that are going to be coming down the pike within the next six months or so, because it’s important that as you start to build your search experience that you’re starting to recognize some of these trends so that when the Drupal tools emerge, you can make use of them effectively and provide the site search that your users are coming to expect.
Now I’m going to go to the live demo portion of the site just to set the stage here. I have a really basic Drupal install. It’s the standard Drupal blue that you see out of the box, and it has some prepopulated content. It has a couple of events, a couple of blogs. We’ll actually build out some of the search experiences and identify some of the trends that we talked about.
Now that we have the site up, right now, I’m connected to an Apache Solr backend. Again, whether you’re connected to Acquia Search or to Apache Solr, I’m going to assume that you can install Drupal, that you can configure some of the basic modules, and that you can download and install the modules. We’re going to start with the assumption that that’s the level that we’re at.
If you do need some help, or if you are unsure as to how to install or configure modules, I do recommend that after this webinar you look at some of the great resources on drupal.org and some of the great articles that Acquia provides as part of its forums and its library, which can help ease that transition. But you can still get some value out of this webinar by following along and taking notes of which modules are being used and seeing how you can configure them once they’re installed.
First, what I want to do is I want to just execute a search. It’s the same whether you’re using core search or any other backend. But I’m going to search for DrupalCon, and we’ll start to analyze some of the results to see what the default behavior is that you get out of the box.
The default behavior we’ll see is somewhat useful but not really. But if I entered DrupalCon, it will give me the pieces of content that match that keyword. It will give me a highlighted results snippet, and it will show me a little bit of information in terms of who the user was that posted that content and what date that content was posted. Sometimes, that’s useful. Sometimes, that’s not. But again, this is a basic search interface that you get out of the box.
To be perfectly honest, this isn’t very useful. This isn’t what users expect. If you compare it to Google or Yahoo! or Bing or all the other major players out there, this is weak, and it doesn’t really give users the information that they need to effectively search the content of your site.
The first thing that I want to do is I want to explore something called Facets. And facets are filters that users can apply to help refine the search results, and it also gives some aggregate information such as the count or number of results matching that filter based on the keyword that you entered.
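As a rough illustration of what a facet does, the Python sketch below counts how many results carry each value of a field and then applies one value as a filter. It is not how Facet API or Solr implement facets. The sample results and the function names facet_counts and apply_facet are hypothetical, chosen only to mirror the blog and event content types used in the demo.

```python
from collections import Counter

# Hypothetical search results for the keyword "DrupalCon". On a real site
# these would come from the search backend, not from a hard-coded list.
RESULTS = [
    {"title": "DrupalCon Munich recap", "bundle": "blog"},
    {"title": "DrupalCon Portland", "bundle": "event"},
    {"title": "Why attend DrupalCon", "bundle": "blog"},
    {"title": "DrupalCon Munich", "bundle": "event"},
]


def facet_counts(results, field):
    """Count how many results carry each value of the given field."""
    return Counter(r[field] for r in results if field in r)


def apply_facet(results, field, value):
    """Keep only the results whose field equals the selected facet value."""
    return [r for r in results if r.get(field) == value]


# The counts are what a facet block shows next to each link.
counts = facet_counts(RESULTS, "bundle")
# Clicking the "blog" link narrows the result list to blogs only.
blogs = apply_facet(RESULTS, "bundle", "blog")
```

A real search backend computes these counts inside the index rather than over a list in memory, which is why facets stay fast on large sites.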
The first module that I want to explore is something called the Facet API module. I’m going to go to the project page here. This is a module that works with core search. It works with Apache Solr search integration. It works with Search API if you’re using that module. It’s a way to configure your search interface regardless of what search backend that you’re using.
If I expand the screenshot here, you’ll see that here are some examples of the types of facets that you can have. You can have facets by content type, by date. There are even some interesting contributed modules out there that allow you to display facets as graphs. You can really control the interface and display things in pretty interesting ways.
I’m just going to scroll down and show some of the things that you can … some of the add-ons that are available that you can take advantage of. Again, we have the graphs that we talked about. We have a slider, so if you have numeric facets, numeric content, you can say, “I want to show data between this range.” We have tag clouds, and also date facets, which we’ll actually explore and configure.
I’m not going to spend too much time. That’s just an overview to whet your appetite for what’s out there and what’s available in the Drupal community. But I do want to just go and start configuring this so you can see what this looks like and how this works.
The first thing that I want to do is I want to be able to filter this by the content type. I do have two content types here, blog and event, so I want people to say, “Okay, if I’m searching for DrupalCon, I want to filter by the blogs or I want to filter by the events that I want to see,” so that you can get the relevant information for you.
First thing I’m going to go do is configure the Apache Solr Search Integration Module. That’s the one that I’m using. I’m going to go to Apache Solr, going to go to Settings, and I am going to go to Facets. These are the lists of the facets that I have available to me. First thing I’m going to do is configure and enable the content type. I’m going to save this configuration.
Now that facet is saved, I actually have to position it on the page. The default facets are blocks. Blocks in Drupal are small pieces of content that you can position in various regions or various areas on a page. Once you enable a facet, there is a link up top that allows you to go directly to the block configuration page so that you can configure this immediately.
If I click on Blocks and scroll down … it’s actually enabled for me. I’m just going to reset this so that it is where you guys will see it when you start from scratch. But it will start down here in the disabled category. These are all the blocks that are disabled. We look for Facet API, the backend that we’re using, and then content type. This is the facet that we just enabled.
I’m going to position this in the first sidebar. It is recommended that you do position it in the first sidebar, so that will be on the left-hand side. The reason is because that’s where most of the major search engines position their facets, so in order to help people navigate your search page, we use expected patterns. That’s the best place to put it so that they don’t have to hunt around for it.
I’m going to save my block. Now I’m going to go back to my search page. I’m going to search for DrupalCon. Now I have a facet up in the upper left-hand corner that allows me to filter by events or by blogs. If I filter by blogs, it’s reporting that I have two results. If I click that, you’ll see that I do get my results filtered to the blog that I want. That’s pretty basic stuff, but it allows your users to actually target what they’re looking for.
What we’ve done so far is very basic facet configuration. The next thing that I want to discuss is a pattern called progressive disclosure. This is something that you’ll see on Amazon: if you go to Amazon’s search, you’ll be prompted to search for something that you’re interested in, one of the products that they have. Then when you search on that product, you’ll be shown different filters based on the different types of things that are returned. What it prompts a user to do is start out small, like selecting the department that they want to search in, and then based on that department, it will expose different filters or facets that are relevant just to that.
I do want to take a step back and talk about the events. The events that I have on this site have dates that are associated with them, so the date that the event actually starts, whereas the blogs have a different type of date. They have the date that the article was posted.
When you’re searching for events, you don’t really want to know the date that the event was posted. You want to know the time that the event is actually happening, so you’re going to have two different types of date facets, depending on the content that you’re targeting.
Instead of displaying all of that information, all the possible combinations of facets on the left-hand side, we want to only display the facets as we start to navigate down the content types that we’re interested in.
To highlight this, I’m going to go back to the configuration page, and I’m going to go to Apache Solr, and I’m going to go to Settings, configure my facets, and I’m going to scroll down. We’re going to see the two types of date facets that I was referring to. One is the post date and one is the event date. I’m going to enable both of these. I’m going to go to my blocks and position them. I’m going to scroll down, and now I see that the new blocks are here and disabled, so I’m going to position them in the first sidebar, like the other one. I’m going to make sure that they’re in the order that I expect.
I’m going to save these blocks. I’m going to go back to my search page, search for DrupalCon. You see that by default, now I have filter by post date, filter by event date. In order to configure this progressive disclosure pattern, what we’re going to do is leverage something in Facet API called Dependencies. Instead of just explaining, I’m just going to go for it and highlight by example.
When I mouse over the facet, I get a little gear in the upper right-hand corner. If I expand that, I have an option to configure facet dependencies. This is the date that the actual content was posted, so again, it makes more sense for the blog than it does for the event. The first option that I have here is bundles, which are synonymous with Drupal content types. I’m going to say at least one of the selected bundles must be active. I’m going to say I only want to show this for blogs. I’m going to save this and go back to the search page.
Now you see that that date facet is gone. If I click on blog, now it appears: filter by post date. Again, I’m only shown facets that are relevant to the content type that I’m looking for.
Again, I can do this for the filter by event date. Again, mouse over the gear. Click Configure facet dependencies, then Bundles. At least one of the selected bundles must be active, and I’m going to say Events.
Now I go back, and when I search for DrupalCon, I’m going to start off very small, limited options, kind of guiding your users to select something and refine their results. As I click on blog, we know that we’re in the blog context, so again, context meaning information that is used to determine what type of content you’re viewing. Now that I know that I’m viewing blogs, I see the post date, which is a little more interesting.
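The dependency rule configured above ("at least one of the selected bundles must be active") can be sketched in a few lines of Python. This is only an illustration of the progressive disclosure logic, not Facet API code. The facet names, the requires_bundles key, and the visible_facets function are hypothetical.

```python
# Hypothetical facet definitions. Each facet lists the bundles (content
# types) of which at least one must be an active filter before it is shown.
# An empty set means the facet has no dependency and is always shown.
FACETS = {
    "bundle": {"label": "Filter by content type", "requires_bundles": set()},
    "created": {"label": "Filter by post date", "requires_bundles": {"blog"}},
    "event_date": {"label": "Filter by event date", "requires_bundles": {"event"}},
}


def visible_facets(active_bundles):
    """Return the names of the facets whose bundle dependency is satisfied."""
    active = set(active_bundles)
    shown = []
    for name, facet in FACETS.items():
        required = facet["requires_bundles"]
        # Show the facet if it has no dependency, or if at least one of
        # its required bundles is among the currently active filters.
        if not required or required & active:
            shown.append(name)
    return shown
```

With no bundle selected, only the content type facet is visible. Selecting blog reveals the post date facet, and selecting event reveals the event date facet, which matches the behavior seen in the demo.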
Whereas if I click on the events, now I get the filter by event date. I can say, “Show me events that start in August of 2012 or May of 2013.” It’s now going to really target the type of events that are relevant to me.
One more thing: I’m actually going to go back to the blog facets. You see here that for the blogs, we have this drill-down thing. We have a couple of blogs that span a couple of years, and with the default facet that comes out of the box, you have to actually drill down: to 2011, then to March, then to March 21st. It allows you to drill down by the specific date all the way down to the time. But that’s actually not what users expect when you’re dealing with types of displays like blogs.
I’m actually going to go to Google and search for Drupal blogs. If I click on Search Tools, we’ll see “Any time.” They don’t have that type of drill-down. They actually have the ability to refine by a certain range. That’s usually what users expect, and that’s a use case that people commonly ask for that we’ve seen in our support requests.
The next module that I want to explore is called the Date Facets Module. Again, this is available on drupal.org, date_facets. This can be linked to by the Facet API project page. But again, if we look at the screenshot, we’ll see that it provides a nice little display widget that allows you to display your facet in the range selection. We’re going to assume that that module was downloaded.
Click on Modules. Once you download that module, you’re going to install it. I’m using the Module Filter module to provide this nice interface where I can make sense of my modules, because anybody that builds Drupal sites knows that you can get up to hundreds of modules, so you need to be able to filter them more easily on this module administration page. All I have to do, and I already enabled this, is select the check box and click Save configuration. That’s all I need to do to install the module.
Once the module is installed (I’ll do this from the search page, again filtered by blog), you have an option with facets to configure the display. If I mouse over the gear and click it, the same list of options that allowed me to configure the facet dependencies also lets me configure the facet display.
After I’ve installed that module, I’m going to have a new display widget. If I expand here, you can see up at the top there is a new date range widget. The type of display in Facet API is called the widget.
If I click on date range, click Save, and go back to the search page, I’m actually going to get an error here, which I wanted to highlight on purpose. It says the widget does not support the date query type. When you’re doing the date range, this is a common error that people report. You have to actually scroll down and select a different query type. This just tells Drupal that we’re not just doing the date filter. We’re doing the actual ranges.
I don’t want to get into the technical aspects of it, but behind the scenes, it actually changes the type of filter that the backend uses, so it’s important that we actually make this distinction.
Now if I save and go back to the search page, now you see that I get filters that are very similar to Google. I can refine things by the past week, which I have nothing, or past month, past year. It looks like I only have stuff within the past year. But it was able to refine that based on the time range of the content that you have, so it really allows people to narrow down the things that are more recent.
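To show the difference between a drill-down date facet and a range facet, here is a small Python sketch that counts posts falling in the past week, month, and year. It does not reproduce the Date Facets module. The range lengths, the sample post dates, and the function name date_range_counts are assumptions made for the example, and the reference time is passed in explicitly so the result is reproducible.

```python
from datetime import datetime, timedelta

# Assumed range labels and lengths, loosely mirroring the options in the demo.
RANGES = [
    ("past week", timedelta(days=7)),
    ("past month", timedelta(days=30)),
    ("past year", timedelta(days=365)),
]


def date_range_counts(post_dates, now):
    """Count the posts that fall inside each range ending at `now`."""
    counts = {}
    for label, length in RANGES:
        start = now - length
        # Ranges are cumulative: a post from yesterday counts toward
        # the past week, the past month, and the past year.
        counts[label] = sum(1 for d in post_dates if start <= d <= now)
    return counts


# Hypothetical post dates, evaluated as of the date of this webinar.
now = datetime(2012, 11, 13)
posts = [datetime(2012, 3, 21), datetime(2012, 8, 2), datetime(2011, 3, 21)]
counts = date_range_counts(posts, now)
```

Here only the past year range has matches, similar to the demo site, where nothing was posted in the past week or month.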
Those are a couple of the tips that I wanted to share regarding facet configuration, but I want to stop and see if there are any questions before proceeding. Do you have any questions? All right. We’ll move on from facets.
The next thing that I think is pretty interesting is that instead of having a unified search page which displays all the content across your entire site, sometimes it’s useful to actually have targeted search pages. These are things like, okay, I have a blog section on my site, which we have here. I only want to search across the blogs or I don’t want to make the user actually click on blog to refine the results. This can actually be done in the Apache Solr Search Integration Module, which we’re going to focus on.
I’m going to click Configuration then go to Apache Solr Search. One thing that I’m going to do to simplify this demonstration and something that I think is useful in Drupal in general is Drupal 7 provides this nice little shortcut functionality. You see here I have Apache Solr Search with a little plus sign. I can click this and it will now add this configuration page, a link to this configuration page in the toolbar so that I can navigate to it more quickly as opposed to having to go through the normal path. I’m going to do that for an easier demonstration. If you’re configuring your search pages, you might want to do that as well.
Some of the tabs here, we have one that’s geared towards pages and blocks. I’m just going to select pages and blocks. This is where we can actually manage search pages. I’m just going to go ahead and add a search page and we can see what this will do.
The goal here again is to create a search page that just narrows down to your blogs. I’m going to say this is a blog search. I’m going to scroll down. I’m going to make sure that my correct environment is selected. In this case, I’m running Solr locally, but if you’re connected to Acquia Search, you’ll have an environment for Acquia Search. An environment is really just a name for the backend that you’re connecting to.
Again, in title, search blogs. That’s going to be the title of the page. The path, I’m going to put in search/blogs. The part that’s going to allow me to filter just by blog content is this part at the bottom, custom filter. It’s a little complex in terms of how you do it, but first, I’m going to select that custom filter check box to make sure that I’m using a filter. We’re going to read the description down here. It says, “A comma-separated list of lucene filter queries to apply by default.”
In English, what that means is that Lucene is a very low-level search engine that Solr is built on, and it has a syntax that allows you to filter by specific things and do some pretty interesting stuff. The most basic part of Lucene syntax is that if you want to filter by field, you write the field name, then a colon, then the value. We have this use case actually down here in the comments. We see here bundle:blog. Bundle is the actual name of the Solr field, and blog is the value that’s actually stored in the index.
If you want to see all the fields that are stored in Solr, you can actually click Reports and click on Apache Solr Search Index. These are all the different field names that you have at your disposal. It doesn’t show you the values, but in our case, we know that the bundle will index the machine-readable name as we specified when we created that content type. If I go to structure content types, we see here all the different machine names. Blog is just the machine name, with _blog.
Again, I’m going to match the Solr field to this machine name. I’m going to say bundle is the name of my Solr field, and then blog is the value that we want to filter by. I’m going to save this page. Now, I have a search page that’s dedicated just to blogs. I’m going to click on this. If I say DrupalCon, now we see that it only gives me two results because it’s only filtering by the blogs, not filtering by any of the events.
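The bundle:blog filter can also be seen from the outside as a plain Solr request: keywords go in the q parameter and each filter query goes in an fq parameter. The Python sketch below only builds that query string. The helper name build_select_params is hypothetical, and the sketch assumes simple values with no spaces or special characters, which would need Lucene escaping or quoting in practice.

```python
from urllib.parse import parse_qs, urlencode


def build_select_params(keywords, filters):
    """Build a Solr-style query string: `q` for keywords, one `fq` per filter."""
    params = [("q", keywords)]
    for field, value in filters:
        # Lucene field filter syntax: the field name, a colon, then the value.
        params.append(("fq", f"{field}:{value}"))
    return urlencode(params)


# The dedicated blog search page from the demo, expressed as query parameters.
query_string = build_select_params("DrupalCon", [("bundle", "blog")])
# Decoding the string again shows the filter exactly as it was entered.
decoded = parse_qs(query_string)
```

Because the filter is applied by default on the search/blogs page, users never see it; they only see results that already match bundle:blog.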
Sometimes, it is nice to have these targeted searches. For example, if you do have a blog section of your site, it is very nice so that you don’t have to actually set up a separate site for your blog. You can have your blog be a micro-site that is under the same Drupal installation but just has different configurations isolating that content so users can find what they are looking for.
I want to stop there and see if there are any questions on the search pages. No? We’re good? Okay. I’m actually, just to reduce the noise here, I’m going to disable … is there a question?
Speaker 1: Yes. Is there an autocomplete module?
Chris Pliakas: Yes, there is. Let’s see if I can find it. Yes. The module is aptly named Apache Solr Autocomplete. The project name is apachesolr_autocomplete. This will provide the type of autocomplete functionality that people are used to.
Now, it is important to note that, and this is one of the trends that we’re going to talk about, that this actually pulls off your index and does keyword matching. But as you have larger sites and more data, then sometimes, keyword matching isn’t necessarily the best option to guide people towards relevant results. There is a trend that’s going to match statistics as well so that you can actually autocomplete based on what people are searching for as opposed to just the keywords which theoretically will guide them towards more relevant results. As I talk about the Apache Solr Statistics stuff that we’re doing, we’ll relate that back to the autocomplete.
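The contrast between the two autocomplete approaches can be sketched in Python. The first function suggests any indexed term that matches the typed prefix. The second ranks past queries by how often people actually searched for them. Neither reflects the code of the Apache Solr Autocomplete or Apache Solr Statistics modules. The term list, the search log, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical data: terms found in the index, and a log of past searches.
INDEX_TERMS = ["drupal", "drupalcon", "drush", "druplicon"]
SEARCH_LOG = [
    "drupalcon", "drupalcon munich", "drupal",
    "drupalcon", "drupalcon munich", "drupalcon munich",
]


def suggest_from_index(prefix, terms):
    """Keyword matching: every indexed term starting with the prefix, A to Z."""
    prefix = prefix.lower()
    return sorted(t for t in terms if t.startswith(prefix))


def suggest_from_statistics(prefix, log, limit=3):
    """Statistics-based: past queries starting with the prefix, most searched first."""
    prefix = prefix.lower()
    counts = Counter(q for q in log if q.startswith(prefix))
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [query for query, _ in ranked[:limit]]
```

For the prefix "drupalc", keyword matching can only offer "drupalcon", while the statistics-based version puts "drupalcon munich" first because it was searched for most often in this sample log.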
Speaker 1: We have a few more questions.
Chris Pliakas: Okay.
Speaker 1: Can that custom search be put in a block?
Chris Pliakas: Can this search be put in a block? Yes, I believe it can. Let me just search for a module. I believe there is a module that does this. I want to see if this is what it does. I might have to get back to you on that one. I believe there is actually a module that does allow you to expose your searches in a block, but I’m not 100 percent sure on that, and so I’ll take that as an action item and post that answer after the webinar is over.
Speaker 1: Okay. Also about the statistics stuff, is that available now?
Chris Pliakas: Yes. There is an Apache Solr Statistics module that does some very basic stuff, but it’s more geared towards administrators. It does things like the keywords, but it does so more or less how many times a search page is viewed, which isn’t really that useful to site builders. But there is a new extension to that module, a new branch, I guess I should say, that is available on the community. I’ll show you where it exists and I’ll give you a bit of timeline about when that is going to get merged back in, but that’s more geared towards site builders and talks about how people are actually using your search.
Speaker 1: We have a few more questions.
Chris Pliakas: All right. I’ll take it …
Speaker 1: Will all of this work with non-Drupal content if some other system populates parts of the Solr index?
Chris Pliakas: The answer to that is yes. The trick is getting that data into Drupal. There is some example code, which we’ll link to after the webinar, that makes it easier to get content into Drupal. But once you get the content in, you can display facets and that sort of thing.
The display of the search result doesn’t really bias towards what type of content it is. Again, it’s more or less just getting that content into Apache Solr in a way that Drupal can recognize.
Speaker 1: We have one more. Where is the extension to have autocomplete?
Chris Pliakas: Again, that’s the … we’ll do it for Google. If you search “Drupal Apache Solr Autocomplete,” I’m going to venture that it is one of the first results. It’s on drupal.org. The URL is drupal.org/project/apachesolr, all one word, _autocomplete. It’s pretty easy to find on drupal.org and it’s available on this project page.
Okay. I’m just going to clear cache just to make sure that our stuff is gone. I’m actually going to go back to Google here.
If we look at Google, we see that the search results are displayed in a format that’s pretty familiar to us. Let’s go to Yahoo!, or let’s go to Bing. Search for Drupal.
Now pretty interesting, you’ll actually see that the search results are very similar. You have the title. You have the URL. You have the snippet, and you have some additional information about it. Third thing, go to Yahoo!, search for Drupal, and we’ll see that again, different results are returned because they have different algorithms that determine the relevancy, but the display is very, very similar. The reason is because there is actually a lot of standardization that was done in 2011 by Google, by Bing, and by Yahoo! What that is is something called schema.org.
Let’s go back to Google, and we’ll look at the search results. Let’s go to our blog. We see some interesting things here. We see that when we search for our schema.org blog, you scroll down, we see one of these results has an image. This is actually a great way to talk about schema.org in that it provides some structure around your data.
When we build content types and manage fields inside of Drupal, we’re actually just configuring the data model, so that’s the underlying buckets that we put data into, and it doesn’t really have any meaning beyond what we name it. Google doesn’t understand when you create a blog content type that that’s actually blog content. It’s only blog in name only. Or when you create an event content type, it’s only events in name only. That’s almost like Drupal provides you a leg up in that you don’t have to build your database but that you can do it through the UI. But I’ll actually go back to Drupal here, click Structure, click Content Types.
You see here that I have events. If I manage my fields and I added some extra data here, the date, the event date, an address, an image, if I wanted to add another field, what you do is you create your label and then you select the type of field that you want. We see we have date, file, we have text. This is all real basic stuff that again is just really low level and doesn’t actually expose what type of content that is.
Schema.org is the layer that sits upon that which says, okay, this text field is actually an address, or this image is the primary image of this piece of content, or this event date is the actual start date of an event. It will actually go up as well and say, okay, you can say this content type event is actually an event so that it can be recognized by some standard that’s out there that’s agreed upon by the major search engines.
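To make the layering concrete, here is a minimal sketch in Python of what "this field is actually a start date" amounts to. It is an illustration only, not how the Drupal modules are implemented: the field names, the mapping table, and the sample values are all hypothetical, and JSON-LD is used just because it is compact to print. The property names (`startDate`, `location`, `image`) and the `Event` type do come from schema.org.

```python
import json

# Hypothetical mapping from site-specific field names to schema.org properties.
# The field names are made up; the property names are real schema.org terms.
FIELD_TO_PROPERTY = {
    "field_event_date": "startDate",
    "field_address": "location",
    "field_image": "image",
}


def to_schema_org(schema_type, title, fields):
    """Describe one piece of content using shared schema.org vocabulary."""
    doc = {"@context": "https://schema.org", "@type": schema_type, "name": title}
    for field_name, value in fields.items():
        prop = FIELD_TO_PROPERTY.get(field_name)
        if prop is not None:
            # Only mapped fields gain a meaning that outside tools can understand.
            doc[prop] = value
    return doc


# Illustrative values only.
event = to_schema_org(
    "Event",
    "DrupalCon Portland",
    {
        "field_event_date": "2013-05-20",
        "field_address": "Portland, OR",
        "field_image": "https://example.com/drupalcon-portland.png",
        "field_internal_notes": "no mapping, so this stays private to the site",
    },
)
print(json.dumps(event, indent=2))
```

The unmapped field is dropped from the description, which mirrors the point above: without a mapping, a field is only meaningful in name.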
This actually helps your Drupal site in two ways. When Google, Bing, and the other search engines index your site, they will read this metadata. There is also some work being done so that the same metadata can modify the display of your internal search, so that users are presented with a familiar experience.
That’s probably the thing that people will recognize the most, but the module that I want to share with you is called the Rich Snippets module. We’ll actually just install it and see what it gives us out of the box. Again, Rich Snippets, rich_snippets. There is another module that’s similarly named, but it’s important to understand that this one is geared towards your internal site search.
This takes that schema.org metadata and actually will format your results accordingly. I’m just going to install this module and see what it gives us, and then we can break it down a little bit.
Again, I’m going to go to Modules. I’m going to go to Rich Snippets, enable this. I’m going to bring up a page here so that we can see what it looks like before. Again, very blah. Now, when I enable the Rich Snippets module, we go back to my search page. I’m going to refresh the page. Now you see that it displays the results very, very differently.
The goal of this module is to work fairly out of the box. With Solr, you might have to re-index your content. But as you can see, now the results are displayed in a way that’s much more friendly and much more in line with what users expect.
As a nice UI tip, this module is going to emerge as something that’s going to be a staple on sites with search. As you can see, for DrupalCon Portland, DrupalCon Munich, it displays a little image, and it also displays the start date.
Now, for the blogs, it displays who that blog is by and when it was posted. As you can see, based on the context or based on the schema that we’ve assigned to it, the search results are displayed very differently. This is really important when we’re displaying site-wide searches. There are tools in Drupal, such as Views, which people are starting to explore to build their search pages on, but that’s not really geared towards heterogeneous content.
When you have a mix of content, it’s really important that you’re able to display that effectively inside your search page. With Views, it gets really, really tricky to say, “Okay, for this content type, display it this way. For this content type or this schema, display it another way.”
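The "display it this way for this schema, another way for that schema" idea can be sketched as a simple dispatch on schema type. This is a hedged illustration in Python, not the Rich Snippets module's code: the document keys (`schema_type`, `start_date`, `author`, `posted`) and the sample results are assumptions made for the example.

```python
def render_event(doc):
    # Events lead with the start date, since that is what searchers usually want.
    return f"{doc['title']} | starts {doc['start_date']}"


def render_blog(doc):
    # Blog posts show who wrote them and when.
    return f"{doc['title']} | by {doc['author']}, posted {doc['posted']}"


def render_default(doc):
    # Content with no known schema falls back to a plain title.
    return doc["title"]


# Hypothetical lookup table: schema.org type name to snippet renderer.
RENDERERS = {"Event": render_event, "BlogPosting": render_blog}


def render_result(doc):
    renderer = RENDERERS.get(doc.get("schema_type"), render_default)
    return renderer(doc)


# Illustrative mixed result set, as a site-wide search might return.
results = [
    {"schema_type": "Event", "title": "DrupalCon Portland", "start_date": "2013-05-20"},
    {"schema_type": "BlogPosting", "title": "Search tips", "author": "chris", "posted": "2012-11-01"},
    {"title": "About us"},
]
lines = [render_result(doc) for doc in results]
print("\n".join(lines))
```

Adding a new kind of content then means adding one renderer and one table entry, rather than reworking a single display that has to fit everything.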
That’s the first thing that the Rich Snippets module will give us, is a nice display. Now we’ll talk about how to actually say, okay, this is a date, this is the start date, that sort of thing.
There is a module called schema.org. It’s just schemaorg, one word. It’s a very simple module that doesn’t require a lot of configuration, but you effectively download it, install it, and it allows you to effectively tag your fields and your content with the type of schema that denotes what that content actually is.
If you download and install this module, what it does is pretty simple. If I go to my structure, go to my content types, edit my content type, it gives us this new vertical tab that says schema.org settings, and this allows us to actually specify what type of content this is.
If I said, okay, this is a blog, I could start typing, and it would give me the options that are available. All the options are on the schema.org website, and I’m not going to go over them in detail because there are a lot of them. Just to give you an idea of how many there are, you start off with your basic top-level stuff like an event, organization, that sort of thing, and then inside of these you have various properties that say, okay, for this event, this is the end date, these are the attendees, so a lot of structured information there.
Each one has a lot. Let’s see if I can get the documentation here. Okay. That’s not what I want to show. Full list. Again, this highlights why this is a great tool for this type of search results display because as we scroll down, this is the nested hierarchy of schema.org schema and properties, so you could see there is a ton of them. The module right now supports a subset of them but it’s going to support more.
As we’re building our content, it’s really important that you use this module and explain what your fields are. When I actually create a field, if I click Edit and I scroll down to the edit settings, you see here that I also have schema.org mapping so I can say the property. I could say this is the start date. Then what the Rich Snippets module will do is based on your schema and properties, it will display your content differently.
Because this is start date, if I go back to my DrupalCon settings, then it knows to display the actual start date up here because based on this result being an event, it’s probably what people are going to be interested in, so it gives them some context about the content that’s being returned so that they can see what’s going on without necessarily having to click on the piece of content itself.
I’m going to stop there and take a couple of questions for two minutes, and then we’re going to move on to statistics and then stop for general questions.
Speaker 1: Okay. We have two questions. Can the custom Solr search results page be used in panels? This might be from the last section.
Chris Pliakas: Yes, I believe it can be. The reason why I say that is because the Acquia Commons distribution is making heavy use of panels and is using Apache Solr for its search engine. I say with confidence that yes, it can be used with panels.
Speaker 1: There is one more. Where is the extension to add to Apache Solr autocomplete which allows for statistics to be involved and not just keywords?
Chris Pliakas: That’s one thing that’s not available just yet, but it’s on the road map for the statistics module that we’re going to display next. This is one of those items where I wanted to make people aware of the different trends that are emerging. This is one case where it hasn’t been implemented yet, but it’s going to be implemented. As you start to look forward in your search solutions the next three, six months, look for this as an option.
Speaker 1: We have one more question. Why do we need Acquia Search when everything seems doable from Drupal Search?
Chris Pliakas: Yes, and that’s a great question. The first thing is that Drupal Search won’t scale. Drupal is built on relational database technology, and relational databases simply won’t scale for full text searching. They’re really geared towards saying, okay, find me all blogs or find me all users, that sort of thing. But when you start to enter keywords into the mix, it will take your entire site down pretty quickly because it will bog your stuff down.
Regarding Acquia Search, you can run Solr locally, and we’ve contributed a lot of these add-ons back to the community. However, the value add that Acquia provides right now is that we have Solr configured in a highly available cluster, so there is a master/slave replication so that if one server goes down that end users can continue to search. We also integrate the tools that allow for file attachment indexing. We also have a security mechanism that we’ve applied on top of Solr.
Solr actually doesn’t have security out of the box, so you can actually do a Google search and find a lot of Solr instances that are unprotected. You could delete that index. You could add content to that index. We’ve added a security on top of Solr that allows you to connect securely and make sure that you and only you have access to your server. Also, we manage it 24x7.
One of the things I do want to talk about as we move on to statistics and contextual computing is that there are things we’re experimenting with in Acquia Search that will adjust relevancy based on user actions. This will be a set of tools that integrate with Drupal and integrate with various tools that will provide more relevant results to your users beyond just keywords. There is going to be a lot of value and a lot of focus on contextual computing with Acquia Search that’s really going to differentiate it from not only core search but from using Solr locally.
Speaker 1: There’s a few more, but we can get to them at the end.
Chris Pliakas: Yes, sure. What I’m going to do is just wrap up really quickly with the statistics. There is one point that I want to hit home, and I’ll try to stop by 1:55 to save some time for some questions afterwards.
There is an Apache Solr Statistics module. Let me clear out some of these tabs here. I think that’s it. Or maybe it’s Apache Solr Stats. It’s probably Apache Solr Stats. There we go.
There is an Apache Solr Statistics module that you can download, it works for Drupal 6 and Drupal 7, that gives you some information in terms of how many requests there are, what type of things people are searching for, but it’s more geared towards site administrators, not necessarily search page builders. The reason why I say that is because if I go to my search and I search DrupalCon, it’s going to count that as DrupalCon, the keyword being searched.
If I click on events, since the page reloaded and it actually queried Solr again, that statistics module is going to say, okay, DrupalCon was searched again. What this really does is it says show me content where people have to click around to find what they are looking for. It’s not necessarily indicative of what people are actually looking for on the site.
One of the branches that’s being worked on is actually a sandbox project right now that will be merged back into the Apache Solr Statistics module by Q1 of next year … I can’t find it here … there is a sandbox that’s an Apache Solr Statistics fork that’s used to experiment with this stuff. That’s what I’m going to be showing you today. The important thing is that it’s more geared towards the search page builders, and it also tracks what people do after they search for something. It allows you to track what we call click-throughs.
If somebody searches for DrupalCon, we can see what pieces of content people are actually selecting, so we can make informed decisions about how to configure our search and how to modify the relevancy.
What I’m going to do is click on modules, search toolkit, and enable Apache Solr Statistics. When I click on Apache Solr Search, now I have a new tab that says statistics.
What I want to do is enable the query log. This captures information about what searches are being executed. Also, I want to enable something called the Event Log. In order to enable this, you have to copy a file from the module to your Drupal root so that it can capture the information as users are clicking on results.
There are also settings for the log retention policy and the backend. By default, it logs to the database, but for busier sites, again, there is going to be the ability to send that to different sources.
I’m going to save the configuration, and I’m going to execute another search. If I search for DrupalCon and now click on DrupalCon Portland, if I go to Reports, Apache Solr Index, Statistics, this gives me some interesting things. It gives me the top keywords, so it shows me what the top keywords are that people are actually searching for. Equally as important, top keywords with no results, so you can see what people are searching for and not getting any results for.
If people don’t find the content they’re looking for, they’re going to leave your site, so this is a really important metric. Also top keywords with no click-throughs, so if people are searching for things and they’re getting results but they’re not clicking on anything, then there is probably going to be some modifications to make sure that they’re getting displayed the correct results.
Here, we see the top keywords. We also have click-through reports. If I click on that, it will show me the pieces of content that people are selecting and the count. As you start to gain some more traffic on your site, this will give you some transparency in terms of what people are doing on your search page, and more importantly, what they’re doing after.
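The four reports described above (top keywords, keywords with no results, keywords with no click-throughs, and click-through counts) can be sketched as a small aggregation over a query log and an event log. This is a minimal Python sketch under stated assumptions: the row layout, the `search_id` link between the two logs, and the sample data are hypothetical and are not the statistics module's actual schema.

```python
from collections import Counter

# Hypothetical query log: one row per executed search.
query_log = [
    {"search_id": 1, "keywords": "drupalcon", "num_results": 12},
    {"search_id": 2, "keywords": "drupalcon", "num_results": 12},
    {"search_id": 3, "keywords": "damp", "num_results": 0},
    {"search_id": 4, "keywords": "pricing", "num_results": 5},
]

# Hypothetical event log: one row per result a user clicked after searching.
event_log = [
    {"search_id": 1, "clicked": "DrupalCon Portland"},
    {"search_id": 2, "clicked": "DrupalCon Portland"},
]


def summarize(queries, events):
    """Build the four reports from the two logs."""
    clicked_ids = {e["search_id"] for e in events}
    return {
        "top_keywords": Counter(q["keywords"] for q in queries),
        "no_results": Counter(
            q["keywords"] for q in queries if q["num_results"] == 0
        ),
        # Searches that returned results but led to no click at all.
        "no_click_throughs": Counter(
            q["keywords"]
            for q in queries
            if q["num_results"] > 0 and q["search_id"] not in clicked_ids
        ),
        "click_throughs": Counter(e["clicked"] for e in events),
    }


report = summarize(query_log, event_log)
for name, counts in report.items():
    print(name, counts.most_common())
```

Keeping the click events in a separate log keyed back to the search is what makes the "no click-throughs" report possible; a query log alone cannot tell a satisfied searcher from one who gave up.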
As we talked about the contextual computing, it’s really important that you monitor what people are looking for, and this is a great way to do it. Again, it’s what people are looking for in your site and what they are selecting, what they find relevant. The search page is a great tool to help you modify your experience and tailor it to your users.
We have a couple of slides to end up, but that’s really what I wanted to highlight, is that contextual computing is more the trend, that there are some tools that you can employ now that are going to be improved upon in the future to make sure that Drupal is the best solution available in search to serve relevant content to your users. Search is really becoming a big data problem, and search is also becoming a solution to that problem.
Big data is capturing a lot of information and then making sense of it, doing something with it. As your sites begin to amass a lot of data, search is a great tool to help your users sift through that data and find the relevant content that they’re looking for, and that’s really where the trend of computing is going over the next five years, so definitely pay attention to search as a tool to help make sure your site is keeping up with the latest trends and desires of your end users as they look for engaging experiences.
I went over but we’ll take some more questions.
Speaker 1: Okay, great. Would you recommend using these modules on a Drupal 6 site using domain access?
Chris Pliakas: Domain Access is a little bit tricky, especially with search. Some of these things are … let’s take a step back. The way Domain Access works is that it builds upon the Drupal node access system, so that adds some challenges in terms of search. Not only does a search solution have to be domain-access-aware, but everything around your site has to be domain-access-aware.
Theoretically, you can use your Drupal 6 site with domain access. It’s just that it gets a little bit tricky because your index is logically separated as opposed to physically separated, so there always is the chance of your content either lagging behind in terms of getting that access information or accidentally getting exposed to other sites when it shouldn’t be, so it can be done, but there has to be a lot of thought and a lot of careful planning to make sure that it’s implemented properly.
Speaker 1: The next question is does the schema.org also expose the extra info to search engine spiders?
Chris Pliakas: Yes, it does. That’s actually what the module is geared towards. It’s geared towards the external use case: it provides the metadata, such as the images and the additional information, that Google will pick up. But what the Rich Snippets module does is take that information and use it inwards. By default, the schema.org module is geared more outwards, but the work that’s being done right now is taking that metadata and also applying it to your site search, so it’s a win-win.
Speaker 1: The next question is, what if the non-Drupal contents are dynamic pages? How do you import those contents? If not, is there a federated search solution?
Chris Pliakas: I think it’s important to first say a federated search solution might not be exactly what’s being asked for. When we think of a federated search solution, we think of things like Kayak or other engines that actually query out to different data sources and compile the results together.
There are tools in Drupal that allow you to query different sources simultaneously. However, that’s probably not what you’re looking for. You’re probably looking for a unified search solution that displays results instantly.
In order to do that, you can leverage tools such as crawlers, such as Nutch, which will integrate with Solr. The key is again getting that data into a format that Drupal can recognize. But the trick is using those tools to crawl or expose your external data to get them into Drupal.
There are also ways that you can programmatically connect a third party data store and index that into Drupal using the APIs. But again, it’s more of a developer task and something that has to be coded.
But with Acquia Search, definitely look for an offering sooner rather than later to index external content and bring it into your Acquia Search Index.
Speaker 1: All right. We’ll take one more. How can you make information more important based on the statistics? What ways to set this up are available?
Chris Pliakas: Can you repeat that question one more time? Sorry.
Speaker 1: How can you make information more important based on the statistics? What ways to set this up are available?
Chris Pliakas: Sure. I’ll give one example from Acquia.com. We have an offering called Dev Desktop, which is a local stack installer for Drupal. A long time ago, it used to be called DAMP, Drupal, Apache, MySQL, that sort of thing. What we actually have noticed is that based on our statistics, people still search for DAMP more than they do Dev Desktop. We noticed that trend, and the way that we modified our search results was to take advantage of some of the things that Apache Solr has: we added DAMP as a synonym for Dev Desktop so that when people search for DAMP, they’re actually getting the content that’s relevant to Dev Desktop, which is what the product is called now.
This is what Google does. This is why Google results are very relevant. They have hundreds of full-time engineers analyzing their search and doing things like saying, “Okay. If you search for a FedEx tracking number, we’re going to show you the FedEx webpage.” Now it’s automated, but that started with analyzing the statistics, and those are the types of techniques that you can employ on your site based on what your users are actually looking to do.
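The DAMP example can be sketched as a query rewrite driven by a synonym table. This is only an illustration in Python: in Solr this kind of mapping is normally configured through a synonym filter and a synonyms file rather than written as application code, and the table and function below are hypothetical.

```python
# Hypothetical synonym table, built from what the search statistics show:
# people still type the old product name, so map it to the current one.
SYNONYMS = {"damp": "dev desktop"}


def rewrite_query(query):
    """Replace each known legacy term with its current equivalent."""
    terms = []
    for token in query.lower().split():
        terms.append(SYNONYMS.get(token, token))
    return " ".join(terms)


print(rewrite_query("DAMP install"))
print(rewrite_query("drupalcon"))
```

The table is the part that comes from the statistics: each entry records a term that users search for and the content they should actually be shown.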
Speaker 1: Okay, great. I think we’re going to have to end it here. Thank you, Chris, for the great presentation, and thank you everyone for participating and asking all these wonderful questions. Again, the slides and recording of the webinar will be posted to the Acquia.com website in the next 48 hours. Thank you.