

Accessible Theming in Drupal [December 19, 2012]


Hannah: Today's webinar is Accessible Theming in Drupal with Dan Mouyard, who's a Senior Interface Engineer at Forum One Communications. I'm Hannah Corey, a marketing specialist here at Acquia.

Dan: Hello, I'm hoping everybody can see the slides for Accessible Theming in Drupal. As Hannah said, my name is Dan Mouyard, Senior Interface Engineer at Forum One Communications. I've been working in Drupal for about four years now. I've been involved in Drupal 8 core development, the HTML5 Initiative, and the Mobile Initiative, and I've been a member of the accessibility team for the past couple of years, where we get together about once a month to talk over some of the accessibility issues going on in Drupal core and contrib. "Interface engineer" is basically a fancy way of saying themer, so I have a lot of experience with HTML, CSS, and JavaScript as well as PHP.

As a personal note, I'm also legally blind and I wear hearing aids, so a lot of the accessibility issues that I work with every day affect me on a personal level, and I hope I'll be able to share some of that insight. The focus of today's talk is primarily on theming, especially in Drupal 7, although for some of the topics I'll also bring in techniques that we've been working on for Drupal 8.

Hopefully you have a good grasp of HTML, CSS, and maybe a little bit of JavaScript, as well as basic theming topics such as template files. Finally, the focus is on accessibility, and with a title like Accessible Theming most of you are probably already familiar with accessibility, so I won't go into too much detail about what it is or why it's important. But I do want to point out that my view on accessibility is a little bit different. There's a great quote from Tim Berners-Lee about the World Wide Web: "The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect." Note that word "everyone," because to me that is the focus of what we do when we build websites: we want to build them in such a way that everyone can access them regardless of whatever limitations they may have. Those could be disability limitations, device limitations, or internet connection limitations. There's also what's called situational impairment, such as being in a noisy bar trying to watch TV; that situation hampers things, so something like closed captioning can really help, and the same thing occurs on the web. There are things we can do that are vitally important to people with disabilities, but they also help everyone.

Before I get into some of the nitty-gritty theming techniques, I just want to quickly highlight my overall philosophy, which is this idea called progressive enhancement. The whole idea of progressive enhancement is to layer on functionality from the simplest and most basic to the more advanced.

First you start off with HTML, which most devices and browsers can understand. Then you layer on presentation with CSS, and then you layer on behavior with JavaScript. Even within those areas you can layer on more complexity: with HTML you can layer on some of the newer HTML5 elements, with CSS you can layer on some of the more advanced CSS3 features like rounded corners, and the same goes for JavaScript.

The reason for this ground-up approach is that regardless of how you're accessing the page, you still get a usable interface and a usable way to consume the content. The main things I'll be covering today: first, some of the accessibility features in Drupal core that you should be familiar with when theming; then some basics of creating accessible HTML and how to implement that in your theme; then quite a few CSS techniques to make things more accessible; and finally ARIA, which is a way to add metadata and semantics to HTML that help assistive technology understand it.

First, Drupal core. What have we done with Drupal core, particularly Drupal 7, that makes things easier and more accessible, that you really should be familiar with when you're theming? The first thing is that we've added three helper classes: element-hidden, element-invisible, and element-focusable.

The first one, element-hidden: here's the CSS. Basically it just applies the CSS property display: none. What this does is tell browsers and other devices, "Hey, ignore this part of the HTML," so browsers, devices, and assistive technologies such as screen readers won't present that information. There are times, however, when you want something hidden visually but still available to devices such as screen readers, and that's where we use element-invisible. You can add this class to anything in Drupal and it'll essentially hide it from view, yet it will still be there, so screen readers do read aloud whatever content is inside of it. And finally, building on element-invisible, there's the element-focusable class.

Here's the CSS. Basically, for any element hidden with element-invisible, if you also add the element-focusable class, then when keyboard users or other devices tab through the markup and hit that element, it receives focus and pops back into view. A good example of this is skip links.
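For reference, the CSS behind these three classes looks roughly like this (reconstructed from what Drupal 7's system.base.css ships; the exact selectors may differ slightly):

```css
/* Completely hidden from everyone, including screen readers. */
.element-hidden {
  display: none;
}

/* Visually hidden, but still read aloud by screen readers. */
.element-invisible {
  position: absolute !important;
  clip: rect(1px, 1px, 1px, 1px);
  overflow: hidden;
  height: 1px;
}

/* Invisible until it receives keyboard focus, then it pops into view. */
.element-invisible.element-focusable:active,
.element-invisible.element-focusable:focus {
  position: static !important;
  clip: auto;
  overflow: visible;
  height: auto;
}
```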

Skip links are the first links on every page, and they allow users to skip directly to other sections of the same page, most often to the main content. Drupal itself uses only one skip link, and it links to the main content, but you can have others, such as a link to the main navigation or a link to the search box. These are the first links that occur on the page, so you really want to limit how many there are; you should never have more than three.

Here's an example from the Bartik theme, and you can see at the very top where it says "Skip to main content"; that's the skip link. Normally you don't see it because it has the element-invisible class, but when a keyboard user tabs to it, because it also uses element-focusable, it pops into view and lets people jump down to the main content.

That's also used in the Seven theme (here's an example), which is the default admin theme for Drupal 7. Again, if you tab, the first thing that pops up is "Skip to main content." This is done in the HTML template, which is provided by the core System module, by adding a little bit of HTML at the very top, right after the body tag. The important thing to focus on here is the target of that link, where it says main-content, because that needs to be a valid link target. Sometimes when new themes are created they ignore the skip link, or the page template doesn't provide the target for that link.

So in the page template you should have one of two things: either an anchor tag with the ID main-content, or a div or some other HTML element with the ID main-content. That enables the skip link to function correctly. Finally, in Drupal core we've done a lot of work on forms, and one thing that's really important about forms and accessibility is that all form elements need to have either a label or a title attribute, and Drupal's Form API uses the #title property to set these.
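The markup involved is small. Roughly, core's html.tpl.php outputs something like the following, and the page template has to supply the matching target (shown here the way Bartik does it; details can vary by theme):

```html
<!-- In html.tpl.php, right after the <body> tag: -->
<div id="skip-link">
  <a href="#main-content" class="element-invisible element-focusable">
    <?php print t('Skip to main content'); ?>
  </a>
</div>

<!-- In page.tpl.php, right before the main content region: -->
<a id="main-content"></a>
```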

Drupal's Form API is very powerful. It uses arrays to help developers build forms, so here's a little snippet of PHP. We're creating a new form element, and just for the sake of this demo it's a text field, for a name or something like that. Then with the #title property we're setting it to "Description," and for those of you who are not familiar, the t() function just allows that string to be translated.

What this will do is create a text field form element, the label for that element will be "Description," and the markup will be output in such a way that the label is correctly linked to that element. But there are times when theming when you might not want to show that label, or you might want it as an attribute, or you might want to move it around. In Drupal 7 the Form API gained the #title_display property, and there are four possible options.

The default is "before," where the label shows up before the element. Next is "after," where the label comes after the element in the markup order. Then there's "invisible," which just adds the element-invisible class to the label: it's still there, and screen readers can still hear the association between that label and that element. And finally "attribute," which, instead of outputting a label for the form element, sets that title as the title attribute on that element.

Here's an example, the same one we had before, where we've added the #title_display property and set it to "invisible." Again, this will output a text field, "Description" will be the label for that element, and the label will have the class element-invisible, so it won't be visible but it will still be read by screen readers.
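The snippet being described would look something like this (the element name "description" is just an illustration):

```php
<?php
// A text field whose label is tied to the element but hidden visually.
$form['description'] = array(
  '#type' => 'textfield',
  '#title' => t('Description'),
  // One of 'before' (default), 'after', 'invisible', or 'attribute'.
  '#title_display' => 'invisible',
);
```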

Next, HTML theming and what you need to focus on for accessibility. The first thing is just a general best practice: you want to have semantics. Here's some example code for an article, perhaps a node or something like that. You can see we're using some of the HTML5 elements: we have an article, we have the heading, and the footer element, which is new in HTML5 and doesn't necessarily have to be at the bottom; it's just a semantic way of saying, "Hey, this is information about this part of the markup."

There's also the figure element for adding images with a caption, and the paragraph where you can emphasize things, and even an abbreviation. You could mark all of this up with different, unsemantic code and, through theming, just use divs and classes and make it look exactly the same, but you'd lose some of that semantic meaning. Some of that meaning doesn't make a difference accessibility-wise today, but in the future, especially with some of the HTML5 elements, newer assistive technology can take advantage of this information. You can change this markup, usually in template files, in Drupal.
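A sketch of the kind of semantic node markup being described (the class names and text are made up for illustration):

```html
<article class="node">
  <header>
    <h2>Accessible theming tips</h2>
  </header>
  <footer>
    Posted by <a href="/user/1">admin</a> on
    <time datetime="2012-12-19">December 19, 2012</time>
  </footer>
  <figure>
    <img src="capitol.jpg" alt="The Capitol building at dusk">
    <figcaption>The Capitol building at dusk.</figcaption>
  </figure>
  <p>
    Semantic elements such as <em>figure</em> and
    <abbr title="Accessible Rich Internet Applications">ARIA</abbr>
    carry meaning that plain divs and spans do not.
  </p>
</article>
```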

It's pretty easy: you can just open up a template and rename a div to article, something like that. Where it gets really tricky, however, is when you have custom content types with, say, a bunch of fields on them. By default the markup Drupal outputs for those fields is pretty cluttered, because it has to be usable across a wide variety of use cases.

You can use individual field templates to override that markup however you want, but it's very difficult to create field templates that will work across all projects. What I recommend is using the Fences module (drupal.org/project/fences), which lets you define the markup when you create those content types and fields.

Here is a content type where you can manage fields. If you go under operations and click edit next to one of those fields, that's normally where you set things like "Is this required? What's the maximum length? What's the default value?" What the Fences module adds is the ability to define the wrapper markup for that particular field.
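If you do go the route of writing field templates yourself instead of using Fences, a trimmed-down override might look something like this (field--field_byline.tpl.php is a hypothetical template suggestion; the default field.tpl.php it replaces adds several wrapper divs):

```php
<?php
// field--field_byline.tpl.php: print just the field values, without
// Drupal's default field / field-items / field-item wrapper divs.
foreach ($items as $delta => $item) {
  print render($item);
}
```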

Usually, when you're creating these content types, that's when you have the best idea of the semantic meaning of those individual fields. The next thing to really focus on for HTML is that the document order is the tab order. If you go to any HTML page and view the source, you see all the markup, and machines, assistive technology, and screen readers will read through it in the order of that code. That's very important because keyboard users might have vision or mobility issues and can't use a touch device or a mouse; they have to use the Tab key on the keyboard or some other input device. They tab through the different areas on the page, and it goes in the same order as the code.

Here's an example: if you look at the page template, this is how it orders things by default. It has the header, the navigation, the breadcrumb, the main area, and then the footer; and within the header, the logo, the site name and slogan, and then any blocks assigned to the header region come after that. You can see the order of things in the code. Whenever somebody is tabbing through on a keyboard, you have to be mindful of that order, because as they tab through the page this is the order in which they will hit things.

You'll also notice that the header and the navigation come at the top; you have to go through quite a bit before you actually reach the main content. That's the big benefit of the skip link: because the skip link is in the HTML template, which is a wrapper for the page template, it comes first, and if you hit it you can quickly jump down to the content. You don't have to keep tabbing through all of the logos and the navigation and that stuff.

Another thing with HTML, in addition to the source order, is the markup that you use. In Drupal 8, with the HTML5 Initiative, we went through all the templates and tried to use the best HTML5 elements that we could. In this example, the page template, we can use the new HTML5 elements: the header element, the nav element for the navigation, the section element to create different sections, and also the footer element.
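In outline, the Drupal 8 style page template being described swaps generic divs for the new elements, roughly like this (the region names and IDs are illustrative):

```html
<header id="header">
  <!-- logo, site name, header blocks -->
</header>

<nav id="navigation">
  <!-- main menu -->
</nav>

<section id="content">
  <!-- page title, messages, node content -->
</section>

<footer id="footer">
  <!-- footer blocks -->
</footer>
```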

The next thing for HTML to be mindful of when theming, and pretty much the one thing you have to focus on for images, is that you need text alternatives for that content. Just as with videos you need some text so that people who can't see the video, can't hear it, or perhaps have Flash blocked still get the same information, the same goes for images, and the primary way to give that alternative text for images is the alt attribute.

The key things to keep in mind with the alt attribute: one, every image element needs to have an alt attribute. For decorative images, perhaps a flower that's part of the design but not really part of the content, you can just leave the alt attribute empty. If it's a content image, then you want the alt text to be a short, concise description of the image. And if that image happens to be inside a link that points somewhere, then the alt text should describe the destination of that link. The reason you even want an empty alt attribute on a decorative image is that most screen readers, if they see an empty alt, will ignore the image, which is what you want. Otherwise the screen reader will say "blank image," or "this is an image," or "unknown image," and repeating that really interrupts the flow of the content.
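The three cases listed here, sketched in markup (the filenames and link targets are made up):

```html
<!-- Decorative image: empty alt so screen readers skip it entirely. -->
<img src="flower-border.png" alt="">

<!-- Content image: short, concise description of what it shows. -->
<img src="chart-2012.png" alt="Bar chart of 2012 webinar attendance by month">

<!-- Linked image: the alt text describes the destination, not the picture. -->
<a href="/contact">
  <img src="envelope-icon.png" alt="Contact us">
</a>
```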

Also with HTML, as we move beyond the desktop into tablets and mobile devices, you want to keep the viewport in mind. The viewport is essentially what the screen is for that particular device and how it renders the page. What I've seen all too often are these two viewport meta tags.

The first one sets an initial scale, a minimum scale, and a maximum scale, and sets them all to one, and that prevents zooming, which you really don't want to do. On mobile devices and tablets the text is sometimes too small for people with vision problems; they want to zoom in, they try the touch zoom, and they can't, so this prevents them from being able to read that text.

The second one prevents scrolling: sometimes people have zoomed in but can't scroll to the part of the page that is now outside the viewport. What I recommend is to just set width=device-width. This works very well on all devices and just says, "Hey, whatever the width of the device is, set that as the width of this page." It allows zooming, scrolling, and all that kind of stuff.
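Concretely, the two problematic tags and the recommended one look something like this (the exact attribute combinations on the slides may have differed):

```html
<!-- Avoid: locking the scale to 1 disables pinch zooming. -->
<meta name="viewport"
      content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1">

<!-- Avoid: user-scalable=no stops users from zooming at all. -->
<meta name="viewport" content="width=device-width, user-scalable=no">

<!-- Recommended: set the width and let users zoom and scroll freely. -->
<meta name="viewport" content="width=device-width">
```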

Okay, the next big section is CSS, all the styles, and there's a bunch of different stuff you can do with CSS to make sure your site is accessible. The first is that you should be familiar with image replacement techniques. Just as we had images in HTML with alt text, sometimes you might want to add decorative images as CSS backgrounds but still want screen readers and search engines to be able to get at that text and that information.

Here's an example: a header containing text. You still get that semantic information, but maybe you want to show an image instead, so you can use CSS to define a background image for that header. It's important that you set the height and the width to the dimensions of the image. There are a couple of different image replacement techniques you can use, and the one shown here is one of the newer ones: it hides the text and shows only the background image.
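One of the newer techniques at the time (often called the Kellum method) does this without hiding the text from screen readers; a sketch, with a hypothetical selector and image path:

```css
.site-name {
  /* Show the logo as a background image... */
  background: url(../images/logo.png) no-repeat;
  width: 300px;   /* the dimensions of the image */
  height: 80px;

  /* ...and push the real text out of the box instead of using
     display: none, so screen readers and search engines still get it. */
  text-indent: 100%;
  white-space: nowrap;
  overflow: hidden;
}
```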

The next important area for CSS is styling links, and there are three main ideas to keep in mind when you're styling links. One: make them obvious. If you have links on the page, you want people to know right away, "Hey, that's a link I can interact with." Two: design all the states of links: how a link looks when it first comes up, something different for hover and for focus when somebody tabs to it and it's focused, the active state when somebody actually clicks the link, and the visited state so people know where they've been. And three: make them easy to click.

This is just general usability best practice, and it's also very helpful for people with mobility problems who have a hard time being accurate with their mouse movements. It's also very helpful on tablets and touch devices, where people have to use their fingers, and fingers are really big compared to the links they're touching.

Some tricks I've used with CSS and links: one, if you don't have time to design a dedicated focus state, which would be ideal, a really good default is that whenever you set CSS properties on hover, do the same thing for focus. That way, if you add a background or a different color when a mouse hovers over a link, the same styles get applied when somebody tabs to it on a keyboard, and they can easily see where they are and what's highlighted.

Another problem I often see is the link outline. Quite often in reset stylesheets you'll see outline: 0 applied to all anchors, because in certain browsers, when you click a link, you get an outline that sometimes doesn't look good. However, that outline is very important for keyboard users who are tabbing through, so a better default is to set the outline to zero only for the hover and active states and make sure it's still there for the focus state. That way keyboard users still get the benefit of the outline.
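Put together, those two defaults might look like this (the selectors and colors are generic; adapt them to your theme):

```css
/* Whatever you style on hover, style on focus too,
   so keyboard users get the same feedback as mouse users. */
a:hover,
a:focus {
  background-color: #ffffcc;
  color: #000;
}

/* Drop the outline for mouse interaction only;
   keep it for keyboard focus. */
a:hover,
a:active {
  outline: 0;
}
a:focus {
  outline: thin dotted;
}
```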

Finally, you want a big target area for people to click on, and this is what I like to use as the default for anchor tags: set the margin to negative two pixels and the padding to two pixels. This works on all the inline links and basically adds two pixels of clickable area around each link. Also, if you're tabbing through, the outline won't be right up against the text anymore; it'll be two pixels out, which makes the text a lot more readable.

The one caveat, if you have this in your reset, is that whenever you're styling other links, such as menu links and stuff like that, you want to make sure you're overriding the margins and the paddings so it doesn't screw things up.
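As CSS, that default plus the caveat look roughly like this:

```css
/* Default for inline links: a slightly larger hit area, and the focus
   outline sits 2px away from the text instead of hugging it. */
a {
  margin: -2px;
  padding: 2px;
}

/* Caveat: reset the defaults wherever links are styled differently,
   e.g. menu links, so the negative margin doesn't throw off the layout. */
ul.menu a {
  margin: 0;
  padding: 0.5em 1em;
}
```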

A big part of making themes accessible with CSS is paying attention to typography. It's been said that web design is something like 90% typography, because getting content and reading it is basically what people go to websites for, and you want to make that as easy as possible.

Here's a great quote from Emil Ruder; he was the head of the Swiss design school in Basel and a big influence on a lot of typography work. He said, "A printed work which cannot be read becomes a product without purpose." The essential purpose of text is to be read. So some general things to keep in mind with typography: first of all, choose the right fonts. Make sure they're legible, and do as much as you can to make sure they're readable and easy for people to make out.

Some display fonts may look cool, but they're a lot harder to read, so you have to balance the design against how readable it is. Another thing is to pay attention to the measure.

The measure is a typographic term that essentially means how wide a line of text is. If you have a block, a paragraph, the measure is how wide it is. You don't want the measure to be too wide, because as you read along a line, get to the end, and come back to the beginning, if that distance is too great you miss your landing spot; it's hard to see where the next line is. At the same time you don't want the measure to be too narrow, because then your eyes become like a pinball machine, jumping back and forth very quickly from one line to the next.

Another thing is to use appropriate leading. Leading is another typographic term, and it basically means how much space there is between lines of text; in CSS this is the line-height property. Again, aim for the happy medium: you don't want too much space between lines, and you don't want so little that they crowd together.

Another good general rule of typography is to create a nice hierarchy, in the sense of heading structure, so that you know this chunk of text belongs to this area. You also want to create hierarchy with links and such to make things easier to scan, and use font sizes to make that hierarchy more noticeable, to make things stand out. Then there are some more detailed things to pay attention to.

The first big one is font size. For the body copy, the main text of an article page, you want the default font size; in your CSS you want it to be 100%, which is basically 16 pixels in all browsers. Browsers set it at that size for a reason: it's very readable, and back when the spec was first defined, 16 pixels on a monitor was about the same size as the point size you would see in a book. One of the good things about using the default font size for body copy, going back to what I mentioned about hierarchy, is that you have a much broader spectrum of font sizes to use for the smaller detail stuff: the byline, the "more information" link, that kind of thing.

I remember several years ago people would set the body copy at 13 pixels or 12 pixels; it was very, very small and hard to read, and everything looked the same because you couldn't really get much smaller than that, so the byline and the "more information" link were all the same size and tiny. Small text creates eye strain, especially on monitors, because they're not as crystal clear as a printed magazine or newspaper. One final hint for when you're designing with font sizes: use actual text rather than Latin lorem ipsum, because with real text there, as you're designing, you can tell right away whether something is readable or not.
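A minimal sketch of those defaults in CSS (the numbers are reasonable starting points, not the exact values from the talk):

```css
body {
  font-size: 100%;    /* keep body copy at the browser default (~16px) */
  line-height: 1.5;   /* comfortable leading between lines */
}

/* Keep the measure readable: not too wide, not too narrow. */
.content p {
  max-width: 33em;
}

/* Use size to build hierarchy: big headings, smaller detail text. */
h1 { font-size: 2em; }
h2 { font-size: 1.5em; }
.byline,
.more-link {
  font-size: 0.875em;
}
```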

The other thing to pay attention to, typography-wise, is text alignment. In general you want to left-align the main chunk of body text (or right-align it if you have a right-to-left language), and you want to avoid fully justified text, because it creates rivers. Rivers are the larger spaces between words that flow down through the document, and they can really throw off the rhythm of the text; that's especially hard for people who are dyslexic. Left-aligned text also makes it easier to scan, easier to jump from paragraph to paragraph and scan through links and things like that.

Finally, you want to make sure there's enough color contrast between the text and what's behind it; that goes a long way toward making things easier for people to read. There's a great website about contrast that's a good example of how high-contrast design can still be beautiful, and it also has a lot of good facts about why contrast is important. With color contrast, smaller text needs higher contrast precisely because it's smaller; you need to make it stand out a little more. One caveat, which I don't see mentioned often when people talk about accessibility and contrast: for people who are dyslexic, too much contrast can actually make reading harder. Some good tools I recommend: one really good tool that came out recently is Contrast Ratio, which uses the WCAG 2.0 AA and AAA contrast ratios; you can easily input color values and see how readable things are.

There are also online color filters, such as graybit.com and colorfilter.wickline.org, which let you put in the address of whatever website you're looking at and see what the text looks like in black and white and under different types of color blindness. My favorite is the color blindness simulator Color Oracle (colororacle.org). It's a program you install on your computer, with versions for Windows, Mac, and Linux, and while you're working, whether you're looking at a website or designing in Photoshop, it will switch the entire monitor to a certain type of color blindness so you can check that the color contrast works for whatever you're designing.

Finally, for CSS techniques, there's the responsive design paradigm, where you build and theme things so that they look good across all devices. The big idea there is the layout: we use media queries to adjust the layout, and media queries are basically just CSS that says, "Hey, if this condition is true, apply this CSS." And you want to set the breakpoints based on the content rather than on devices.

The breakpoints are the points where things change. For example, if you have a mobile layout where everything is a single column, at some point there's enough room for two columns, and where that changes is the breakpoint. You want to set the breakpoints based on how the content looks, making sure it's still readable, rather than switching at particular devices, and that's for two main reasons.

One is that device sizes are going to change. In five or ten years there will be a monitor on your refrigerator where you can see the grocery store's website, for example, and we don't know what size that screen is going to be, so you want to set these breakpoints based on content.

Another reason not to base breakpoints on device size is that media queries can be triggered when people zoom in in their web browser. People who have vision problems can zoom in and trigger these media queries. One caveat: that doesn't currently work in all web browsers, but there are bugs filed for it and they are working on it.

Here's an example of a media query. In this CSS, for the body we're setting the margins, the max-width, and the padding, and that's what every browser and device will use. Below it, where it has @media screen and (min-width: 35em), that says if the width of the screen is 35em or larger, apply the CSS inside, and that's where we can take the body, change the max-width, and move things around.
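Reconstructed, the slide's CSS would look something like this (the exact values are illustrative):

```css
/* Base styles: every browser and device gets these. */
body {
  margin: 0 auto;
  max-width: 30em;
  padding: 0 1em;
}

/* When the viewport is at least 35em wide, relax the layout. */
@media screen and (min-width: 35em) {
  body {
    max-width: 60em;
  }
}
```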

Finally, one more area we can work on is WAI-ARIA. WAI is the Web Accessibility Initiative, which was created by the W3C to tackle accessibility, and ARIA is Accessible Rich Internet Applications. It's essentially metadata that you can apply to HTML to help assistive technology understand the purpose of that HTML, just another layer of semantics that gives more information.

What's it used for? Basically, it defines a way to make web content and web applications more accessible to people with disabilities. It especially helps with dynamic content and advanced user interface controls developed with Ajax, HTML, JavaScript, and related technologies. There are different roles, landmarks, and attributes you can apply to make things more usable for assistive technology.

The main thing we really need to worry about in theming, as far as getting started, is the landmark roles. If we look at the page template we'll see these big areas: a div with an ID of header, the navigation, the main content, and the footer. What was done in Drupal 8 is that we've used the new HTML5 tags, the header tag, the nav, and the footer, and in the future, as those tags gain support, assistive technology can take advantage of them because it will know what those tags mean.

Some assistive technologies, however, can't understand those yet, so as a stopgap we can use ARIA. For example, for the header we give it the role of banner, for the nav we give it the role of navigation, for the main content we give it the role of main, and for the footer we give it the role of contentinfo. There's even a new element called the main element, which they're discussing at the W3C; it looks like it's going to gain traction and be used, but it's not quite ready yet. We could use the main element instead of a section for the main content, and assistive technology that understands it would then also be able to jump to that main content rather than having to use a skip link.
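Adding the landmark roles to the earlier page-template sketch gives roughly this (a stopgap until assistive technology understands the HTML5 elements on their own):

```html
<header id="header" role="banner">
  <!-- logo, site name, header blocks -->
</header>

<nav id="navigation" role="navigation">
  <!-- main menu -->
</nav>

<section id="content" role="main">
  <!-- page title, messages, node content -->
</section>

<footer id="footer" role="contentinfo">
  <!-- footer blocks -->
</footer>
```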

Okay, before we stop for questions I just want to quickly demo a responsive layout. For the past seven months or so at Forum One we've been working on EPA.gov; they've been moving things over to Drupal, and we've been building the Drupal platform for them. I was the front-end architect on that project, doing the theming, and two of their big wishes were responsive design and accessibility; they really wanted to focus on those.

Here's an example of one of their pages. One of the benefits of these responsive layouts is that in addition to the screen readjusting itself at different device widths, they also help people with vision impairments who might need to zoom in; the media queries handle that as well. So, for example, we can zoom in, but first let's look at the default desktop view and suppose somebody needs to zoom in to be able to read better.

As they zoom in, it triggers the media queries and you'll see the layout change: for example, the search field and the navigation adjust so that everything is still readable and usable, even for people with vision problems who need to zoom in; things rearrange and fit.

Maybe people really need to zoom in even more, and if they keep going they can even get to what people might see on a mobile layout. You can see how things change: links get a little more padding so it's easier for people to touch things with their fingers. Again, all of this makes things usable not just for people on different devices but also for people with accessibility needs who might need to zoom in. Okay, we have some time for questions.

Hannah: Hi everyone, if you have any questions please ask them in the Q&A pod in WebEx. Okay, we have one question coming in: what web accessibility checking tools would you recommend now that Bobby is no longer available?

Dan: Yes, as far as automated tools, one of my favorites currently is the WAVE toolbar, and the new beta, version five of WAVE, is actually very good. There's also a new bookmarklet: basically you add it to your bookmarks and it'll go through and check all the markup for you.

Hannah: Okay, the next question is: do you recommend using the Zen theme?

Dan: Yes. Basically, when it comes to themes you want to use whatever you're most comfortable with, and a really good thing when you're choosing themes and you're concerned about accessibility is that a theme might carry the tag D7AX, which says the maintainers have really paid attention to accessibility issues. I know John has put accessibility work into the Zen theme, and we also have others such as AdaptiveTheme, and I know Omega has done some work too, so yes.

Hannah: One more: is there a jQuery book that focuses on accessible design?

Dan: As far as accessible design for jQuery, I haven't seen one, but there's actually a very good book, I think it's called something like Progressive Enhancement with JavaScript, that came out within the last two years or so, and it talks all about progressive enhancement and how you add on jQuery to create widgets that are accessible.

Hannah: Okay, the next one is: can you go over how zooming is handled in media queries?

Dan: Sure. Again, with media queries, say here we have @media screen, and then inside the parentheses you can test for different things such as aspect ratio, device width, device height, that kind of stuff. In general we tend to use the width as the thing we test for when doing layout. Especially if you use em rather than pixels, then whenever you zoom in, those media queries get triggered. So on this site, zooming in and out triggers the same media queries just as if you were resizing the screen. Any other questions?

Hannah: None are coming through, so I want to say thank you so much, Dan, for the great presentation, and thank you everyone for attending. Again, the slides and recorded webinar will be posted to the acquia.com website in the next 48 hours. Dan, do you want to close with anything?

Dan: Sure, you can reach me at dmouyard@forumone.com if you want to email me any questions, and you can also follow me on Twitter at DCmouyard; that's also my Drupal.org username, my IRC nick, and my Skype handle.

Hannah: Great, thanks. Everyone have a great day.


Constructing a Fault-Tolerant, Highly Available Cloud Infrastructure for your Drupal Site [December 12, 2012]


Hannah: Today's webinar is Constructing a Fault-Tolerant, Highly Available Cloud Infrastructure for your Drupal Site.
Speaking first we have Jess Iandiorio, who is the Senior Director of Cloud Products Marketing, and then we have Andrew Kenney, who is the VP of Platform Engineering.

Jess, you take it away now.

Jess: Great, thank you very much, Hannah. Thanks, everybody, for taking the time to attend today; we have some great content, and we have a new speaker for our webinar series. For those of you who attend these regularly, you know we do three to five per week.

Andrew Kenney has been with the organization since mid-summer, and we are really excited to have him. He comes to us from ONEsite, and he is heading up our platform engineering; he is the point person on all things Acquia Cloud specifically, and he'll speak in just a few minutes.
Thank you, Andrew.

Just to tee up what we are going to talk about today: we want our customers to be able to focus on web innovation. Creating killer websites is hard, and that's why we want you to be able to spend all the time you possibly can figuring out how to optimize the experience and create something really, really cool on your website. Hosting that website shouldn't be as much of a challenge.

The topic today is designing a fault-tolerant, highly available system, and the point of the matter is: if your site is mission-critical, how do you avoid a crisis, and why do you need this type of infrastructure?

Andrew has a great background in designing highly available infrastructure and systems, and he's going to go through best practices. I'll come back towards the end to give a little bit of information about Acquia Cloud as it relates to the content he covers, but he's going to talk generally about best practices and how you could go about doing this yourself.

Again, please ask your questions in the Q&A tab as we go, and we'll get to them as we can. For the content today: first, Andrew will discuss the challenges Drupal sites can have when it comes to hosting, what makes them complex, and why you would want a tuned infrastructure in order to have high availability. He'll go through the types of scenarios that can cause failure, how you can go about creating high availability and resiliency, and the resource challenges some organizations may run into, and then he'll go through practical steps and best practices around designing for failure and how you can actually architect and automate failover. He'll close with some information on how you can test for failure as well.

With that, I'm going to hand it over to Andrew. I'm here in the background if you have any questions for me; otherwise I'll moderate Q&A and be back towards the end.

Andrew: Thanks, Jess. It's nice to meet you, everyone. Feel free to ask questions as we go, or we can just save those for the wrap-up; I'm more than willing to be interrupted, though.

Many of you may be familiar with Drupal and its status as a great PHP content management system, but even though it is well engineered and has a decade-plus of enhancements, there are a number of issues with hosting Drupal. These issues were always present if you were hosting in your own datacenter or on a dedicated server at, let's say, Rackspace or SoftLayer, but they're even more challenging when you're dealing with cloud hosting.

The cloud is great at a lot of things, but some of these more legacy applications are very complex, and extensive applications may have issues which you can solve with modules, solve with great platform engineering, or just work around in other ways.

One of these issues is that Drupal expects a POSIX file system. This essentially means that Drupal and all of its file input/output calls were designed with the assumption that there's a hard drive underneath the web server; if not a hard drive, then an NFS server or a Samba server, some sort of underlying file system. This is opposed to some newer applications that may be built by default to store files in Amazon S3, Akamai NetStorage, or a document-oriented database like CouchDB.

Drupal has come a long way, especially in Drupal 7, in making it so that you can enable modules that use PHP file streams instead of direct fopen-style legacy Unix file operations, but there are a number of different versions of Drupal in use, they don't all support this, and there aren't a lot of great file system options inside the cloud. At the end of the day, Drupal still expects that file system to be there.

A number of other issues: with Drupal you may make five queries on a given page, or you may make 50 queries on a given page, and when you're running everything on a single server this is not necessarily a big deal; you may have latency in the hundredths of milliseconds. When you run in the cloud, it may be the same latency on a single server, but even within the same availability zone in Amazon you may have your web server on one rack and your database on a rack that is a few miles away within that same availability zone.

This latency, even if it's only one millisecond or ten milliseconds per query, can dramatically add up. One of the key challenges in dealing with Drupal, both at scale at the horizontal layer and in the cloud in general, is how you deal with high-latency MySQL operations. Do you improve the efficiency of the overall page and use fewer dynamic modules, or fewer 12-way left joins in Views and other modules? Do you implement more caching? There are a lot of options here, but in general Drupal still needs to do a lot of work on improving its performance at the database layer.

One other related note is that Drupal is not built with partition tolerance in mind. Drupal expects to have a master database that you can commit transactions to. It doesn't have any automatic sharding built in, where you could, say, shard off the articles on your website so that your article section might go down while you still have your photo galleries and your other node-driven elements.

Some newer-generation applications may be able to deal with the loss of a single backend database node, maybe because they're using a database like Riak or Cassandra that has great partition tolerance built into it, but unfortunately MySQL doesn't do that unless you're sharding manually. We can scale Drupal out and up at the MySQL layer, and we can have highly available MySQL, but at the end of the day, if you lose your MySQL layer you are essentially going to lose your entire application.

One of the other issues with Drupal hosting is that there's a shortage of talent, a shortage of people who have really run Drupal at massive scale. There are companies like The Economist of the world, top-50 internet sites powered by Drupal, and there's talent at the White House giving back to Drupal, but there's still a lack of good dev ops expertise in, say, an organization that runs hundreds of Drupal sites: how to deploy this either on your internal infrastructure, in, let's say, a university IT department, or at Rackspace or a traditional datacenter company.

Drupal has its own challenges, and one of those challenges is: how do you find great engineering, operations, and dev ops people to help you with your Drupal projects?

Now, there are a number of ways, as you may all be aware, that an application can die in a traditional datacenter. It may be someone tripping over a power cord, it may be that you lose your internet access or one of your upstream ISPs, or you have a DDoS attack.

Many of these apply to the cloud as well, but the cloud also introduces other, more complex scenarios, or complicates a couple of them. You can still have machine loss; Amazon exacerbates this in that machine loss may be even more random and unpredictable. Amazon may announce that a machine is going to be retired on a given day, which is great, and probably something your traditional IT department or infrastructure provider didn't give you unless they were very good at their jobs.

There's still a chance that at any given moment an Amazon machine may just go down and become unavailable, and you really have no introspection into why it happened. The hypervisor layer, all the hardware abstraction, is not available to you: Amazon shields you, Rackspace Cloud shields you, all these different clouds shield you from knowing what's going on at the machine layer. Or there may just be a service outage: Amazon may lose a datacenter, and Rackspace, just this weekend, had an issue in the Dallas region with its cloud customers.

You never know when your infrastructure or service provider is going to have a hiccup. There may just be network disruption: this could be packet loss, this could be routes being malformed going to different regions or different countries; there are a lot of different ways the network can impact your application. And it's not just traffic coming to your website; it's also your website talking with its memcache layer, talking with its database layer, all these different things.

One of the key points about Amazon's cloud specifically is its file system. If you're using Elastic Block Store (EBS), there have been a lot of horror stories about EBS outages taking down Amazon EC2 systems, or anything that's backed by EBS. In general, as I said before, it's hard to have an underlying POSIX file system at scale, and EBS, as a technology, is still in its infancy. Amazon, although it's focused on reliability and performance for EBS, has a lot of work to do to improve it, and even people like Rackspace are only now deploying their own EBS-like subsystems with OpenStack.

Your website may fail just from a traffic spike. The traffic may be legitimate; maybe someone talked about your website on a radio program or a TV broadcast, or maybe you got linked from the homepage of TechCrunch or Slashdot. But a traffic spike could also be someone intentionally trying to take down your website. The cloud doesn't necessarily make this any worse, other than the fact that you may have little to no control over your network infrastructure. You do not have access to a network engineer who can point out exactly where upstream all this traffic is coming from and implement routing changes or firewall changes, so the cloud may make it harder for you to control this.

Your control panel, your ability to manage your servers in the cloud, may go down entirely; this is one of the issues that crops up on Amazon when they have an outage. The API may go down entirely, and you have to be able to engineer around this and ensure that your application will survive even if you can't spin up new servers or adjust sizing and things like that.

Another route to system failure is that your backups may fail. It's easy enough to do backups of servers and volumes and all these different things in the cloud, but you have no guarantee, even when the API says a backup has completed, that it actually worked. This may be better than traditional hosting, but there's still a lot of progress to be made in engineering around this.

In general, everyone wants a highly available and resilient website. There are obviously different levels of SLAs: some people may be happy if the website can sustain an hour of downtime, while other organizations may feel their website is mission-critical and even a blip of a few minutes is too much, because it's handling financial transactions, or just because of the publicity if the website is down.

In general, your Drupal hosting should be engineered with high availability and resiliency in mind. To do this you should plan for failure, because failure is a given in the cloud: know that at any given time a server may die, and have either a hot standby or a process in place to spin up a new server. This means you want your deployment and your configuration to be as automated as possible.

This may be a Puppet configuration, it may be a CFEngine configuration, it may be Chef, or it may just be a bash script that says, "This is how I spin up a new machine and install the necessary packages; this is how I check out my Drupal code from GitHub." At the end of the day, when you're woken up by a pager at 2:00 in the morning, you don't want to have to think about how you built the server; you want a script to spin it up, or ideally you want tools that fail it over automatically so you actually have no blips.
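As a minimal illustration of that kind of automation, and not Acquia's actual tooling, here is a sketch of a bare-bones provisioning script; the package names assume a Debian/Ubuntu image and the repository URL is hypothetical:

```bash
#!/bin/bash
# Sketch: rebuild a replacement Drupal web node from a bare image.
# Real deployments would typically use Puppet/Chef/CFEngine instead.
set -euo pipefail

# Install the web stack (package names assume a Debian/Ubuntu image).
apt-get update
apt-get install -y apache2 php5 php5-mysql php5-gd git

# Check the site's code out of version control (hypothetical repository).
rm -rf /var/www/html
git clone https://github.com/example/my-drupal-site.git /var/www/html

# Point Drupal at the current master database via environment variables.
cat > /var/www/html/sites/default/settings.php <<'EOF'
<?php
$databases['default']['default'] = array(
  'driver'   => 'mysql',
  'database' => 'drupal',
  'username' => getenv('DRUPAL_DB_USER'),
  'password' => getenv('DRUPAL_DB_PASS'),
  'host'     => getenv('DRUPAL_DB_HOST'),
);
EOF

# Bring the web server up with the new code.
service apache2 restart
```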

Obviously, having no blips means you need this configured to happen automatically. You should have no single points of failure; that's the ideal in any engineering organization: any consumer-facing or internal-facing website or application should have no single points of failure. In a traditional datacenter that would mean having dual UPSs, dual upstream power supplies and network connectivity, two sets of hard drives in the machine or RAID, multiple servers, and having your application distributed across geographic regions.

There are lots of single points of failure out there. The cloud abstracts a lot of this, so in general it's a great idea to run in the cloud, because you don't have to worry about the underlying network infrastructure; you can spin a server up in one of the five Amazon East Coast availability zones without worrying about any of the hardware requirements or the power or any of those things. Having no single points of failure means you have to have two of everything; or, if you can tolerate a bit of downtime, you can use Amazon's CloudFormation along with CloudWatch to quickly spin up a server from one of your images and boot it up that way. But it's definitely good to have at least two of everything.

You will want to monitor everything. As I said before, you can use CloudWatch to monitor your servers, you can use Nagios installations, you can use Pingdom to make sure that your website is up, but you want everything monitored. Is your website itself, the Drupal front end, returning the homepage? Do you want to actually submit a transaction to create a new node and validate that the node is there, using tools like Selenium?

Do you want to just make sure that MySQL is running? Do you want to see what the CPU health is, or how much network activity there is? And one other thing: you want to monitor your monitoring system. Maybe you trust Amazon's CloudWatch not to go down, maybe you trust Pingdom not to go down, but if you're running Nagios and your Nagios server goes down, you can't sustain an outage like that; you don't want that to happen at 2:00 in the morning and then have someone tell you on Monday morning that your website has been down all weekend. So it's a good idea to monitor the monitoring servers.
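A toy illustration of the kinds of checks being described, not any particular product's configuration; the hostnames, credentials, and alert address are placeholders, and something like this would run from an external box on a cron schedule:

```bash
#!/bin/bash
# Sketch: minimal external health checks for a Drupal site and its database.

ALERT="ops@example.com"

# 1. Is the homepage rendering (HTTP 200 and some expected text)?
if ! curl -sf https://www.example.com/ | grep -q "Welcome"; then
  echo "Homepage check failed at $(date)" | mail -s "Site check FAILED" "$ALERT"
fi

# 2. Is MySQL answering on the database host?
if ! mysqladmin --host=db.example.com --user=monitor \
     --password="$MONITOR_DB_PASS" ping >/dev/null 2>&1; then
  echo "MySQL ping failed at $(date)" | mail -s "DB check FAILED" "$ALERT"
fi
```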

Backing up all your data is key for resiliency and business continuity: ensuring that your Drupal file system is backed up, your MySQL database is backed up, and your configurations are all there. This includes not just backing up but validating that your backups work, because many of us have been in an organization where, yes, the DBA did back up a server, but when the primary server failed and someone tried to restore from the backup, they found out that, oh, it's missing one of the databases or one of the sets of tables. Or maybe the configuration or the password wasn't actually backed up, so there's no way to even log in to the new database server.

It's a very good idea to always test all of your backups, and this also includes testing emergency procedures. Most organizations have business continuity plans, but no plan is flawless, and plans have to be continually iterated, just like software. The only way to ensure that a plan works is to actually engage it and test it, so my recommendation is that if you have a failover datacenter, or you have a way to fail over your website, you should test those failover plans.

Maybe you only do it once a year, or maybe you do it every Saturday morning at a certain time, if you can engineer it so there's no hiccup for your end users, or if your website has no traffic at that point in the week. Either way, it's a great idea to actually test those emergency procedures.

In general, there are challenges with Drupal management, and plain resource challenges. The cloud tells you that your developers no longer have to worry about all the pesky details that are necessary to launch and maintain a website, and that you don't need any operations staff, but that's mostly hype. I think a lot of engineers have always felt that the operations team is just a bottleneck in their process, and once they've validated that their code is good, either in their own opinion or by running their own system tests or unit tests, they want to just push it live; that's one of the principles of continuous integration.

The reality is that developers aren't necessarily great at managing server configurations, or at engineering a way to deploy software without any hiccup for the end user, who may load a page and then have an Ajax call refresh another piece of it. You want to make sure there's no delay in the process, that code deploys don't impact the server, and that server configurations are maintained.

Operations staff are still very, very necessary, and you have to plan for failure and plan your deployment process realistically. It's very hard to find people who are great at operations and also understand an engineer's mindset, and so dev ops is a resource challenge.

Here's an example of how we design for failure. Here at Acquia, we plan for failure; we engineer different solutions for different clients' budgets to make sure we give them something that will make their stakeholders, internal and external, happy. We have multi-availability-zone hosting, so for all of our Managed Cloud customers, when we launch one server we'll have another backup server in another zone.

We replicate data from one zone to the other, and if there's any service interruption in one zone we serve data from the other zone. This includes the actual web node layer, the Apache servers serving the raw Drupal files, and it includes the file system: we use GlusterFS to replicate the Drupal file system from server to server and from availability zone to availability zone.

It also includes the MySQL layer: we'll have a master database server in each zone, and we may have slaves against those master database servers, but the point is ensuring that all the data is always in two places, so that any time there's a hiccup in one Amazon availability zone it won't impact your Drupal website.

Sometimes that’s not enough. There's been a number of outages recently in the Amazon's history where maybe one availability zone goes down, but due to the control system failure, or due to other issues with the infrastructure there's multiple zones that are impacted. We have the ability to have multiple region-hosting, so this may be out of the East Coast, and the West Coast, U.S. West, and maybe the … our own facilities.

It really depends on what the organization wants, but multi-region hosting gives businesses the peace of mind and the confidence that if there is a natural disaster that wipes out all of US East, or a colossal cascading failure in US East or one of these other regions, your data is always there, your website is always in another region, and you're not going to experience catastrophic delays in bringing your website back up.

During Hurricane Sandy a number of organizations learned this lesson when they had their datacenters in, let's say, Con Edison's facilities in Manhattan. Maybe they were in multiple datacenters there, but it's possible for an entire city to lose power for potentially a week, or to have catastrophic water damage to the equipment. It's always important to have your website available across multiple regions, and we offer that for our clients.

One of the other key things, as you design to prevent failure, is making sure you understand the responsibilities and the security model for all the stakeholders in your website. You have the public consumer, who is responsible for their browser and for making sure they don't give their passwords to unauthorized people.

You have Amazon who is responsible for the network layer for … during that two different machine images on the HyperVisor don’t have the ability to go disrupt each other. Making sure that they are … the physical medium of the servers and the facilities are all locked down and that customers using the security groups can't go from one machine to the other, or have a database called on from one rack to the other for different clients.

Then you have Acquia who is responsible for the software servers to the platform as a service layer with Drupal hosting. We are in charge of the operating system patches, we are in charge of configuring all of the security modules for Apache and in charge of recommending to you that you have Acquia network inside tools that you need to update … you need Drupal modules to ensure a high security, and you do all these things, but that brings it back to you. At the end of the day you're responsible for your application, your developers are the ones that go and make changes too and implement newer architectural things that may need to be security tested, or that choose not to go update a module for one point of view or another.

There's a shared security model here which covers both security availability in compliance, there may be a Federal customer who has to have things enabled a certain way just to go comply with a [FISMA 00:24:23] or Fed ramp accreditation. Obviously security can go impact the overall availability for your website and you don’t engineer for a security up-front them half of them can go take down your machine or they'll compromise your data so you don’t want your website back online until you’ve validated exactly what has changed.

What's very important to understand in the shared security module, and as you're planning for failure. Another thing I had briefly touched before was monitoring. This includes both monitoring your infrastructural application as well as monitoring for the security threats I just mentioned. At Acquia we use a number of different monitoring systems which I'll go in detail in, including Nagios, including your own 24/7, 365, operation step, but we also use third party software to go scan our machines to ensure that they are up-to-date and have no open ports that may be an issue, or have no demons running that are going to be an issue. Or have no other vulnerability.

This includes Rapid7, OSSEC, monitoring the logs, and for thwarting any … lots of issues across issues during security scans. It's important to monitor your infrastructure both from making sure the service is available as well as there's no security holes.
Back to monitoring: we have a very robust monitoring system; it's one of the systems we have to have. We have 4,000-plus servers in Amazon's Cloud, so all the Web servers and database servers and the Git and SVN servers, all these different types of servers, are monitored by something we call [Mon 00:26:01] servers, and these mon servers check to make sure the websites are up, check to make sure that MySQL and Memcache are running, all these different things.

The mon servers also monitor each other; you see that from the line from mon server to mon server at the top, so they monitor each other in the same zone. They may also monitor a mon server in another region, just to ensure that if we lose an entire region we get a notice about it.

The mon servers may also sit outside of Amazon's Cloud [00:26:32]; we may get hosting from someone like Rackspace, just to have our own business continuity and best-of-breed monitoring, to ensure that if there is a hiccup or service interruption in one of the Amazon regions we catch it. It's important to have external validation of the experience, and we may also use something like [Pingdom 00:26:50] to ensure your website is always there.

And to ensure that it is operating within the bounds of its SLA. There are all sorts of ways to do monitoring, but it's important to have the assurance that your monitoring servers are working and that each monitor that goes down has something else alert you that it's down, just so you don't impair your support or operations team in trying to recover from an issue.
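
As a rough illustration of the kind of check a mon server might run, the shell sketch below polls a website, MySQL, and Memcache. The hostnames and URL are placeholders, and this is not Acquia's actual monitoring code, which is built around Nagios and its own tooling.

    #!/bin/bash
    # Minimal health-check sketch: is the site up, is MySQL up, is Memcache up?
    # Hostnames, ports, and the URL are placeholders, not Acquia's configuration.
    SITE_URL="https://www.example.com/healthcheck"
    DB_HOST="db1.example.com"
    CACHE_HOST="cache1.example.com"

    # Web check: expect an HTTP success response within 10 seconds.
    curl --silent --fail --max-time 10 "$SITE_URL" > /dev/null \
        || echo "CRITICAL: web check failed for $SITE_URL"

    # MySQL check: mysqladmin ping exits non-zero if the server is unreachable.
    mysqladmin --host="$DB_HOST" ping > /dev/null \
        || echo "CRITICAL: MySQL not responding on $DB_HOST"

    # Memcache check: ask for stats over its TCP port.
    echo stats | nc -w 5 "$CACHE_HOST" 11211 | grep -q uptime \
        || echo "CRITICAL: Memcache not responding on $CACHE_HOST"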

Building high availability and resiliency into your monitoring infrastructure is very important. One of the other things is just being able to recover from failure. This includes having database backups; this includes having file system snapshots so you can recover all the Drupal files; making sure that all your EBS volumes are backed up; pushing those snapshots all the way over to S3 [00:27:42]; and making sure the file system is replicated using a distributed file system technology like Gluster. With all of this, you can potentially recover from catastrophic data failure, because having backups is important.
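
As a hedged sketch of the backup layers described here (database dumps, EBS snapshots of the file system, and off-site copies), an hourly cron job might look like the following. The volume ID, bucket, paths, and tool choices are all placeholders, not Acquia's actual tooling.

    #!/bin/bash
    # Rough hourly backup sketch; every ID, path, and bucket below is a placeholder.
    STAMP=$(date +%Y%m%d-%H%M)

    # 1. Logical MySQL backup: a consistent, compressed dump of all databases.
    mysqldump --single-transaction --all-databases | gzip \
        > "/backups/mysql-$STAMP.sql.gz"

    # 2. Block-level backup of the Drupal file system volume (EBS snapshot).
    aws ec2 create-snapshot --volume-id vol-12345678 \
        --description "drupal-files $STAMP"

    # 3. Copy the dump off-site so a regional failure doesn't take the backups with it.
    s3cmd put "/backups/mysql-$STAMP.sql.gz" s3://example-backup-bucket/mysql/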

You can choose if you want to have these backups live, live replication of MySQL or the file system, or just hourly snapshots, or weekly snapshots, and that depends on your level of risk and how much you want to go spend on these things.

In terms of preventing failure, we utilize a number of these different possibilities. You can use Amazon Elastic Load Balancers, with multiple servers behind an ELB, and those servers can be distributed across multiple zones. For example, we use ELBs for a client like the MTA of New York, where they wanted to ensure that if Hurricane Sandy wiped out one of the Amazon availability zones, we could still serve their Drupal website from the other availability zones.

We also use our own load balancers in our backend to distribute traffic between all the different Web nodes, so one availability zone may forward requests to the other availability zone. You can do round robin, and there's additional logic in there to distribute requests to all the healthy Web nodes and to make sure that no traffic is sent to an unhealthy Web node while our operations team and automated systems recover it from whatever made it unhealthy.

We also have the ability to use a DNS switch to take a database that has catastrophically failed, or has replication lag or some other problem, out of service. We always choose, at Acquia, to ensure that all your data transactions are committed. We'd rather incur a minimal service disruption than have data loss, because what you'd potentially be losing is a file upload, or a user account being created, or some other transaction. We have people building software-as-a-service businesses on top of us, so that loss protection is very important to us, and we utilize a DNS switch mechanism to make sure that the database traffic all flows to the other database server.

For the larger, multi-region sites, we actually use a manual DNS switch to move from one region to the other. This prevents flapping on an issue and having a transient hiccup turn into something even worse, where you may have data written to both regions. The DNS switch allows us, and allows our clients, to fail their website over when they choose to, and then when everything is status quo again, they can fail back.

As I said before, it's very important to test all of your procedures, and this includes your failover process. It should be scripted so you can fail over to your secondary database server, or shut down one of your Web nodes and have the system auto-heal itself. People like Netflix are brilliant about this, with their Simian Army, as they call it: they can shut down servers at random, and shut down entire zones, and ensure that everything recovers.

There are a lot of best practices out there in terms of actually testing the failover, and these failover systems and the extra redundancy that you've added to eliminate [00:31:22] single points of failure are also key in other, non-disaster scenarios. Maybe you're upgrading your version of Drupal, or you're rolling out a new module and you need to add a new database table or alter a table and go through that process within Drupal.

You can fail over to one of your given database nodes and then apply the module's [00:31:44] schema changes to that node without impacting your end users. There are ways to use these systems in your normal course of business to make sure that you use the available nodes to their full capacity and minimize the impact to your stakeholders.

Jess, do you want to talk about why you wouldn't want to do everything yourself?

Jess: Sure, yeah. Thank you so much, Andrew. I think that was a really good overview, and hopefully people on the phone were able to take some notes and think about, if you want to try this yourself, what are the best practices that you should be following.

Of course, Acquia Cloud exists, and as I'm in marketing I would be remiss not to talk about why you'd want to look at Acquia, but the reasons our customers have chosen to leave DIY mainly fall into these three groups. One is they don't have a core competency around hosting, let alone high availability, and if that core competency doesn't exist it's much easier and much more cost effective to work with a provider who has that as their core competency and can provide the infrastructure resources as well as the support for it.

Another main reason people come to Acquia is they don't have the resources, or have no desire to have the resources, to support their site, meaning 24x7 staff available to make sure that the site is always up and running optimally. Acquia is in a unique position to respond to both Drupal application issues and infrastructure issues. We don't make code changes for our customers, but we are always aware of what's going on with your site and can help you very quickly identify the root cause of an outage and resolve it quickly with you.

Then one of the other reasons is it can be a struggle when you're trying to do this yourself, whether you're hosting on premise and have purchased servers from someone, or you've gone straight to Amazon or Rackspace. Oftentimes people have found themselves in a sort of blame game with a lot of finger-pointing if the site goes down: their instinct is to call the provider, and if that provider says, "Hey, it's not us, the lights are on, you have service," then you have to turn around and try to talk to your application team about what's wrong. There can be a lot of back and forth and a lot of time wasted, when what you really want is your site up and running.

Those are the reasons not to try to do this yourself. Of course you're welcome to, but if you've tried and haven't had success, the reasons to go forward with Acquia are our White Glove service, again, fully managed on a 24x7 basis for both Drupal application support and infrastructure support, as well as our Drupal expertise: we have about 90 professionals employed here at Acquia across operations who are able to scale your application up and down.

We have engineers, we have Drupal support professionals, and they can help you either on a break-fix basis or in an advisory capacity to understand what you should be doing with your Drupal site, between the code and configuration, to make it run more optimally in the Cloud, so that's a great piece of what we offer. And of course there's everything Andrew covered today in terms of our high availability offerings and our ability to create full backups and redundancy across availability zones as well as Amazon Regions.

We are getting to the close here, if you have some questions I'd encourage you to start thinking about them and put them into the Q&A.

The last two slides here just showcase the two options we have if you would like to look at hosting with Acquia. Dev Cloud is a single-server, self-service instance: you have a fully dedicated single server, you manage it yourself, and you get access to all of our great tools that allow you to implement continuous integration best practices.

The screenshot you're seeing here is just a quick overview of what our user interface looks like for developers. We have separate dev, staging, and prod environments pre-established for our customers, and very easy-to-use drag and drop tools that allow you to push code, files, and databases across the different environments while implementing the necessary testing in the background, to make sure that you never make a change to your code that could potentially harm your production site.

The other alternative is Managed Cloud, and this is the White Glove service offering where we promise your best day will never become your worst day because something like a traffic spike ends up taking your site down. We'll manage your application and your infrastructure for you; our infrastructure is Drupal-tuned with all the different aspects that Andrew has talked about. We use exclusively open source technologies as part of what we add to Amazon's resources, and we've made all the decisions that need to be made to ensure high availability across all the layers of your stack.

With that, we'll get to questions, and we have one that came in. "Can you run your own flavor of Drupal on Acquia's HA architecture?"

Andrew: The answer is yes. You can use any version of Drupal, and I think we are running Drupal 6, 7 and 8 websites right now. You can install any Drupal modules you want; we have a list of which extensions we support, and we support most popular modules out there. There's always an edge case: maybe there is some random security module or some media module that needs something, and we may need to go install it for you or recommend alternatives. People have taken Drupal and, for lack of a better word, bastardized it, and built these kinds of crazy applications on top of it, or re-written chunks of it, and that also works with our HA architecture.

Our expertise is in core Drupal, but our professional services and our technical account managers are great at analyzing applications and understanding how to improve their performance, so by now the platform can host pretty much any PHP application or static application. It's optimized for Drupal, but the underlying MySQL and file system and Memcache and all these different requirements for a Drupal website, the HA capabilities of those work across the board.

Jess: And we do have multiple instances where customers have come to us and gotten their application running in our Cloud environment fine, but they came to us from hosting directly with Rackspace or Amazon and found it to be either unreliable or just not cost-effective for them because of the amount of resources that had to be thrown at the custom code.
Another good thing about Acquia is that by becoming a customer you get access to all these tools that help test the quality of your code and configuration, so when you bring extensive amounts of custom code into our environment we can help you quickly figure out how to tune it, and if there are big issues that are the culprit for why you would need to constantly increase the amount of resources you're consuming, we can let you know what those issues are and we can do a site audit through [PS 00:38:37] like Andrew mentioned.

Our hope and our promise to our customers is that you're lowering your total cost of ownership if you're giving us the hosting piece along with the maintenance, and if there's a situation for any of our customers where we are continually having to assign more resources because of an issue with the quality of your application, that's when we'll intervene and suggest, as a cost-savings measure, working with our PS team to do a site audit so we can help you figure out how to make the site better and therefore use fewer resources.

Andrew: In a lot of cases we can just throw more hardware at a problem and have that be a Band-Aid, but it's in both our best interest and the best interest of the customer, in terms of both their bills [00:39:17] and having an application that will last for many, many more years, to have our team recommend what you should not have done, how you can best use this module, or another recommendation, so you have a more highly optimized website for the future.

Jess: There's a question on, "Why did Acquia choose Amazon as the Cloud to standardize its software on?"

Andrew: Acquia has been around for the past four or five years and Amazon was the original Cloud company. I was at the Amazon re:Invent conference a couple weeks ago and one of the speakers there said, "Amazon is number one in the Cloud and there is no number two." We chose Amazon because it was the best horse at the time, and we continue to choose Amazon because it's still the best choice.

Amazon's release cycle for new product features and new price changes and all these things is accelerating. They continue to invest in new regions, and Amazon is still a great choice to reduce your total cost of ownership by increasing your agility and your velocity to build new websites, deploy new things, and move things off your traditional IT vendor to the Cloud, so we are very, very strong believers in Amazon.

Jess: "Does Acquia have experts in all tiers of the Drupal architecture, across the database, the OS, caching?" And now the marketing person is going to take a stab at this one … [crosstalk 00:40:50]

Andrew: We definitely have experts at all the different levels. On the OS side, we have some Red Hat experts and a bunch of other experts, so they can recommend different options for people who don't host with us; internally we've standardized on our own hosting stack, so that's where the expertise of our operations staff is. On the database side, we have operations staff dedicated just to MySQL, and we have support contracts with key MySQL consulting or software companies for any questions that we can't handle.

It's one of the ways that we scale: you don't have to go pay someone a 10-grand fee for something that we can just go ask them about. On caching, we have people that have helped design some of the larger Drupal sites out there and lived through them being under heavy traffic storms, people that contribute to the Drupal core and contrib caching modules, be it Memcache or Redis caching and all these different capabilities. As for [Aegir 00:41:56], we don't use Aegir internally, but we do interact with it and support it; a lot of our big university or government clients may be using Aegir in their internal IT department, and they may choose to use us for some of their flagship sites or for some other purpose. So yes, we do have experience across the board.

Jess: [Ken 00:42:22], unless you have any questions that came in straight to you, that looks like the rest of the questions we have for the day. Hopefully you found this valuable, and you've got 15 minutes back in your day; hopefully you can find a good use for that.
Thank you so much, Andrew, for joining us, I really appreciate it and the content was great.

Andrew: Thank you.

Jess: Again, thanks everybody for your time; and the recording will be available within 48 hours if you'd like to take another listen, and you can always reach out directly to Andrew or myself with any further questions. Thank you.

Andrew: Thanks everybody.

University Shares Tips for Migrating Thousands of Sites With One Install Profile [December 5, 2012]

Click to see video transcript

Female: Thanks for joining today. Today’s webinar is University Shares Tips for Migrating Thousands of Sites with One Installation Profile with Tyler Struyk who’s the Drupal Developer at the University of Waterloo and was on the Drupal Implementation Team there.

Tyler Struyk: Today I'm presenting on what I've been doing for the last six months at the University of Waterloo and what they've been doing for the last three years.

A little bit about myself. A lot of people know me as iStryker on drupal.org. I’ve been using Drupal since 2007. Most of that time, I’ve been working as a freelancer and then as I said, I’ve been working at University of Waterloo fulltime for the last six months.

We actually have quite a few websites. On our slide here, the two things I want you to note are uwaterloo.ca and pilots.uwaterloo.ca. The uwaterloo.ca number is the number of websites that are currently live, and the Pilots number is the websites that are currently in migration to be put into production.

A little bit of history. Originally, the University of Waterloo used Dreamweaver templates to give a common look and feel to all their websites. This was quite a bit of trouble, because if they ever needed to make a change, they would have to push the new template up to uwaterloo.ca, and then everyone who owned a website would have to pull that new template down and push it up to their own existing website. This whole process could sometimes take quite a few months for all the changes to be the same across all the websites on campus.

After that, about three years ago, right after DrupalCon San Francisco, the University of Waterloo started moving to Drupal, and what they started doing was creating one-off Drupal 6 websites. This was fine, this was great; the only problem is there was no way to keep track of all of them. Sometimes if you had a new feature, you could push it out to one or two, but once we eclipsed 25 websites, it was very hard to maintain them all.

Here at the University of Waterloo, what we call our website is the uWaterloo Content Management System. It's one install profile. At least in production, all the websites are on one server. It's used by all Faculties across the campus, and they are all running the same features for the most part. Some websites have one or two one-off customization modules, but overall they all have the same modules and features.

Now, here is an example of what the websites look like right now. This is actually our homepage. Our homepage is a little special: it's running on Drupal, and we relaunched it at the end of August. It has the same features and modules, et cetera, but it's a little trimmed down; it doesn't need as many features as the Content Management System running the other websites. As you see, it's running a different theme, and that's the main difference.

Here at the University of Waterloo, the team is actually 15 people or less. We have eight people doing the content migration and training. We have five doing the actual development, pushing out new features and fixing the bugs. We have one person pretty much devoted to accessibility: here at the University of Waterloo, the government has come down with an accessibility guideline that we have to meet. Every public sector website must be accessible, and I think the standard we have to meet comes into force in 2014. We're getting a little ahead of ourselves, but yes, every new piece of content that we have has to meet these goals. Then we have a system administrator who does all the backend stuff for our websites; I'll get into more detail about what he actually does near the end of the presentation.

As I said, there are eight people working on content migration and training, and it's actually broken into two groups. We have two fulltime employees, and they work on everything: the migrations, setting up the migration meetings, doing QA, training, support and communication. Now, here at the University of Waterloo, we are the school that's known for co-ops. Nine percent of our courses here at the University of Waterloo have a co-op component, and I don't know if people in the States or around the world know what co-op means; you might think of it as internships. Every four months, new co-op students come in, and we break the six co-ops into two different groups. Four of them are doing content migration, each one doing a separate site at a time, and then we have two of them doing the QA and drop-in labs. I'll talk more about the drop-in labs in a couple of slides.

The process of migration: if you want your website on campus to be migrated to the new Content Management System, you first put in a request. Here at the University of Waterloo, we're using the Request Tracker system; I don't know if you know it, but that's what we use. After the request comes in, we set up a review meeting where we go over the requirements needed, the training, and also determine if they're a good candidate to actually move into the system.

Now, sometimes there are websites out there that are not good candidates, and a good example of that is athletics. They have quite a bit of customization, and they'll probably get into our new Content Management System once we roll out more features, but that's probably a couple of years down the road. Just to list a couple of their customizations: they have e-commerce components for selling tickets, they have custom content types for sports and teams, they keep track of sport scores, they have advertising and sponsorship, and as you see on the page here, they have a custom layout.

After this meeting, we actually create a website for them to start the migration, and we create it on pilots.uwaterloo.ca. We assign a co-op student to help them out. Now, this co-op student might do everything for them, the whole migration process of copying the content from their old website to their new website, or they might just do a small portion and the person who owns the actual content does the rest themselves. We set the launch date; for simple websites, the launch date might be two weeks away, but for complicated websites, it might be a couple of months away. In the ten days before we actually launch the site, we do a lot of QA, and the one tool that I want to mention is the WAVE accessibility test. WAVE is an add-on for Firefox, and it analyzes your page and tells you if you have any accessibility problems with your current website.

Here at the University of Waterloo, we have quite a bit of training to support the whole Content Management System, and some of these courses are mandatory. If you want to maintain the content on your site, you have to take the Content Maintainer course, and then there's a more advanced version, which is the Site Manager course. The difference between the two: the Content Maintainer course covers the basics, so you can create new content, review content, create new drafts, add new images, things like that, while the Site Manager course covers things like changing layouts, just more advanced tasks. The other course we have is webforms: if you actually want a webform on your website, you have to take the Webform course. The reason we force people to do this is that there are a lot of things you can do with webforms, such as collecting credit card information, et cetera, and for privacy reasons we make them take this course.

Then there's a variety of additional tools. Twice a week there's a drop-in lab that runs all day; if you have questions, you come in and ask. Once a month we have a developer's drop-in lab, so you can ask the five people that do the bug fixing and testing more advanced questions and have one-on-one time with them. We also do quite a few training videos. I'd like to mention that I actually use Camtasia Studio 7 to do the recordings, but I recommend using version 8, because in version 8 you can have a lot more layers, where in Camtasia 7 you can only do one or two layers. An example of that is here; I just pulled this off a website. It just gives you a basic idea of what it looks like and the tools you have.

Now, there are other courses we offer, like advanced webforms; there are quite a few more advanced things you can do with them, like regular expressions and filtering. And there are other supporting courses that you'll need to take, such as writing for the web and writing accessible content.

The other thing the Content Migration team does is communication, and there are lots of ways they do this. We have multiple mailing lists set up to reach different departments, and we also have the web resources website. On this website we communicate everything: the upcoming courses, new features, common things like what colors you should be using. Here at the University of Waterloo, each faculty has its own colors, and you can come to this website and say, "Hey, what color should I be using for my faculty?" There are also accessibility tips, and we also push out news such as term reminders, for example, take your co-op students off your website at the end of the term, because quite a few people actually forget to do this.

Back in November of last year, we pushed out version 1.0. Since then, we've gotten to version 1.5.8. The changes between those two versions include things like live streaming, better slideshow integration, and quite a few more tools.

Websites and servers: there are two websites we use. One is Request Tracker, which I mentioned; we use this for bug tracking. We also have an in-house agile project management website that we use for people to request new features, and I'll talk about this later on in a couple of slides. We also have quite a few servers here that we use for development and testing; I'll switch to the next slide to give you a better diagram.

For testing, I want to go through each one of these and the reason why we use each one for different scenarios. For your local dev, we usually have a standard install profile on there, just plain Drupal 7, or maybe in your case Drupal 6. If there's a new bug, we test against plain Drupal to see if it's actually Drupal that's broken or if it's our install profile that's broken. We also do a lot of testing locally: if you're working on something that's going to break something huge, you don't want that going across to the shared servers, so you can test it locally first.

Then we have a web server, the Sandbox, and the version of the profile on there is the trunk version, so any new changes that you push get pushed up there nightly and it gets rebuilt. If you want to do testing on what's upcoming, that's where you do it. Then we have Dev, which runs the current release that's on production. It's mostly for bugs: if you want to do bug testing, this is where you go to test. It's pretty self-explanatory.

At first we only had one server separate from Sandbox and production, but it's now broken into two different ones; we introduced the Profiles server. It's identical to the Dev server, but we use it specifically to test other install profiles such as Commerce Kickstart and CiviCRM. The reason we do this is that these install profiles can break a lot of things, and having two servers makes it easier to troubleshoot the actual problem: is it the other profiles and extra modules installed on the server, or is it our install profile that's the problem?

Then we have two more servers, as I mentioned before, Pilots and Production, and these are two servers that we don't touch for testing. We do have one exception though: on production, we have a test website, and the reason we use it is that on production there are a lot of exterior components that you won't see on the other servers, such as caching. We use Varnish caching pretty extensively here at the University of Waterloo, and we can't mimic that on the other servers, so if we need to test that, we will actually push to production and test it there. There are two things moving forward that will probably change. On Sandbox, we're probably going to switch over to Jenkins, and by switching over to Jenkins, instead of pushing changes up once a day and rebuilding once a day, we can rebuild the server, let's say, every five minutes.

For production, we actually want to start using Vagrant, and what Vagrant is good for (I'm just reading the questions here; I'll get to them in a sec) is that you can copy the production settings into multiple virtual servers, so you can set up a virtual server for the caching in production and then do the testing on there. That way, we won't have to touch production at all in the future.

Someone asked if we're actually using Aegir. We are not using Aegir at all; for creating websites there were too many bugs, and we weren't able to use it efficiently. Then, do we use Drush? Yes, we use Drush extensively, and I'll get more into the details of Drush later on in a couple of slides when I show you how we actually push a new release out.

A typical bug fix, I'll just quickly go over this: someone creates a ticket in RT, and then the Implementation Team works it locally first, against the standard profile, to see where things break; then on Sandbox for testing against trunk; and on Dev and Profiles for the current release that's in production. Now, if a bug is breaking production, then we'll roll a new version right away; if it's not urgent, then we'll commit it to trunk and roll it out in the next release.

Our install profile is actually pretty simple. It's made up of four, I think it's five, files: the .profile, the .install and the .make file. The .make file, if you don't know, lets you pull down all the modules, themes, everything from our repository, and it kind of looks like this. If you don't know what a make file is, just remember it's a grocery list: it says grab this module or this theme and pull it down.
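
The actual make file isn't included in the transcript, but as a rough illustration of the "grocery list" idea, a minimal Drupal 7 drush make file along these lines might look like the following. The module versions, theme name, and repository URL are placeholders, not Waterloo's real list.

    ; Illustrative .make file: a grocery list of code to pull down (names are placeholders).
    core = 7.x
    api = 2

    ; Contrib modules from drupal.org, pinned to specific versions.
    projects[views][version] = 3.5
    projects[webform][version] = 3.18
    projects[features][version] = 1.0

    ; A custom theme pulled from an internal repository (URL is made up).
    projects[uw_theme][type] = theme
    projects[uw_theme][download][type] = svn
    projects[uw_theme][download][url] = https://svn.example.uwaterloo.ca/themes/uw_theme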

When you're committing to trunk, there are two things you've got to do: you have to commit to trunk and then create a new tagged version. When you're creating a new tagged version, we use an increment system based on whatever version is currently in production. In the example down here, you see 1.24; if that's in production, then you create increments such as 1.24.1, and so on. The other thing I want to point out, when you're re-rolling a feature, is that the one problem we found is that our features are not just configuration.

Let me see this. Our features don't just contain configuration components; we actually utilize the .module file for features, and we tie in JavaScript and CSS style-sheets, and currently there's a bug in Features: the way you re-roll a feature actually strips the style-sheets from the .info file. So every time we commit, if that feature has style-sheets attached to it, we have to make sure to re-add them.
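
For context, the style-sheet and script lines he's describing live in the feature module's .info file. The snippet below is only an illustration with made-up names (how the files are loaded beyond the .info declaration isn't covered in the talk); these are the kinds of lines that have to be re-added if a Features re-roll strips them.

    ; Illustrative feature .info file (feature and file names are placeholders).
    name = UW News Feature
    core = 7.x
    package = Features

    ; The CSS and JavaScript the feature ties in. Re-rolling the feature can strip
    ; these lines, so they have to be checked and re-added after each export.
    stylesheets[all][] = css/uw_news.css
    scripts[] = js/uw_news.js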

Creating a new release: as I mentioned before, we roll the increments up, so as we saw with 1.24.2, we increment that to 1.25. It's pretty self-explanatory. Sometimes creating a new release takes quite a bit of time, because you're making a lot of incremental changes, so you might be incrementing quite a few times, testing it and deploying it back down; other times it's a quick fix and you push it up. I'll touch on that more later on.

I'll get to a couple of questions here. Someone asked if we're using Git. No, we're not using Git right now; we do want to move to it. Currently we're using SVN, but yes, in the next month or so, we're going to move to Git. How many themes are you running in your install profile? We are currently using one theme, except for the homepage, which is using its own theme. How often do you add new features to production sites? Pretty regularly; as for brand new features, maybe one or two every release.

As for making changes to the current features, we actually make quite a few changes; between releases, we might be updating 10 features. How many features do we currently have? We have quite an extensive list. I don't know the number off the top of my head, but we have over 50 features broken down into different components. Then the other question was whether it's set up as a multisite. The answer is yes, and I'll show you an example of that indirectly later on.

Okay, so here's an example of pushing out a new release. It's not every time; when you push out a new release, sometimes things run smoothly, and the majority of the time they don't. You do all this work on Sandbox, which has the trunk version. You take the trunk version and push it to Dev and Profiles for testing, and you test it on the sites that are on Dev and Profiles; I think there are 20-odd websites on there right now. If those sites don't break, then you push up to Pilots.

Pilots usually has, at any given time, 60 websites, so then you're testing those websites to see whether anything breaks or everything is running smoothly. If something breaks on Pilots, you might pull it down to Dev or Profiles to test again, and then push the next release up to Pilots; or if something is really huge, you might have to pull it right down to local and test it there.

As for how we actually push it up from Dev and Profiles to Pilots and then Production, we have Drush scripts that we built ourselves. It's actually pretty simple: you run the make file and pull everything down, you run the database updates, and then you revert all the features.
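
Waterloo's actual script isn't shown, but a minimal sketch of that three-step Drush deployment (rebuild from the make file, run database updates, revert features) could look like this. The paths, profile name, and the per-site loop are assumptions.

    #!/bin/bash
    # Sketch of the kind of Drush release script described: pull the code down via
    # the make file, run database updates, revert features. Every path and name
    # here is a placeholder, not the university's actual script.
    set -e
    DOCROOT=/var/www/docroot
    BUILD=$(mktemp -d)

    # 1. Run the make file: download all the modules and themes into a clean
    #    build directory, then copy them over the install profile.
    drush make --no-core uw_base_profile.make "$BUILD"
    cp -R "$BUILD/." "$DOCROOT/profiles/uw_base_profile/"

    # 2. For every site on the server, run the database updates and revert the
    #    features so the exported configuration is re-applied.
    for settings in "$DOCROOT"/sites/*/settings.php; do
        site=$(basename "$(dirname "$settings")")
        drush --root="$DOCROOT" --uri="$site" updatedb -y
        drush --root="$DOCROOT" --uri="$site" features-revert-all -y
    done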

As my notes say, when we're creating new releases, sometimes we create beta releases and sometimes we create release candidates; it's the same thing you see on drupal.org with modules.

Someone asked whether we're currently running on Drupal 6. We are actually running on Drupal 7; everything is Drupal 7.

I actually touched on this earlier; I forgot that I had a slide on it. The install profile: here are all the files that are part of it, so .info, .install, .profile, .make, and then we have rebuild.sh, which is a script. Every time you pull down the repository, you run the rebuild script: it deletes the built profile, then runs the .make file and pulls everything back down. Then we actually have a special server; back on slide one, I think I should jump there, we had a server called wms-aux.uwaterloo.ca, and you see there are only two sites on it.

On this server, we have a custom Drush command that we run that creates a website for us. We can specify a lot of different parameters to it, so we can specify which install profile to use or which server to install the new website on. When we run it, it creates all the files that you need, it generates the database on our database server, and then poof, we have a website up and running. The details of the script I can't tell you; I don't really know. That's our system administrator; he did this, and he spent quite a few months building the script, perfecting it, tweaking it, et cetera.
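
The custom Drush command itself isn't shown and its internals aren't known, but a stock Drush equivalent of "create a site from this profile, in this sites directory, against that database server" looks roughly like this. The profile name, subdirectory, and database URL are placeholders.

    # Rough stock-Drush approximation of the custom site-creation command
    # (profile name, sites subdirectory, and database credentials are made up).
    drush site-install uw_base_profile \
        --sites-subdir=uwaterloo.ca.ecology \
        --db-url=mysql://wcms:secret@db.example.uwaterloo.ca/ecology \
        --account-name=siteadmin -y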

I'll answer a couple of questions before I jump to the next topic. Actually, the questions that are coming in right now I'm going to leave to the end.

I'll talk a little bit about the agile project management website. This is a website that we use to take in new feature requests that people want to see in our Content Management System. I'm going to quickly go over the agile process; I could do a full presentation on what agile is and how to run successful agile sprints, et cetera, so I'll just touch on the basics. People make feature requests; the feature requests that come in we turn into user stories, and then we do sprints on those user stories. If you make a feature request, you can become a stakeholder of that user story, and in our system, we rank.

As a person, you can be a stakeholder of multiple issues, and you have to rank the issues that are most important to you using a drag-and-drop module. The module I custom-made for this is actually on drupal.org; it's called User Priority Ranking. If you don't want to become a stakeholder, you can follow a story instead, and through email communication, if anyone creates a new comment on a user story, you will get an update on its current status.

Here's the ranking system; there's the module. We use it as a point system: if you're ranked number one, you get 10 points, if you're ranked number two, you get 9 points, and so on. Then, based on the points across users, this gives us an idea of what people actually want, and those are the feature requests or user stories that we work on first.

Why do we actually use our agile management website? It's actually the only way to keep yourself sane. At first we didn't really need to use it that much, because we only had a few websites in the Content Management System. Once your number hits, well, at least for us, once our numbers hit over a hundred, the development team became "maintenance only," so instead of working on new features, we were working on bugs. New features just came to a halt, and that was a bad idea. By using the agile process, people can see and track the new features so you know when they are going to be rolled out, and this helps the development team keep on schedule and keep its sanity.

Now we'll talk about the System Administration. I shall look at the questions first right here before I jump to the next section. The questions that came in I'll leave to the end.

All right, so how we have it set up is that if you go to uwaterloo.ca, that’s one website. If you go to uwaterloo.ca/physics, that’s a different website and things like /environment, /chemistry, these are all separate websites that are running on the same domain.

Now, there are two ways to control how this works. One is to use symbolic links and the other way is to use Apache. We actually chose method number two, and I'll go into why we did this: the main reason is that symbolic links make your redirects very messy, and … yes, that's pretty much it.

Apache includes a directory, and in this directory there's a special configuration file for each website, such as the ones you see there for environment and applied-health-science. I went over this already, but if you go to, let's say, uwaterloo.ca/about, that goes back to the uwaterloo.ca site. If you go to uwaterloo.ca/environment/about, it doesn't redirect: it goes to the environment website and pulls the content from that site and its database. Then you can go an additional level deep, such as ecology, which is under environment, and you can see on the slide the structure we have there for our files.
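
As a hedged illustration of the "one configuration file per website" approach, an included snippet for the environment site might look like the following. The paths, site names, and the exact directives Waterloo uses are assumptions.

    # Illustrative per-site Apache include, e.g. pulled in by:
    #   Include /etc/apache2/wcms-sites/*.conf
    # Paths and names are placeholders. More specific aliases come first so
    # /environment/ecology matches before /environment.
    Alias /environment/ecology /var/www/wcms/docroot
    Alias /environment         /var/www/wcms/docroot

    # Both URLs map onto the same Drupal docroot; Drupal's multisite lookup then
    # matches a directory like sites/uwaterloo.ca.environment (or
    # sites/uwaterloo.ca.environment.ecology) and serves that site's database.
    <Directory /var/www/wcms/docroot>
        AllowOverride All
        Options FollowSymLinks
    </Directory>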

Now, here at the University of Waterloo, we use three types of servers: Pound, Varnish and Apache. This is a little different from how other people set things up; it's very common to have just Apache and Varnish. If you don't know, Varnish is a front-end cache, so that the Apache server doesn't get hit too much. If you have questions about Varnish, I recommend that you look it up.

We use Pound for a couple of reasons. One of the main ones is to terminate HTTPS and pass the request on as HTTP, because that way you can cache the content. Here at the University of Waterloo, we want to have all the websites switched over to HTTPS, and the only way to do this is to have Pound in front so that HTTPS traffic doesn't go straight to Varnish; it stops at Varnish, and Varnish serves up cached content if it can. I kind of went over the details of this already: Varnish does not handle HTTPS traffic well, which is why we strip that layer off in front.

Now, the other problem we have is the old Dreamweaver websites here at the University of Waterloo. To grab the Dreamweaver templates, those other websites actually go to uwaterloo.ca/css to grab the CSS files, as well as /images to grab all the image files. Before uwaterloo.ca was a Drupal website, this was no problem and it was easy for the system administrator to handle, but since uwaterloo.ca became a Drupal website, when you hit that path, Drupal wants to take over and do its own thing. We actually use Pound to route around this, so if you hit uwaterloo.ca/css, Pound serves up those style-sheets to you instead of the request ever hitting the Drupal website. This is very important because, like I said, there are over 800 websites out there that are currently using the Dreamweaver templates and are not yet migrated to the new system.
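
To make the Pound layer concrete, here is a sketch of a front end that terminates HTTPS, hands everything to Varnish over plain HTTP, and short-circuits the legacy /css and /images paths. Certificate paths, ports, and back-end addresses are all placeholders rather than Waterloo's real configuration.

    # Illustrative Pound configuration (all addresses, ports, and paths are made up).
    ListenHTTPS
        Address 0.0.0.0
        Port    443
        Cert    "/etc/pound/uwaterloo.pem"

        # Serve the legacy Dreamweaver assets from a static back end so the
        # request never reaches Drupal.
        Service
            URL "^/(css|images)/.*"
            BackEnd
                Address 127.0.0.1
                Port    8080      # plain Apache vhost that only serves static files
            End
        End

        # Everything else goes to Varnish over plain HTTP so it can be cached.
        Service
            BackEnd
                Address 127.0.0.1
                Port    6081      # Varnish
            End
        End
    End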

One of the other problems we have is with redirects. How the current system works is, if you're not in our new web Content Management System, you have your own subdomain. When you migrate into our new system, sometimes there's content you can't migrate, and what you need to do is redirect back to the old system; sometimes content sits on the old site and never gets migrated for quite a long time. The other thing that happens is, how do I explain this, you need a one-to-one mapping of the old content to the new content. This might be a straight page-to-page mapping, or it might be some funny old name mapped to a clean Drupal URL.

How we handle this is that we have a single file that Apache looks at, and it just goes through it and says, "Okay, you hit this page? Great, then you need to grab this page." It's just a one-to-one, sequential lookup file.
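
One common way to implement that single lookup file is an Apache RewriteMap. The sketch below is an assumption about the mechanism, with made-up paths and example entries, not Waterloo's actual configuration.

    # /etc/apache2/legacy-redirects.map -- one old path and its new path per line:
    #   /physics/oldpage.html        /physics/about
    #   /science/some_funny_name     /science/news
    #
    # In the virtual host, look each request up in the map and redirect on a hit:
    RewriteEngine On
    RewriteMap legacy txt:/etc/apache2/legacy-redirects.map
    RewriteCond ${legacy:$1} !=""
    RewriteRule ^(/.*)$ ${legacy:$1} [R=301,L]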

The other thing that becomes unmanageable when you have this many websites is the settings.php file that exists for each site, and what we've done is break it out into multiple different components. I'll show you an example here. The image on top is what the settings.php file looks like for each individual website; as you see, it's only eight lines. What it does is pull in the other settings from additional files that are usually pretty static. Right there it pulls in some of the files you see down below: one holds the standard database settings, and then it pulls in the other files that have the other settings, as you can see on the screen now.

There are separate settings.php include files for the database, the host, and the network settings. If you ever change the domain for your database, you only have to change one file and, poof, every other website works. The same if you change the IP address: you change one file and it gets pushed out to all the other websites out there.
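
A hedged sketch of that eight-or-so-line settings.php and its shared includes is below. The file names, paths, and database name are placeholders rather than Waterloo's real layout.

    <?php
    // Illustrative per-site settings.php: almost everything is pulled in from
    // shared, mostly static files; only the database name varies per site.
    // All paths and names here are placeholders.
    $databases = array();

    include '/var/www/wcms/settings/settings.database.php';  // DB host, port, credentials
    include '/var/www/wcms/settings/settings.host.php';      // domain / base URL settings
    include '/var/www/wcms/settings/settings.network.php';   // reverse proxy and IP settings

    // The one thing unique to this site: which database it reads from.
    $databases['default']['default']['database'] = 'ecology';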

All right, so what's coming up? Here at the University of Waterloo, we're going to work on proper high availability, and we want to look at NetApp filers for storage as well as MySQL, putting multiple servers in place to handle this. The other thing we want to look into is Nginx, which might be able to replace the system we have now with Pound, Varnish and Apache. The one thing holding us back is the URL rewriting. Sorry, how do I put this?

The way Nginx handles URL rewrites is quite different from what Apache does. Apache gives you quite a lot of control over what you can do, whereas Nginx is smaller and slimmed down and can't do as much, so switching over will take some work; it's not a one-to-one switch-over, and it'll take some time. I should state that I'm not a system administrator here at the University of Waterloo, so some of the stuff I'm mentioning now is a little over my head, but I wanted to give you an overview of some of the problems that we have. As I mentioned before, we want to switch over to HTTPS for all the websites; we're not doing that right now, but in the next few months we will be switching over to it.

The other thing I want to talk about, and this is a little off topic, is how we handle search here at the University of Waterloo. We actually purchased a Search Engine Appliance from Google; it's a server by itself, it does exactly what Google does, but it just indexes your own websites. We have about a thousand websites, and it indexes everything for us, so it handles all our search. You can do the same thing using Apache Solr, which is an open source project; we just decided to go with the Search Engine Appliance because it did all of this out of the box with very little configuration. Comparing the cost, I think when we set up the Search Engine Appliance it cost us about $50,000 each, and we're probably going to purchase a second one; that $50,000 covers using the server for two years, and we felt that was the better way to go instead of hiring or contracting someone to set up Apache Solr for us.

The other thing, moving into the future, is that we want to segment our content by how it shows up in the search engine. For example, google.com has different sections: a section for videos, a section for images, things like that. We have different sections too, and the Google Search Appliance is set up so that we can configure this for our own custom content; we can have a special search tab just for events, or one for news, or videos and podcasts. It's very easy to set all these things up with little system administration support, whereas with Apache Solr there's quite a bit more work needed to set that all up.

Here's a question: someone asked how we interface the GSA with the Drupal websites. I've actually asked someone, my boss here, and I'll get him to answer that question for you in a sec. We're almost done here. Just a couple of notes: we actually don't have any e-commerce integration with our Content Management System. One of the reasons we don't is that here at the University of Waterloo we have a strict policy around things like handling credit cards; we don't want to get sued.

We actually have a special server set up just to handle e-commerce, and those servers are behind strict firewalls, et cetera, and right now we don't have a need to set up e-commerce in the CMS. If you want to set up e-commerce, you'll have to set it up yourself: you can set up your own Drupal website and install Drupal e-commerce, but you won't be part of the Content Management System, and we won't be giving you support for that.

All right, so that’s what I have. That’s all the slides that I want to go through. I can take any questions you have now.

Female: Hi everyone, if you have any questions, please ask them in the Q&A tab.

All right, we have a few coming in. The first one is: how do you update multiple databases for the multisite setup?

Tyler: Currently, we actually have only one … I could be wrong; I think we only have one database server. Sorry, well, we have two servers: one is a backup and one is our primary. On this database server, every website has its own database. There you go.

Female: Okay, great. Are you set up as master-slave?

Tyler: Yes.

Female: Okay, great. The next one is: for the approximately 60 websites, are they all on the same domain or are they running on different domains?

Tyler: Sorry, the 60 websites? The …

Female: It must have been your slide three, I think, that they were referring to. I'm not sure.

Tyler: Oh yes, are you referring to the Pilots? If you're referring to the Pilots, the Pilots setup is identical to uwaterloo.ca, so they're all on the same domain.

Female: Okay, great. How do you open a new site? What is the procedure?

Tyler: Okay, so this is a little outside my end of things. I know we have a custom script that goes in and creates the file system for us, creates the database for us, and pulls down all the modules and files that you need from the .make file that's inside the install profile.

Female: Okay, great. The next one is, how many centers do you have running all your courses and what department do they belong to?

Tyler: How many courses? We actually don’t have any courses online. It’s not part of our Content Management System. It’s just content itself.

Female: Okay, great. The next one is, what Agile Project Management tool are you using?

Tyler: As I mentioned, we're using our own in-house-built agile site to handle everything.

Female: Okay, we’re currently using a contextual home group portal system that serves about 250 different differences at our university. Is Drupal well suited to create a similar system?

Tyler: I didn’t quite understand it. Contextual?

Female: We can move on to the next question if that person will clarify.

Tyler: I can just mention that with the current system we have now, we have 178 websites, and moving forward over the next two to three years you'll see that number grow substantially. We have over 1,000 websites here at the University of Waterloo, and in the next three or four years, the number in the Content Management System will probably jump to 500-plus. Right now, we think our current system will be able to handle that, no problem.

Female: I believe the next question wants you to show the Apache config slide for your vhosts again.

Tyler: Okay, this is the slide. This is an overview of everything; to understand all of it, this slide is really geared more to a system administrator, who would look at it right away and go, "Oh, I get that."

Female: Okay, our last question is, have you looked into Squid cache system, if so, what is your opinion?

Tyler: No, we have not looked at it at all. To tell you the truth, I don't know what that is.

Female: Okay, great. Thank you Tyler. Thank you for the great presentation and thanks everyone for your questions and attending today. Again, the class …

Tyler Struyk: I talked to my boss, and that one question the person had about the search, I can answer that now.

Female: Okay, great.

Tyler: For the search, currently we are sending everyone to uwaterloo.ca/search, which is a custom-themed site that looks like our Content Management System, but that's just the search server itself. Once we switch over to version 7 of the GSA, we'll be able to pull the content right into Drupal. For now, we're just sending them to the actual search server to display the information to them.

Female: Okay, great. Now, I think that's it for questions. Again, thank you Tyler, and the slides and recording of the webinar will be posted to the acquia.com website in the next 48 hours. Thanks everyone.

Tyler: Thanks everyone.

Bâtir votre Réseau Social d'Entreprise avec Drupal Commons 3 [November 29, 2012]

Calculating the Savings of Moving Your Drupal Site to the Cloud [November 28, 2012]

How Humana is using Drupal to Drive Repeat Visitors with Personalized, Multi-Channel Campaigns [November 21, 2012]

Click to see video transcript

Jenny: Today, we have Jason Yarrington, VP Professional Services from Digital Bungalow; Andy Patrick, VP Analytics from MarketBridge as well as our own John Carione, here from Acquia to present with us today. At this point, I will pass it over to John Carione.

John: Great. Thanks Jenny. Thanks everybody for joining us today. My name is John Carione. Again, I'm Senior Director of Solutions Marketing here at Acquia, and I'm very happy to be joined by Andy and Jason, who were responsible for the Humana implementation of Drupal and the Drupal WEM solution. What I'm going to talk about for the first 10 minutes or so is really how and why organizations are choosing to build WEM solutions on Drupal, and why it's helping redefine the shape of digital marketing on the web. Then we're going to hear a lot more detail about the actual implementation of a strategic microsite, mywell-being.com from Humana, and all the great results they've had from it in 2012.

To kick it off, digital marketing really is a hot topic in the media today amongst industry analysts and journalists, and it’s really become ... digital marketing has now just become marketing. It’s just like the term mobile applications will soon be talked about simply as applications. The days of straight-line, one-to-many marketing with static sites that are really in place just to house information about your products and services, those are long over. The web is really the strategic hub of all your customer interactions today, and digital marketers definitely need to understand and embrace technology to be successful in their own jobs and for the organization to be successful and sustain a competitive advantage.

WEM solutions built on the Drupal platform really help bridge the gap between the chief marketing officer and the chief information officer. They need to work together a lot more today in this new paradigm, and the challenges faced by each of these organizations are really two sides of the same coin; they’re ultimately trying to solve the same problem for the organization.
Marketing is trying to achieve core business objectives and KPIs around marketing and lead generation; I’ll talk a little about their objectives. IT is just trying to facilitate and accelerate those results using different technology platforms, but they have a much more slowly growing budget, so they have to be very focused in their approach. For instance, just a couple of examples: marketing needs to create a new tablet-based microsite to reach a new, younger customer segment.

On the flipside, IT then needs to get the mobile application built across iOS, Android, or Windows platforms. If marketing wants to generate more personalized experiences on their site, then IT is going to need to understand how to facilitate integration with the existing analytics suites they might already own. If marketing wants to manage all their digital assets in a central location to facilitate content sharing and reuse across multiple geographic sites, then IT needs to offer a shared service for digital assets. You really can think about these problems as two sides of the same coin, and ultimately, WEM is bridging that gap.

Ultimately, what matters is that we’ve been able to track, along with the analysts, that WEM is driving real ROI for digital marketing today. A couple of examples: 73 percent of companies are planning an investment in mobile channels for 2012, and 55 percent of consumers felt positively when companies responded to a social media posting or recommendation. From one report, best-in-class companies were 3.8 times more likely to change content based on visitor behavior, and those companies, compared to their peers, had a 148 percent return on marketing investment, a 63 percent growth rate in revenue, and a 13 percent increase in year-over-year customer profitability. Those are the stats from a report that came out this past year [audio gap] in terms of share of wallet, profitability, and revenue over their peers who aren’t best in class. It really does matter to the bottom line and top line today.

Ultimately, we’re helping build these WEM solutions on the Drupal platform and really creating the whole product and ecosystem around it. Here are some of our thoughts about the best practices for optimizing your content marketing. When I think about all the things we’ll hear that Humana is doing today with their My Well-Being site, it maps pretty well to our best practices and model for content marketing in the enterprise.

Number one, prospects and customers need to find the content that’s relevant for them. It has to be easy and simple. That content needs to have a call to action for them to progress to the next step in the lead funnel. They need to be able to easily share that content with other prospects in one quick way, and the content needs to assist in driving outbound marketing initiatives. When you’ve done all that, you need to measure it. You need to measure the results and the ROI on the content itself so you understand that you’re creating the right content for the right audiences and that variable marketing spend is the right spend.

Ultimately, marketing doesn’t necessarily care what platform or groups of technologies are implemented. They want to use the best ones, the ones that are right for their business, but they care about true marketing results, so I did want to tie it back to that. We believe there are four key objectives: generating new business, building loyalty and customer advocacy, expanding the total available pool of prospects for your business, and, all the while, having the right controls in place to determine how you’re meeting those objectives in real time with measurement. These are things we’re constantly doing as digital marketers over and over again ... solutions on Drupal for web experience can facilitate that. Just to drill in and tie those key performance indicators and objectives quickly back to Drupal, first and maybe foremost is demand generation. The top objective for Humana’s My Well-Being site is lead generation for new prospects and customers.

Strategically, organizations like Humana are using these WEM solutions on Drupal to drive site traffic, increase customer attention and the time they spend on the site, increase the conversion rates to a purchase or the other metrics they use around conversion, and ultimately capture a larger share of wallet for their customer segments.
Drupal allows large organizations to accomplish these lead generation goals in a lot of different ways. Generating personalized content, things like banner ads, videos, and white papers that are targeted based on a customer’s specific profile and their actions, such as which pages they have visited and which keyword searches they are running, really helps generate demand. It can also be generated through social networks via technologies like Drupal Commons, which is our distribution for building communities on the web. Those communities can help connect prospects with other prospects, ultimately creating better recommendations for your products and services.

Another way: new optimized mobile microsite campaigns can be spun up really quickly in Drupal, so you can also be very effective at reaching a new audience or customer segment using the advanced responsive design techniques available for mobile.
Just a couple of examples there on Drupal for lead gen and priming the demand topic. Now, if you want to build loyal brand advocates for your business: brand dilution has been a big problem for large multinational organizations, so creating a deeper connection to build these advocates can enhance the business. Delivering a consistent message by leveraging Drupal’s ability to automate language translation is one way to ensure brand integrity in new local markets, on local department sites or local geographic sites, which also might have unique market requirements where you need to get very specific localized content onto the page. That can be facilitated very easily with Drupal and our partners. Also, because Drupal is a very modular, data-driven platform, it allows marketers the freedom to integrate with all the latest tools and technologies that are hot today, things like gamification with Badgeville. We announced that integration a few weeks ago.

You can even push data to a mobile application to update things like coupons or promotions when a buyer is in the vicinity of one of your bricks-and-mortar shops, or even create a special promotion for in-store shoppers so they don’t browse at your store, walk out, and then buy from an online merchant or one of your competitors; we’ve actually talked to a lot of customers who’ve asked us to help them with that. Drupal is really a very flexible platform to meet all of these digital marketing use cases and create greater connections with your loyal base.

The third big one is expanding your footprint. A third goal for marketing leaders is typically to expand the total prospect base and market for products and services. That can be accomplished in a number of ways, including trying to reach a younger, tech-savvy audience or expanding into a new geography with untapped demand.

Drupal can also be effective at reaching audiences by pushing out promotions, campaigns, or new product launches to existing social properties like Facebook or Google+. Because of open source development practices, when a new network becomes really hot for digital marketers, like it did earlier this year with the rise of Pinterest, the community can create a module in real time. It took only a month for a Pinterest module to be created, allowing a bunch of customers to add the ability to pin images on their sites immediately.

With proprietary solutions, there’s often a very long, drawn-out process to prioritize the integration on the roadmap and execute it with your in-house engineers. On the flipside, we have 22,000 developers worldwide who are constantly prioritizing the problems that need to be solved today, so there really isn’t much of a wait at all.

After you’ve achieved these objectives, you need to measure the success. You need to refine the criteria you’re measuring against, and you need to optimize the customer experience for the next time a prospect hits your site. For measurement, you want to think about tracking how many users watch a particular video on your site all the way through without abandoning it, or perhaps how many users click submit on a particular form, and these things can be tracked easily in Drupal via integration with the analytics suites you most likely already have in-house.
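As a rough illustration of that last point, and purely a hypothetical sketch rather than anything from the Humana build, a small custom Drupal 7 module could attach classic Google Analytics event calls to the interactions John mentions, assuming the GA tracking code is already on the page (for example via the Google Analytics contrib module); the element IDs and event names below are made up:

    <?php
    /**
     * Implements hook_init().
     *
     * Pushes Google Analytics events when a visitor finishes a video or submits
     * a form, so those actions show up in the analytics suite you already own.
     */
    function mymodule_tracking_init() {
      $js = '
        window._gaq = window._gaq || [];
        jQuery(function ($) {
          // Fire an event when an HTML5 video plays all the way through.
          $("video").bind("ended", function () {
            _gaq.push(["_trackEvent", "Video", "completed", this.currentSrc]);
          });
          // Fire an event when the hypothetical signup form is submitted.
          $("#newsletter-signup-form").bind("submit", function () {
            _gaq.push(["_trackEvent", "Forms", "submit", "newsletter-signup"]);
          });
        });
      ';
      drupal_add_js($js, array('type' => 'inline', 'scope' => 'footer'));
    }

The events then appear alongside the rest of your traffic data, so the "did they finish the video, did they submit the form" questions can be answered in the same reports you already use.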

The second step is refining your campaigns by segmenting traffic to spot high-value web traffic on a particular page or mobile site and then determining the different abandonment spots. Maybe the abandonment is happening in the shopping cart on your e-commerce site; the point is to be able to refine and pinpoint exactly where those bottlenecks are happening.

Finally, you need to make real changes; you can’t just be satisfied with the status quo. You need to change the site by testing your messaging, perhaps, leveraging your CRM system to create even more personalized content, tapping into CRM to understand more about your customer profiles, creating more refined segments, and then getting that personalized content out to each segment. Ultimately, you need to iterate on this for best results over time.

Just to finish up before we get into the bulk of the presentation, which is the case study: we made an announcement around open WEM a couple of weeks ago. We followed up on that announcement with a paper from Forrester that asks whether it’s time to consider open source for delivering digital experiences online.

If you haven’t read that paper yet, I definitely encourage you to go to openWEM.com and read it. It’s very insightful. Here’s our vision of what Drupal is for digital marketers: it is the unified platform for content, community, and commerce, and we believe that in the future the alternative to proprietary suites is taking an open WEM approach to building these digital experiences online. A unified platform for doing WEM, social business software, and e-commerce along that customer journey, we think, is a great place to start. There are a lot of areas of differentiation for us, we think.

We have an open SaaS model, so if you’re not satisfied with the way we’re managing your site, you can zip up the files, take the content and the code, and go elsewhere. If you are using SaaS applications today and you feel a bit locked in because your customer data is controlled by a third-party organization, we don’t have any lock-in. You can take your site, take your data, and go somewhere else if needed. We really offer that flexibility and freedom.

I talked about the open source innovation with the Pinterest example. We believe a unified platform for the full spectrum of the customer journey, content, community, and commerce, is the way to go. Mostly because it creates better, unified customer experiences on the front end for your customers, but it also creates operational efficiencies for your development organization, so you’re not constantly stove-piped, developing for social business community software today, web experience management software tomorrow, and a third-party e-commerce application after that. You’re developing on a unified platform, and that saves time, speeds your development practices, and is a lot more efficient. We have really heard, and analysts report, that a lot of customers prefer best-of-breed to an all-in-one stack system for digital experiences. We know you’ve got marketing automation, CRM, and other technologies today that are working just fine, or there might be another vendor that comes out with something great tomorrow that you want to tap into, but we think Drupal is a great hub technology for content, community, and commerce, and we want to plug in those other marketing tools. We have pre-built integrations to all your leading marketing applications.

Then ultimately, what we’ve actually been doing for the last five years is building a very mature cloud model. A lot of our competitors have come out with sort of version one of their cloud approach to WEM, social business, and commerce over the last year or so, but we have very mature technology to create development, testing, and production environments, move your site between those stages, do enterprise search, do SEO optimization, and a host of other things we have available in our network. That is really the bread and butter as this market moves to the cloud, and we think we’re ready to handle all those requirements. Thanks for sticking with that. With that, I’m going to hand it over now to Jason to start taking us through the case study of building web experience management on Drupal with Humana. So, Jason.

Jason: Yes, thanks John. This is Jason Yarrington, the VP of Professional Services here at Digital Bungalow, a digital marketing and technology firm focused on designing and developing websites, and Andy Patrick, the VP of Analytics at MarketBridge, is going to share this presentation with me. I’ve been working with MarketBridge and Humana for the last four years on a really great program. I want to walk you through a little bit of background, the approach we took in the redesign that we did a year ago, how Drupal fit into that, and some of the components John just talked about and how we’re using them on the microsite. Andy is going to talk about how we use that open concept to integrate data from the site with a lot of different sources and leverage it both in personalizing content and on the analytics side. I’m excited to be here.

There’s kind of a bigger story here with the site. If you’re not familiar with Humana, Humana is one of the largest Medicare providers and a top health insurer. They offer Medicare Advantage plans and prescription drug coverage to more than four and a half million members throughout the United States. If you’re not familiar with Medicare: in the Medicare market, private insurers cannot market directly to customers until they reach the age of eligibility. MarketBridge and Humana worked together to create a program four years ago to drive brand affinity among people ages 45 to 65. They asked Digital Bungalow to create a website to be the center of that multi-channel campaign. As I’ve said, the program was a multi-channel marketing campaign with the site at the hub. The website at this time was called realforme.com, and it featured articles by prominent bloggers in several subject areas, or pillars: health, self, family, and life. The program was extremely successful. At the end of three years, over 370,000 people had signed up for the site. When Andy and his team analyzed the purchasing behavior of site subscribers, they found that they were much more likely to become or stay Humana subscribers, and a couple of years ago the program was awarded the best web-based customer retention and loyalty campaign by the CMO Council.

The question now was, how do we build on this success? MarketBridge called us just about a year ago and said, “Hey, this is all great. Humana is very happy with the campaign. We want to get together in Maryland to brainstorm ideas about where to go next with this campaign.” We got everyone involved in the website and program together at MarketBridge headquarters with Humana for a full-day kickoff and brainstorm about a site redesign. We knew we were ready for a couple of key next steps. We were definitely ready for a redesign. The site was good, but we needed the program to move in a new direction and needed a brand consistent with that direction. We had a lot more content providers who were doing a lot more on the site, we were integrating a lot more things, MarketBridge had secured the domain mywell-being.com, and Humana was going to roll out a brand refresh across the company, so it seemed like a great opportunity. The realforme.com program had done really well at attracting site subscribers and driving consideration for Humana, and it definitely had an impact. One of the areas we always struggled with was engagement, and any spot that gave us room for improvement was definitely worth investing in.

We were getting a lot of people to the site, but we weren’t getting as high a percentage of repeat visitors as we’d like. More so, Andy and his team did the analysis that showed that repeat site visitors were 66 percent more likely to add an additional Humana policy than one-time site visitors, so engagement was an area we really wanted to focus on. Deeper analysis of repeat visitors showed us that they were a distinct group of users. People who came back to the site repeatedly did not explore the whole site but rather stayed in certain areas or subject matters. For instance, someone interested in a particular blogger who wrote about planning for retirement tended to read more finance-related articles and skip over the other three major areas of the site. We hypothesized that if we could generate a more personal experience for each user, the site would be much more engaging and we would have more repeat visitors.

To make the site more engaging, we focused on five key areas. We focused on personalization: we wanted to learn more about the user and serve them content relevant to them. We focused on improved content management. We had a lot of great content, but Real for Me had been built on a custom CMS, and the rate at which we could build and apply new features was not keeping pace as we kept adding more bloggers and more site editors. Mobile: the analytics were showing us that a rapidly growing percentage of visitors were using mobile. Not really a surprise, but we knew we had to address it. Data integration: one of the strengths of the site has been data integration, and we wanted to expand and refine it to include all touch points: email, direct mail, and offline. And of course analytics: to run a campaign like this, you need great analytics. We wanted to get better data into analysts’ hands and better tools into their hands. Andy’s team was always coming up with great stuff and was really the backbone of the entire program, but the tools in our CMS and the tools we were using just weren’t keeping pace, and it was a little painful to get at what we wanted sometimes.

Let me walk you through what we did in a little more depth. For personalization: to execute personalization, we started by driving new registrants to an interactive assessment that really serves two purposes. First, to get some information to help customize their content preferences, but also to say right away that this is going to be a personalized experience. It’s kind of like asking the user’s permission to change their experience based on how they engage with the site. Let me show some examples. A new user who registers on the site is asked several questions about each of the subject areas: money, health, people, and play. We invested a lot of time and thought into the visual cues users see about themselves and their preferences, because we wanted it to be a really cool and engaging experience. Users really get to see that the site is going to have content available for the type of person they are, and not just be organized for a generic user or a generic segment. You don’t want to just ask a bunch of questions and not let the user get that this is going to be personalized. If you look at those little circles in the top right, they actually change in size as the user answers questions or moves the sliders, so right away the person gets that this is going to be a different experience, not just a content site with a lot of content. Throughout the site, the user can get back to this personalization control; there is a persistent reminder in the right sidebar. Again, we wanted to keep it subtle but up front that this is a personalized experience, not just a content site.
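To make that concrete, here is one hedged sketch (not the actual mywell-being.com code) of how those assessment answers might be persisted per user in Drupal 7, assuming a hypothetical {user_topic_preferences} table defined in a custom module's hook_schema():

    <?php
    /**
     * Hypothetical submit handler for the onboarding assessment form.
     *
     * Stores a 0-100 weight per topic so later queries can weight the
     * featured content towards what the user said they care about.
     */
    function mymodule_assessment_form_submit($form, &$form_state) {
      global $user;
      // The topics mirror the site's four pillars; the values come from sliders.
      foreach (array('money', 'health', 'people', 'play') as $topic) {
        db_merge('user_topic_preferences')  // hypothetical table
          ->key(array('uid' => $user->uid, 'topic' => $topic))
          ->fields(array(
            'weight' => (int) $form_state['values'][$topic],
            'changed' => REQUEST_TIME,
          ))
          ->execute();
      }
      drupal_set_message(t('Thanks! Your content preferences have been saved.'));
    }

The same records can then be read back wherever the site decides which hero article or featured blocks to show.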

The other big change we needed to make was in content management, and the CMS we went with was Drupal. Every other component of the redesign and relaunch of the project depended on this. John covered a lot of what Drupal brings to marketers at the beginning of this webinar; we needed more than just content management. Just some quick bullets: we needed to support users with custom user profiles. We knew we were going to need to support mobile and blogs across several different content types. We knew we needed to integrate data and feeds. We knew we needed social sharing tools, content rating features, and so on, and they all needed to be aware of each other and not be independent plugins.

Digital Bungalow today is an Acquia partner. We do almost everything with Drupal, but at the time we weren’t, so as I said, this is a slightly bigger story. We had to evaluate which CMS to use. The big proprietary CMSs, which were not yet promoting themselves as WEM, just seemed to be taking us down a very rigid path, and some of the smaller open source CMSs handled content well and allowed designers to build good sites, but there seemed to be something missing. Drupal really gave us what we thought we needed to build this. I think that last point is really one of the real strengths of Drupal and the direction CMSs are going. We needed social sharing tools and content rating features. We needed features we weren’t even aware we needed, and we needed all of these features to be aware of each other, not just independent plugins. We were going to need social sharing data to feed into content rating data, and content rating data to feed into user data. All these things needed to work together.

Another big change for the site was mobile. Everybody has been talking about mobile so much this last year that at this point it seems obvious, but at the time we were having the discussions every marketer has: we think we need a mobile app; what do we do about mobile? Our lead interactive developers brought responsive design to our attention. I’m sure a lot of people on the phone are familiar with responsive design by now, but simply put, it’s a technique for design and development that allows us to optimize the display based on the size of the device you’re on. With that, we can manage the mobile, tablet, and desktop experiences all in the same CMS. We don’t treat mobile as a separate project anymore. Before, we used to build the site and then think about mobile; now mobile comes up in every single thing we do. There are other advantages too. Remember, this is a multi-channel program. People are engaged through email, direct mail, and offline, but with responsive design we know, for instance, that if you’re reading an email and click on a link to the site, you’re going to see an optimal, engaging experience whether you’re on a computer or an iPhone. You’re not going to see a page that’s just too small for your iPhone, or a mobile page that you’re viewing from a PC. You’re going to see an experience specific to your device.

Going back to the CMS and some of the things we saw in Drupal: we actually did have to create some modules to extend it and make personalization work the way we wanted it to work, and we’ve contributed this back to the community, so you can check it out, and there are some updates coming to it. I’m going to turn this over to Andy. Andy, maybe you could share with everyone the role that data integration and analytics played and the upgrades to the program.

Andy: Sure, I’ll be happy to; thanks, Jason. Okay, at the core of the system we have a rather robust analytic data warehouse and reporting system that was designed and developed by MarketBridge. The data warehouse captures and integrates data from a wide variety of sources and marketing channels to create a unified, 360-degree view of the customer, and it also gives us a holistic view of all program marketing activity and results. Some examples of the key sources of data that we capture and analyze are customer demographics and attitudinal segments, digital marketing stimulus and response data, and website activity from Adobe SiteCatalyst. We also look at a variety of social media sources, including Facebook, Twitter, and YouTube. This analytics engine that we’ve developed and maintain has been critical in providing the program management team with the timely insights needed to make smart decisions on program strategy. At this point, all key program decisions are supported by empirical evidence and thorough data analysis. I’ll turn it back over to Jason, and he’ll show us what the website looks like to the end user.

Jason: I will walk you through a couple of screens. If you go to mywell-being.com today and come to the homepage, this is what the finished product looks like and what the site looks like to the user. On this screen, what you’re seeing is the default segment: this is a new user who’s come to the site. They haven’t done anything on the site yet. They haven’t filled out the assessment. They haven’t told us anything about themselves; we don’t know anything about them. They’re going to see a featured article and the hero image, which are updated weekly to keep things fresh; it’s sort of the traditional publishing model. The featured blocks in the next row have content from each area of the site. The user is presented with a pretty even distribution of content from the different subject areas, and this is pretty much how the old Real for Me site used to work. But after you tell us a bit more about yourself, we start showing you more content relevant to your preferences. In the example here, this is a user who lands in the segment of males and females under the age of 60. The user has shown us they’re predominantly interested in health-related content, so we feature web and email content weighted towards the new health content.

One of the things I mentioned earlier, and Andy alluded to when he talked about the data integration, is that we’re now taking this beyond what the user told us in their profile. All the content on the site, and in fact in all of our multi-channel campaigns, is tagged to a category. Based on the user’s engagement, what they do on the site and what they do throughout the campaign, we start featuring content based on their behavior: not just what they told us about themselves, but what they’re actually doing. You ever notice when you buy a car, you start seeing other people driving that car everywhere? Once we got into this, we started to see this behavior everywhere, and we started to realize that this idea, as proud as we were of it, wasn’t that new. We see it in streaming media and retail. If you use the music service Pandora, the channels are set up by you telling the site what you like, and then it recommends content for you; from there on, you thumb up and thumb down songs and skip songs, and the music keeps coming to you based on how you’re engaging. If any of you used Netflix streaming a couple of years ago, Netflix used to be organized based on genres and actors and so forth, but now the primary items are suggestions for you based on what you’ve been watching. This isn’t creepy anymore; this is actually how we expect the web, how we expect a good user experience, to work. Let me show you some of the other segments. Here’s the experience a person would see if they are male or female, age 60 or over; they are in our retiree segment. As this person has been going through the site, they’ve been predominantly clicking on retirement-related content, saving retirement-related content to their favorites, and sharing retirement-related content with other people, and now both the web and the email content we send them is weighted towards new retirement, leisure, and finance content. Here’s another segment. In this segment, we’ve got users who are female, ages 30 to 60. Their site activity has primarily been clicking on family and social content, and again, the web and email featured content is weighted towards new family and social content. We hypothesized that this would work; let Andy tell you how it worked.
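As a rough sketch of that idea (not the actual engagement module code), weighting featured content by behavior in Drupal 7 might look something like the helper below; the {user_category_activity} tracking table is hypothetical, while {taxonomy_index} is a core table that maps nodes to their taxonomy terms:

    <?php
    /**
     * Hypothetical helper: pick featured nodes for a user based on which
     * category they have clicked, saved, and shared the most.
     */
    function mymodule_featured_nodes($uid, $limit = 3) {
      // 1. Find the category (taxonomy term) with the most tracked actions.
      $query = db_select('user_category_activity', 'a');  // hypothetical table
      $query->addExpression('COUNT(*)', 'action_count');
      $query->fields('a', array('tid'))
        ->condition('a.uid', $uid)
        ->groupBy('a.tid')
        ->orderBy('action_count', 'DESC')
        ->range(0, 1);
      $record = $query->execute()->fetchAssoc();
      if (!$record) {
        return array();  // No behavior yet: fall back to the default mix.
      }

      // 2. Pull the newest published nodes tagged with that term (core table).
      $nids = db_select('taxonomy_index', 'ti')
        ->fields('ti', array('nid'))
        ->condition('ti.tid', $record['tid'])
        ->orderBy('ti.created', 'DESC')
        ->range(0, $limit)
        ->execute()
        ->fetchCol();

      return $nids ? node_load_multiple($nids) : array();
    }

In practice you would blend this behavioral weight with the declared preferences from the assessment rather than letting one category take over completely.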

Andy: Sure. Since we integrated Drupal with the website and launched the new customer experience last year, we’ve seen tremendous improvements in customer engagement on the website, which had previously been identified as a major opportunity for improvement for the program. Across the board, we’re seeing gains in website engagement from our member base, starting with a 36 percent increase in the daily number of visitors to the site; we’re also seeing almost a 50 percent increase in the number of visits. Not only are we getting more people to the site, but they’re also engaging more deeply once they’re there, and that’s evidenced by the 72 percent increase in page views and a 74 percent increase in visit duration. Overall, we’re off to a tremendous start and our client is very satisfied. Going forward, we now have a much stronger foundation to support further testing and optimization, and we fully expect performance to keep growing into the foreseeable future. Stepping back, I think this project has been a remarkable example of what happens when you combine some very innovative marketing ideas with best-of-breed technology and a group of very dedicated marketing professionals; the results have been tremendous. We’re all very excited to take it another leap forward as we look to next year and beyond. We’re all very happy and very excited.

Jason: John, that’s our presentation. We’ll turn it back to you for some Q&A about the project.

John: Absolutely. Thanks very much, guys. It’s great to see the detail on the implementation. For anyone who wants to ask a question, you can add it to the Q&A pod now. We’ve got a few coming in. Let me see here. The first question is around the engagement module that you guys helped contribute back: how do you configure your segments, and how do you figure out which article to feature for particular users?

Jason: Yes, I think Andy and I can both answer that. I’ll tell you on the technical side; I saw one of the other Q&A questions about which modules to use. We definitely were inspired by the modules we saw, like the Recommender API, and someone mentioned the Context API. We did end up building a custom module we called the engagement module, and we’ve now released it as the Web Engagement module. We needed a little bit more control over segments, because most of the modules we saw were focused, again, on sort of a generic experience as opposed to a segmented experience, so that’s probably the biggest enhancement we made. With regard to how we configure segments and the decisions we make, I’ll give you the basic answer and maybe Andy can fill it in. We start simple. We do this with more of our clients now: we explain, start simple, analyze, refine, and then let the analytics drive how you expand. Andy, do you want to add to that?

Andy: I think you’re exactly right. We’ve adopted a test-and-learn strategy for content on the website. As we introduce new content that we think is right for our member base, we put that content out in front of a statistically valid sample of our member base and see which pieces of content are most appealing to the various segments. As we do that testing and we collect results and analyze them, that helps us further identify and understand what types of content are really going to be most appealing and engaging for each of our key customer segments.

John: Great. Another one on strategy: how has your strategy for content personalization changed over time?

Jason: Andy, why don’t you take that?

Andy: I’m sorry, the question was how our strategy for content personalization has changed over time, is that right?

John: That’s right.

Andy: As Jason mentioned, we started very simple. We went out and identified some segments based on analysis of our existing member base. As our member base continues to grow and we start to bring in people from different demographic segments, we revisit what those segments are. We understand that the profile of our member base is evolving over time, so as those distinct segments change amongst our member base, we can go back in, redefine what the segments are, and realign our content accordingly.

John: Great, and a follow-up: do you extend content personalization beyond the website at all?

Jason: Yes, we definitely do. I think we brought it up a couple of times during the presentation. The personalization goes across all the multi-channel marketing, so we focused very heavily in this case study today on what we do on the site, because we’re excited about that, but we made a point in both the old site and the new site to extend the segment information across all our channels.

John: Great. A question about the personalization and segmentation. What you talked about seemed applicable to logged-in or authenticated users. Are you doing anything for anonymous visitors?

Jason: We’re not doing a lot yet, but we can; there’s no reason why we can’t do this for anonymous users. The way this particular site is structured, there is definitely a drive towards, and an incentive for, logging in and registering, which helps a lot, and I think more and more users are not only okay with that, they expect it; they expect that if you want a personalized experience, you have to log in. But we definitely can do something like this for anonymous users as well.

Andy: Right. Just to add on to Jason’s comment: we acquire registrants from a variety of sources and marketing tactics. Over the years, we’ve accumulated some very rich insights into what the profiles of members from those various sources look like. One of the things we’re trying to do in the future is, based on which source brought the registrant in, basically where that registrant came from, tailor the experience to the profile that we know about that acquisition source. For example, if we know somebody came in through our Facebook pay-per-click campaign, we would take a look at all the folks who have come through that channel in the past and what their profiles look like, and then customize the experience to align with their unique needs and interests.

John: Great. You’ve mentioned a couple of different best-of-breed technologies that are integrated for the full solution, but there’s a question specifically around whether you’re using a campaign management or marketing automation system like Eloqua or Marketo as part of the solution.

Jason: We’re not. We’re not using Eloqua or Marketo, but there’s no reason you couldn’t extend it to use one. Like John said, we’ve always looked at the strength of Drupal being an open platform. To give an example not related to Eloqua or Marketo: we were using one email platform, we ran into some restrictions with regard to what we wanted to do with email, and we were able to switch to a different email provider without having to scrap our platform. I think it’s the same with Eloqua and Marketo. We’re driving our campaigns off the solution that MarketBridge built. The solution Andy talked about earlier, with the data warehouse, handles all of the segmentation and everything we might get out of an Eloqua or Marketo, and it’s tailored exactly to MarketBridge’s approach to campaigns. We’re not using them, but you could definitely go that route.

John: Back to the Drupal engagement module specifically, a two-part question: what’s next for the engagement module, and are you seeking other contributors or sponsorship for further development on it?

Jason: Yes, thanks. Definitely, we’re looking for additional contributors, and we’ve really thought about seeking sponsors for it. This campaign has been so successful for us and really opened our eyes to how great websites should work; it’s core to everything we do at Digital Bungalow and core to everything MarketBridge does. It really goes into every project we do, and we’ve been making internal investments in it. We know Acquia is making investments in other parts of the open WEM landscape. We’re also a team of great Drupal developers here, and we’re growing that team, so we’re sold on the community. It’s important for us to contribute back what we’ve learned, because we’ve learned so much from the community. So yes, definitely, we’re looking for other contributors. Again, it’s pieces and a framework, just like all the other Drupal modules; the Web Engagement module is pieces of what we’re doing here. The overall campaign, the testing, the refinement and so forth, the real work, always takes people to do it.

John: Great. I think we just have one question left before we close out the presentation. What do you see as the keys to driving continued increases in site performance in 2013 and beyond?

Andy: Sure, I’ll take a stab at this one, and Jason, feel free to fill in. As I mentioned earlier, we’ve always been very committed to a test-and-learn approach to content placement and strategy, and now we have an even better platform to experiment with and optimize the customer experience. We’ve found ourselves in a cycle that leads to continuous learning and continuous improvement in that experience. Basically, the way it works is: the more engaged the users are, the more data we’re able to capture about their needs and interests, and the more data we capture, the more we can further customize their experience on the website and make it more dynamic and more relevant. We feel that as long as we keep coming out with innovative, fresh new content and new ways to engage the user, we’re going to continue to see this growth and success going forward.

Jason: Yes, I don’t have much to add. I was going to say test and refine, test and refine.

John: Okay. There’s one other question. Did you have behaviors built into the module based on roles or other methods of handling a multistep sales or lead gen pipeline?

Jason: Yes. I guess my slides really showed you more about behavioral segments, or people’s affinity for different types of content. Our primary focus in the campaign lately has been engagement, so we’re generally trying to make the site more engaging and drive more affinity to the site. However, we did use the same methodology the other way: another way to segment users is by how engaged they are. Once we get somebody into a particular segment, we also take a look at how long they’ve been on the site. If this is somebody who’s been here three or four times, we’re still going to show them certain types of content, but if this is somebody who’s been here a lot, we might start featuring other content. With regard to where we might go with the modules we’ve been building, which I think was asked earlier, we’ve been abstracting them more and more so that we can build segmentation based on content, based on actions on the site, or just based on how long the person has been a member or how long it’s been since they visited, because it’s really behavior-based marketing, and we’re seeing all sorts of platforms pop up for doing this. I think there’s a reason those platforms are popping up: this approach has to be taken to make sites more engaging and to move people along towards a particular action.

John: Yes, and I’ll just highlight, on the Acquia side: on acquia.com, we’ve actually built our own integration service between our Drupal-based acquia.com website and Marketo so that we can track lead and campaign qualifiers and identifiers from the website back into Marketo for our own lead generation funnel. I think what’s really interesting is when you can start taking these different solutions and different steps and start merging sort of what we’re doing with acquia.com with what your team is doing with customer segments, and the solutions can be phased and grow over time. We’re definitely doing that today from a lead funnel and sales funnel perspective.

John: Great. We’re a few minutes early, so I’ll give folks some time back, but I just want to thank Andy and Jason again for taking time out of their day to join us, and I want to thank everybody on the call for joining us today before the holiday. For attendees here in the US, have a happy Thanksgiving break. Thanks again for joining us, and we’ll see you on the next webinar very soon.

Jason: Thanks.

Andy: Thanks a lot everyone.

Going from Zero to 100 km/h on Drupal Thanks to Acquia [November 15, 2012]

Using Drupal Commons to Accelerate Collaboration [November 14, 2012]

Creating Solid Search Experiences with Drupal [November 13, 2012]

Click to see video transcript

Speaker 1: Hi everyone. Thanks for joining the webinar today. Today’s webinar is Creating Solid Search Experiences with Drupal, with Chris Pliakas, who is the product owner of Acquia Search.

Chris Pliakas: Today’s webinar is on creating search experiences with Drupal. I think we’ve done a lot of webinars in the past where we focused on Acquia Search and some of the basics. Today, I really wanted to focus on just Drupal in general, without an Acquia Search focus. Of course, all these techniques can be used with Acquia Search, but I just wanted to highlight some of the things that are in the community.

Also, based on our experience hosting over 1,600 indexes across 1,600 subscriptions, with people experimenting with search pages and various UX approaches, I’ll talk about some of the trends that we’re seeing, and in order to get the best experience possible, we’ll touch on some content strategy items that you can employ to make sure that your search is set up for success.

Then we’ll focus on the search page user interface, so we’ll do a live demo exploring some of the tools that are available to Drupal right now that can be used to create modern-day search user interfaces so that your users get the best experience possible out of the application and could find content that they’re looking for.

Also, we’re going to demo some things that are coming down the pike. I think it’s important to recognize that right now, enterprise search is at a crossroads, and I just want to distinguish for a minute what enterprise search means. When we talk about enterprise search, we’re talking about internal site search, and enterprise doesn’t necessarily mean large corporations. Enterprise simply means that that search is important to your business and important to you, so this isn’t just a big business thing. This is for searches of any size.

But we see some trends emerging with external search engines, like Google, Bing, and Yahoo!, that are now going to set expectations for users of your internal site search. With the trends emerging in the search community at large, there is really going to be an expectation that your search experience matches what’s out there currently. These are pretty advanced things, so we’ll talk about those trends and we’ll talk about what’s changing in this space specifically.

We talked about how search is really evolving. Over the past 10 years or so, which is quite a long period of time, your internal site search really hasn’t been much more than the user entering keywords and then displaying results that are pretty basic: you have a title, you have a snippet, that sort of thing. But right now, search is starting to move into a different space where we have to identify what the user is actually looking for and then display relevant results. Relevant results don’t just mean keyword matching; it means knowing things about your content and knowing things about your user so you can make some assumptions and present them with relevant results.

As we create more and more content on the Web today, it’s getting harder and harder to sift through that data and display meaningful data. One thing that we’ll start out with is just a simple example.

What I want to start out with is talking about Apple. How many people know Apple? All right, so I see some hands in the webinar. I guess I want to ask “How well do you know Apple?,” so a first question that I want to ask you is, “Is Apple growing?” I’ll let you answer in your heads. It’s not really a good forum for answering in public.

The second question that you should think about is does Apple have money? Then the third question, is Apple multilingual? Does Apple support multiple, does Apple have knowledge of multiple languages? Does Apple speak more than just English? Those are the three questions that I want you to answer in your head.

I’m just going to assume that you guys did a good job and you were able to answer that. Based on those three questions, I think there is no doubt that we’re talking about Apple Martin, who is the daughter of Gwyneth Paltrow and Coldplay lead singer Chris Martin. Apple Martin, like all kids, she is growing. Does she have money? Absolutely. I think her parents are doing pretty well; one is a rockstar; one is an actress. One useful tidbit is that she cannot watch TV in English, so she is getting raised as a multilingual speaker.

Is that the wrong Apple that you were thinking of? I’m assuming that it is. Tech audiences, when I say Apple, usually think of the company Apple, and the problem really is about context. The first trend that I want to talk about is contextual computing. Right now, we start to see how Apple could mean different things. It could mean the fruit. It could mean the company. It could mean Apple Martin. It could mean Fiona Apple. It could mean a lot of different things.

When I talk about context, I mean the things surrounding it that expose the content for what it is. For example, if we are talking about Apple being a Fortune 5 company or a Fortune 1 company, whatever it is right now, then that context would expose Apple as being a company. If we were on a pop culture website, then it would be more likely that Apple is the daughter of Gwyneth Paltrow, like we mentioned.

Context and how it relates to your content is getting to be really important as we get more and more data. Sites aren’t just displaying one thing now. Sites are starting to display lots of different pieces of content, and we need to start recognizing that simple keyword searches aren’t going to serve our users. We really want our results to be relevant towards what people are actually looking for.
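To give a flavor of what that can look like in code, here is a hedged sketch for the Drupal 7 Apache Solr Search Integration module: a query alter hook that nudges relevance beyond raw keyword matching by boosting a content type and favoring recent content. The hook and method names come from that module's API as I understand it, and the field names (bundle, ds_created) assume its default Solr schema, so treat the details as assumptions to verify against the module's documentation:

    <?php
    /**
     * Implements hook_apachesolr_query_alter().
     *
     * Sketch only: prefer event nodes when keywords match several content
     * types, and gently favor recently created content.
     */
    function mymodule_apachesolr_query_alter($query) {
      // Boost query: event nodes score higher than other bundles.
      $query->addParam('bq', 'bundle:event^3.0');
      // Boost function: newer documents (ds_created) get a small lift.
      $query->addParam('bf', 'recip(ms(NOW,ds_created),3.16e-11,1,1)^1.5');
    }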

One way that we can do this is by search statistics. Search is a really unique tool in that it is a window to what your users are expecting on your site. By entering keywords and by clicking on various pieces of content, your users are actually telling you what they want from your website, and they are telling you what content they think is relevant.

There are things out there like voting or reputation metrics, but search is really the best tool to be able to extrapolate what people are trying to do with your website.

That also leads into structured data, which is another trend that we’re going to talk about. Structured data is a way to actually denote what type of content you have on your site: whether, to go back to the Apple example, Apple is the organization or something else. These are the three trends that search is really rallying around.

I want to talk about what Drupal is doing right now to address this and some of the things that are going to be coming down the pike within the next six months or so, because it’s important that as you start to build your search experience that you’re starting to recognize some of these trends so that when the Drupal tools emerge, you can make use of them effectively and provide the site search that your users are coming to expect.

Now I’m going to go to the live demo portion of the presentation, just to set the stage here. I have a really basic Drupal install. It’s the standard Drupal blue that you see out of the box, and it has some prepopulated content: a couple of events, a couple of blogs. We’ll actually build out some of the search experience and identify some of the trends that we talked about.

Now that we have the site up: right now, I’m connected to an Apache Solr backend. Whether you’re connected to Acquia Search or to your own Apache Solr instance, I’m going to assume that you can install Drupal, configure some of the basic settings, and download and install the modules. We’re going to start with the assumption that that’s the level we’re at.

If you do need some help, or if you are unsure how to install or configure modules, I do recommend that after this webinar you look at the great resources on drupal.org and the articles that Acquia provides as part of its forums and its library, which can help ease that transition. But you can still get some value out of this webinar by following along, taking note of which modules are being used, and seeing how you can configure them once they’re installed.

First, what I want to do is I want to just execute a search. It’s the same whether you’re using core search or any other backend. But I’m going to search for DrupalCon, and we’ll start to analyze some of the results to see what the default behavior is that you get out of the box.

The default behavior we’ll see is somewhat useful but not really. But if I entered DrupalCon, it will give me the pieces of content that match that keyword. It will give me a highlighted results snippet, and it will show me a little bit of information in terms of who the user was that posted that content and what date that content was posted. Sometimes, that’s useful. Sometimes, that’s not. But again, this is a basic search interface that you get out of the box.

To be perfectly honest, this isn’t very useful. This isn’t what users expect. If you compare it to Google or Yahoo! or Bing or all the other major players out there, this is weak, and it doesn’t really give users the information that they need to effectively search the content of your site.

The first thing that I want to do is I want to explore something called Facets. And facets are filters that users can apply to help refine the search results, and it also gives some aggregate information such as the count or number of results matching that filter based on the keyword that you entered.

The first module that I want to explore is something called the Facet API module. I’m going to go to the project page here. This is a module that works with core search. It works with Apache Solr search integration. It works with Search API if you’re using that module. It’s a way to configure your search interface regardless of what search backend that you’re using.

If I expand the screenshot here, you’ll see that here are some examples of the types of facets that you can have. You can have facets by content type, by date. There are even some interesting contributed modules out there that allow you to display facets as graphs. You can really control the interface and display things in pretty interesting ways.

I’m just going to scroll down and show some of the add-ons that are available that you can take advantage of. Again, we have the graphs that we talked about. We have a slider, so if you have numeric facets, numeric content, you can say, “I want to show data in this range.” There are tag clouds, and also date facets, which we’ll actually explore and configure.

I’m not going to spend too much time. That’s just an overview to whet your appetite for what’s out there and what’s available in the Drupal community. But I do want to just go and start configuring this so you can see what this looks like and how this works.

The first thing that I want to do is I want to be able to filter this by the content type. I do have two content types here, blog and event, so I want people to say, “Okay, if I’m searching for DrupalCon, I want to filter by the blogs or I want to filter by the events that I want to see,” so that you can get the relevant information for you.

First thing I’m going to go do is configure the Apache Solr Search Integration Module. That’s the one that I’m using. I’m going to go to Apache Solr, going to go to Settings, and I am going to go to Facets. These are the lists of the facets that I have available to me. First thing I’m going to do is configure and enable the content type. I’m going to save this configuration.

Now that facet is saved, I actually have to position it on the page. The default facets are blocks. Blocks in Drupal are small pieces of content that you can position in various regions or various areas on a page. Once you enable a facet, there is a link up top that allows you to go directly to the block configuration page so that you can configure this immediately.
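For anyone newer to Drupal, the block system that the facets plug into is the same one any module can use. Facet API generates its facet blocks for you, but a minimal, hypothetical Drupal 7 block of your own looks like this, just to show the mechanism:

    <?php
    /**
     * Implements hook_block_info().
     */
    function mymodule_block_info() {
      return array(
        'search_tips' => array(
          'info' => t('Search tips'),  // Label shown on the Blocks admin page.
          'cache' => DRUPAL_CACHE_GLOBAL,
        ),
      );
    }

    /**
     * Implements hook_block_view().
     */
    function mymodule_block_view($delta = '') {
      if ($delta == 'search_tips') {
        return array(
          'subject' => t('Search tips'),
          'content' => t('Use the filters on the left to narrow your results.'),
        );
      }
    }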

If I click on Blocks and scroll down … it’s actually enabled for me. I’m just going to reset this so that it is where you guys will see it when you start from scratch. But it will start down here in the disabled category. These are all the blocks that are disabled. We look for Facet API, the backend that we’re using, and then content type. This is the facet that we just enabled.

I’m going to position this in the first sidebar. It is recommended that you do position it in the first sidebar, so that will be on the left-hand side. The reason is because that’s where most of the major search engines position their facets, so in order to help people navigate your search page, we use expected patterns. That’s the best place to put it so that they don’t have to hunt around for it.

I’m going to save my block. Now I’m going to go back to my search page. I’m going to search for DrupalCon. Now I have a facet up in the upper left-hand corner that allows me to filter by events or by blogs. If I filter by blogs, it’s reporting that I have two results. If I click that, you’ll see that I do get my results filtered to the blog that I want. That’s pretty basic stuff, but it allows your users to actually target what they’re looking for.
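If you would rather capture that placement in code than click through the Blocks page on every environment, a hedged sketch of the same change in an update hook looks like this; the theme name and the facet block's delta (which Facet API generates as a hash) are placeholders you would copy from your own Blocks administration page:

    <?php
    /**
     * Implements hook_update_N().
     *
     * Sketch: move a Facet API block into the first sidebar of the active
     * theme. 'THEME_NAME' and 'DELTA_HASH' are placeholders.
     */
    function mymodule_update_7001() {
      db_update('block')
        ->fields(array(
          'status' => 1,
          'region' => 'sidebar_first',
          'weight' => -10,
        ))
        ->condition('module', 'facetapi')
        ->condition('delta', 'DELTA_HASH')
        ->condition('theme', 'THEME_NAME')
        ->execute();
    }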

That was very basic facet configuration. The next thing that I want to discuss is a pattern called progressive disclosure. This is something you’ll see on Amazon: if you go to Amazon’s search, you’ll be prompted to search for something you’re interested in, one of the products they have. Then when you search for that product, you’ll be shown different filters based on the different types of things that are returned. It prompts a user to start out small, like selecting the department they want to search in, and then based on that department, it will expose different filters or facets that are relevant just to that.

I do want to take a step back and talk about the events. The events that I have on this site have dates that are associated with them, so the date that the event actually starts, whereas the blogs have a different type of date. They have the date that the article was posted.

When you’re searching for events, you don’t really want to know the date that the event was posted. You want to know the time that the event is actually happening, so you’re going to have two different types of date facets, depending on the content that you’re targeting.

Instead of displaying all of that information, all the possible combinations of facets on the left-hand side, we want to only display the facets as we start to navigate down the content types that we’re interested in.

To highlight this, I’m going to go back to the configuration page, go to Apache Solr, go to Settings, and configure my facets. If I scroll down, we see the two types of date facets that I was referring to: one is the post date and one is the event date. I’m going to enable both of these. Then I’m going to go to my blocks to position them. I scroll down, and now I see that the new blocks are here and disabled, so I’m going to position them in the first sidebar, like the other one, and make sure that they’re in the order that I expect.

I’m going to save these blocks. I’m going to go back to my search page, search for DrupalCon. You see that by default, now I have filter by post date, filter by event date. In order to configure this progressive disclosure pattern, what we’re going to do is leverage something in Facet API called Dependencies. Instead of just explaining, I’m just going to go for it and highlight by example.

When I mouse over the facet, I get a little gear in the upper right-hand corner. If I expand that, I have an option to configure facet dependencies. This is the date that the actual content was posted, so again, it makes more sense for the blog than it does for the event. The first option that I have here is bundles, which are synonymous with Drupal content types. I’m going to say at least one of the selected bundles must be active. I’m going to say I only want to show this for blogs. I’m going to save this and go back to the search page.

Now you see that that date facet is gone. If I click on blog, now it appears: filter by post date. Again, I’m only shown the facets that are relevant to the content type that I’m looking at.

Again, I could do this filter by event date. Again, mouse over the gear. Click Configure facet dependencies, Bundles. At least one of the bundles is active, and I’m going to say Events.

Now I go back, and when I search for DrupalCon, I’m going to start off very small, limited options, kind of guiding your users to select something and refine their results. As I click on blog, we know that we’re in the blog context, so again, context meaning information that is used to determine what type of content you’re viewing. Now that I know that I’m viewing blogs, I see the post date, which is a little more interesting.

Whereas if I click on the events, now I get the filter by event date. I can say, “Show me events that start in August of 2012 or May of 2013.” That really does target the type of events that are relevant to me.

One thing too, and I’m actually going to go back to the blog facets: you see here that for the blogs we have this drill-down behavior. We have a couple of blogs that span a couple of years, and with the default facet that comes out of the box, you have to actually drill down into 2011, then into March, then into March 21st. It allows you to drill down by the specific date all the way down to the time. But that’s actually not what users expect when you’re dealing with content like blogs.

I’m actually going to go to Google and search for Drupal blogs. If I click on Search Tools, we’ll see that, under “Any time,” they don’t have that type of drill-down. They actually have the ability to refine by a certain range. That’s usually what users expect, and it’s a use case that people commonly ask for; we’ve seen it in our support requests.

The next module that I want to explore is called the Date Facets module. Again, this is available on drupal.org as date_facets, and it’s linked to from the Facet API project page. If we look at the screenshot, we’ll see that it provides a nice little display widget that lets you display your facet as a range selection. We’re going to assume that that module has been downloaded.

Click on Modules. Once you download that module, you’re going to install it. I’m using the Module Filter module to provide this nice interface where I can make sense of my modules, because anybody that builds Drupal sites knows that you can end up with hundreds of modules, so you need to be able to filter them more easily on this module administration page. All I have to do, and I already enabled this, is select the check box and click Save configuration; that’s all it takes to install the module.

Once the module is installed (actually, I’ll do this from the search page; again, filter by blog), you have an option with facets to configure the display. If I mouse over the gear and click it, the same list of options that let me configure the facet dependencies also lets me configure the facet display.

After I’ve installed that module, I have a new display widget here. If I expand this, you can see up at the top there is a new date range widget. The type of display in Facet API is called the widget.

If I click on date range, click Save, and go back to the search page, I’m actually going to get an error here, which I wanted to highlight on purpose. It says the widget does not support the selected query type. When you’re setting up the date range, this is a common error that people report. You have to actually scroll down and select a different query type. This just tells Drupal that we’re not doing the plain date filter; we’re doing the actual ranges.

I don’t want to get into the technical aspects of it, but behind the scenes, it actually changes the type of filter that the backend uses, so it’s important that we actually make this distinction.
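To make that distinction a little more concrete, here is a rough illustration of the two styles of filter in Lucene/Solr syntax. The exact field names and parameters the modules generate may differ; created is used here just as a stand-in for the post date field.

  created:[2011-03-01T00:00:00Z TO 2011-04-01T00:00:00Z]   (drill-down: a fixed calendar period)
  created:[NOW-1MONTH TO NOW]                              (date range: a relative period such as “past month”)

The range style relies on Solr date math (NOW, NOW-1MONTH, NOW/DAY, and so on), which is why the widget needs the matching query type selected before it will work.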

Now if I save and go back to the search page, you see that I get filters that are very similar to Google’s. I can refine things by the past week, for which I have nothing, or the past month, or the past year. It looks like I only have content within the past year. But it was able to refine that based on the time range of the content that you have, so it really allows people to narrow down to the things that are more recent.

Those are a couple of the tips that I wanted to share regarding facet configuration, but I want to stop and see if there are any questions before proceeding. Do you have any questions? All right. We’ll move on from facets.

The next thing that I think is pretty interesting is that instead of having a unified search page which displays all the content across your entire site, sometimes it’s useful to actually have targeted search pages. These are things like, okay, I have a blog section on my site, which we have here. I only want to search across the blogs or I don’t want to make the user actually click on blog to refine the results. This can actually be done in the Apache Solr Search Integration Module, which we’re going to focus on.

I’m going to click Configuration then go to Apache Solr Search. One thing that I’m going to do to simplify this demonstration and something that I think is useful in Drupal in general is Drupal 7 provides this nice little shortcut functionality. You see here I have Apache Solr Search with a little plus sign. I can click this and it will now add this configuration page, a link to this configuration page in the toolbar so that I can navigate to it more quickly as opposed to having to go through the normal path. I’m going to do that for an easier demonstration. If you’re configuring your search pages, you might want to do that as well.

Some of the tabs here, we have one that’s geared towards pages and blocks. I’m just going to select pages and blocks. This is where we can actually manage search pages. I’m just going to go ahead and add a search page and we can see what this will do.

The goal here again is to create a search page that narrows things down to just your blogs. I’m going to say this is a blog search. I’m going to scroll down and make sure that the correct environment is selected. In this case, I’m running Solr locally, but if you’re connected to Acquia Search, you’ll have an environment for Acquia Search. The environment is really just a name for the backend that you’re connecting to.

Again, in title, search blogs. That’s going to be the title of the page. The path, I’m going to put in search/blogs. The part that’s going to allow me to filter just by blog content is this part at the bottom, custom filter. It’s a little complex in terms of how you do it, but first, I’m going to select that custom filter check box to make sure that I’m using a filter. We’re going to read the description down here. It says, “A comma-separated list of lucene filter queries to apply by default.”

In English, what that means is that Lucene is the low-level search library that Solr is built on, and its syntax allows you to filter by specific things and do some pretty interesting stuff. The very basic part of Lucene syntax is that if you want to filter by a field, you write the field name, then a colon, then the value. We actually have this use case down here in the description: we see bundle:blog. Bundle is the actual name of the Solr field, and blog is the value that’s actually stored in the index.
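For reference, here are a couple of filter queries in that field:value form. The first comes straight from the description; the second is just an illustrative variation using standard Lucene syntax to match either content type.

  bundle:blog
  bundle:(blog OR event)

Because the custom filter field takes a comma-separated list, each entry you add there is applied as an additional filter on top of the search.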

If you want to see all the fields that are stored in Solr, you can actually click Reports and click on Apache Solr Search Index. These are all the different field names that you have at your disposal. It doesn’t show you the values, but in our case, we know that the bundle field indexes the machine-readable name that we specified when we created the content type. If I go to Structure, Content types, we see here all the different machine names; for the blog content type, the machine name is just blog.

Again, I’m going to match the Solr field to this machine name. I’m going to say bundle is the name of my Solr field, and then blog is the value that we want to filter by. I’m going to save this page. Now, I have a search page that’s dedicated just to blogs. I’m going to click on this. If I say DrupalCon, now we see that it only gives me two results because it’s only filtering by the blogs, not filtering by any of the events.

Sometimes, it is nice to have these targeted searches. For example, if you do have a blog section of your site, it is very nice so that you don’t have to actually set up a separate site for your blog. You can have your blog be a micro-site that is under the same Drupal installation but just has different configurations isolating that content so users can find what they are looking for.
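If you would rather apply that kind of restriction in code than through the search page form, the Drupal 7 Apache Solr module exposes a query alter hook. This is only a minimal sketch, assuming a hypothetical custom module named mymodule; check the module’s apachesolr.api.php for the exact interfaces before relying on it.

  /**
   * Implements hook_apachesolr_query_alter().
   *
   * A sketch of applying the same bundle:blog restriction programmatically.
   */
  function mymodule_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
    // Adds a Lucene filter query (an fq parameter) so that only nodes
    // whose bundle is "blog" come back from Solr.
    $query->addFilter('bundle', 'blog');
  }

In practice you would also check which search page or environment is being queried before adding the filter, since the hook is not limited to a single page.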

I want to stop there and see if there are any questions on the search pages. No? We’re good? Okay. I’m actually, just to reduce the noise here, I’m going to disable … is there a question?

Speaker 1: Yes. Is there an autocomplete module?

Chris Pliakas: Yes, there is. Let’s see if I can find it. Yes. The module is aptly named Apache Solr Autocomplete; the project name is apachesolr_autocomplete. This will provide the type of autocomplete functionality that people are used to.

Now, it is important to note, and this is one of the trends that we’re going to talk about, that this actually pulls from your index and does keyword matching. But as you have larger sites and more data, sometimes keyword matching isn’t necessarily the best option to guide people towards relevant results. There is a trend towards incorporating statistics as well, so that you can autocomplete based on what people are actually searching for, as opposed to just the keywords, which in theory will guide them towards more relevant results. As I talk about the Apache Solr Statistics work that we’re doing, I’ll relate that back to autocomplete.

Speaker 1: We have a few more questions.

Chris Pliakas: Okay.

Speaker 1: Can that custom search be put in a block?

Chris Pliakas: Can this search be put in a block? Yes, I believe it can. Let me just search for a module. I believe there is a module that does this. I want to see if this is what it does. I might have to get back to you on that one. I believe there is actually a module that does allow you to expose your searches in a block, but I’m not 100 percent sure on that, and so I’ll take that as an action item and post that answer after the webinar is over.

Speaker 1: Okay. Also about the statistics stuff, is that available now?

Chris Pliakas: Yes. There is an Apache Solr Statistics module that does some very basic stuff, but it’s more geared towards administrators. It does things like track the keywords, but mostly it counts how many times a search page is viewed, which isn’t really that useful to site builders. But there is a new extension to that module, a new branch I guess I should say, that is available to the community. I’ll show you where it exists and give you a bit of a timeline for when it’s going to get merged back in, but that one is more geared towards site builders and talks about how people are actually using your search.

Speaker 1: We have a few more questions.

Chris Pliakas: All right. I’ll take it …

Speaker 1: Does all of this work with non-Drupal content if some other system populates parts of the Solr index?

Chris Pliakas: The answer to that is yes. The trick is getting that data into Drupal. There is some example code, and we’ll post the links after the webinar, that makes it easier to get content into Drupal. But once you get the content in, you can display facets and that sort of thing.

The display of the search results doesn’t really care what type of content it is. Again, it’s more or less just a matter of getting that content into Apache Solr in a way that Drupal can recognize.

Speaker 1: We have one more. Where is the extension to have autocomplete?

Chris Pliakas: Again, that’s the … we’ll find it through Google. If you search “Drupal Apache Solr Autocomplete,” I’m going to venture that it is one of the first results. It’s on drupal.org. The URL is drupal.org/project/apachesolr_autocomplete, all one word. It’s pretty easy to find on drupal.org, and it’s available on that project page.

Okay. I’m just going to clear cache just to make sure that our stuff is gone. I’m actually going to go back to Google here.

If we look at Google, we see that the search results are displayed in a format that’s pretty familiar to us. Let’s go to Yahoo!, or let’s go to Bing. Search for Drupal.

Now, pretty interestingly, you’ll actually see that the search results are very similar. You have the title, you have the URL, you have the snippet, and you have some additional information about it. Third, go to Yahoo!, search for Drupal, and we’ll see that, again, different results are returned because they have different algorithms that determine relevancy, but the display is very, very similar. The reason is that there was a lot of standardization done in 2011 by Google, Bing, and Yahoo!, and that standard is something called schema.org.

Let’s go back to Google, and we’ll look at the search results. Let’s go to our blog. We see some interesting things here. When we search for our schema.org blog and scroll down, we see that one of these results has an image. This is actually a great way to talk about schema.org, in that it provides some structure around your data.

When we build content types and manage fields inside of Drupal, we’re actually just configuring the data model, the underlying buckets that we put data into, and it doesn’t really have any meaning beyond what we name it. Google doesn’t understand, when you create a blog content type, that it actually holds blog content; it’s a blog in name only. The same goes when you create an event content type; it’s events in name only. Drupal gives you a leg up in that you don’t have to build your database by hand, you can do it through the UI, but the names themselves carry no meaning. I’ll actually go back to Drupal here, click Structure, click Content Types.

You see here that I have events. If I manage my fields, I’ve added some extra data here: the event date, an address, an image. If I wanted to add another field, what you do is create your label and then select the type of field that you want. We see we have date, file, text. This is all real basic stuff that, again, is really low level and doesn’t actually expose what type of content it is.

Schema.org is the layer that sits on top of that, which says, okay, this text field is actually an address, or this image is the primary image of this piece of content, or this event date is the actual start date of an event. It also works at a higher level: you can say that this event content type is actually an Event, so that it can be recognized by a standard that’s out there and agreed upon by the major search engines.

This actually helps your Drupal site in two ways: when Google and Bing and the others index your site, they will actually read this metadata, and there is also some work being done so that it can modify the display of your internal search, so that users are presented with a familiar experience.

That’s probably the thing that people will recognize the most, but the module that I want to share with you is called the Rich Snippets module. We’ll actually just install it and see what it gives us out of the box. Again, Rich Snippets, rich_snippets. There is another module that’s similarly named, but it’s important to understand that this one is geared towards your internal site search.

This takes that schema.org metadata and actually will format your results accordingly. I’m just going to install this module and see what it gives us, and then we can break it down a little bit.

Again, I’m going to go to Modules. I’m going to go to Rich Snippets, enable this. I’m going to bring up a page here so that we can see what it looks like before. Again, very blah. Now, when I enable the Rich Snippets module, we go back to my search page. I’m going to refresh the page. Now you see that it displays the results very, very differently.

The goal of this module is to work fairly out of the box. With Solr, you might have to re-index your content. But as you can see, now the results are displayed in a way that’s much more friendly and much more in line with what users expect.

As a nice UI tip, I think this module is going to emerge as a staple on sites with search. As you can see, for DrupalCon Portland and DrupalCon Munich, it displays a little image, and it also displays the start date.

Now, for the blogs, it displays who that blog is by and when it was posted. As you can see, based on the context or based on the schema that we’ve assigned to it, the search results are displayed very differently. This is really important when we’re displaying site-wide searches. There are tools in Drupal, such as Views, which people are starting to explore to build their search pages on, but that’s not really geared towards heterogeneous content.

When you have a mix of content, it’s really important that you’re able to display it effectively inside your search page. Whereas with Views, it gets really, really tricky to say, “Okay, for this content type, display it this way. For this content type or this schema, display it another way.”

That’s the first thing that the Rich Snippets module will give us, is a nice display. Now we’ll talk about how to actually say, okay, this is a date, this is the start date, that sort of thing.

There is a module called schema.org. It’s just schemaorg, one word. It’s a very simple module that doesn’t require a lot of configuration: you download it, install it, and it allows you to tag your fields and your content with the type of schema that denotes what that content actually is.

If you download and install this module, what it does is pretty simple. If I go to my structure, go to my content types, edit my content type, it gives us this new vertical tab that says schema.org settings, and this allows us to actually specify what type of content this is.

If I said, okay, this is a blog, I could start typing, and it would give me the options that are available. All the options are on the schema.org website, and I’m not going to go over them in detail because there are a lot of them. Just to give you an idea of how many there are, you start off with your basic top-level stuff like an event or an organization, and then each of these has various properties that say, okay, for this event, this is the end date, these are the attendees, so there is a lot of structured information there.

Each one has a lot. Let’s see if I can get the documentation here. Okay. That’s not what I want to show. Full list. Again, this highlights why this is a great tool for this type of search results display because as we scroll down, this is the nested hierarchy of schema.org schema and properties, so you could see there is a ton of them. The module right now supports a subset of them but it’s going to support more.

As we’re building our content, it’s really important that you use this module and explain what your fields are. When I actually create a field, if I click Edit and scroll down to the settings, you see here that I also have a schema.org mapping, so I can specify the property. I could say this is the start date. Then, based on your schema and properties, the Rich Snippets module will display your content differently.

Because this is the start date, if I go back to my DrupalCon results, it knows to display the actual start date up here. Based on this result being an event, that’s probably what people are going to be interested in, so it gives them some context about the content that’s being returned and they can see what’s going on without necessarily having to click on the piece of content itself.
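Under the hood, the Drupal 7 schemaorg module builds on the RDF mapping system in core, so the choices you make in that vertical tab end up as RDF mappings. As a rough sketch, an equivalent mapping declared in code could look something like the following; the bundle name event and the field name field_event_date are assumptions based on this demo site, and the schema namespace prefix has to be registered, which the schemaorg module takes care of.

  /**
   * Implements hook_rdf_mapping().
   *
   * A sketch: tag the event content type as a schema.org Event and map its
   * date field to the startDate property.
   */
  function mymodule_rdf_mapping() {
    return array(
      array(
        'type' => 'node',
        'bundle' => 'event',
        'mapping' => array(
          'rdftype' => array('schema:Event'),
          'title' => array(
            'predicates' => array('schema:name'),
          ),
          'field_event_date' => array(
            'predicates' => array('schema:startDate'),
            'datatype' => 'xsd:dateTime',
          ),
        ),
      ),
    );
  }

The UI route shown above does the same thing without any code, which is why the module is so easy to adopt.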

I’m going to stop there and take a couple of questions for two minutes, and then we’re going to move on to statistics and then stop for general questions.

Speaker 1: Okay. We have two questions. Can the custom Solr search results page be used in panels? This might be from the last section.

Chris Pliakas: Yes, I believe it can be. The reason why I say that is because the Acquia Commons distribution is making heavy use of panels and is using Apache Solr for its search engine. I say with confidence that yes, it can be used with panels.

Speaker 1: There is one more. Where is the extension to add to Apache Solr autocomplete which allows for statistics to be involved and not just keywords?

Chris Pliakas: That’s one thing that’s not available just yet, but it’s on the road map for the statistics module that we’re going to look at next. This is one of those items where I wanted to make people aware of the different trends that are emerging. This is one case where it hasn’t been implemented yet, but it’s going to be. As you start to look at your search solutions over the next three to six months, look for this as an option.

Speaker 1: We have one more question. Why do we need Acquia Search when everything seems doable from Drupal Search?

Chris Pliakas: Yes, and that’s a great question. The first thing is that Drupal search won’t scale. Drupal is built on relational database technology, and relational databases simply won’t scale for full-text searching. They’re really geared towards saying, okay, find me all blogs or find me all users, that sort of thing. But when you start to enter keywords into the mix, it will take your entire site down pretty quickly because it will bog everything down.

Regarding Acquia Search, you can run Solr locally, and we’ve contributed a lot of these add-ons back to the community. However, the value add that Acquia provides right now is that we have Solr configured in a highly available cluster, with master/slave replication so that if one server goes down, end users can continue to search. We also integrate the tools that allow for file attachment indexing. We also have a security mechanism that we’ve applied on top of Solr.

Solr actually doesn’t have security out of the box, so you can do a Google search and find a lot of Solr instances that are unprotected. You could delete that index. You could add content to that index. We’ve added a security layer on top of Solr that allows you to connect securely and makes sure that you, and only you, have access to your server. Also, we manage it 24x7.

One of the things I do want to talk about, as we get into statistics and contextual computing, is that there are things we’re experimenting with in Acquia Search that will adjust relevancy based on user actions. This will be a set of tools that integrates with Drupal and with various other tools to provide more relevant results to your users beyond just keywords. There is going to be a lot of value and a lot of focus on contextual computing with Acquia Search that’s really going to differentiate it not only from core search but from running Solr locally.

Speaker 1: There’s a few more, but we can get to them at the end.

Chris Pliakas: Yes, sure. What I’m going to do is just wrap up really quickly with the statistics. There is one point that I want to hit home, and I’ll try to stop by 1:55 to save some time for some questions afterwards.

There is an Apache Solr Statistics module. Let me clear out some of these tabs here. I think that’s it. Or maybe it’s Apache Solr Stats. It’s probably Apache Solr Stats. There we go.

There is an Apache Solr Statistics module that you can download, it works for Drupal 6 and Drupal 7, that gives you some information in terms of how many requests there are, what type of things people are searching for, but it’s more geared towards site administrators, not necessarily search page builders. The reason why I say that is because if I go to my search and I search DrupalCon, it’s going to count that as DrupalCon, the keyword being searched.

If I click on events, since the page reloaded and it actually queried Solr again, that statistics module is going to say, okay, DrupalCon was searched again. What that really measures is how much people have to click around to find what they are looking for. It’s not necessarily indicative of what people are actually looking for on the site.

One of the branches that’s being worked on, it’s actually a sandbox project right now that will be merged back into the Apache Solr Statistics module by Q1 of next year … I can’t find it here … there is a sandbox that’s a fork of Apache Solr Statistics that’s used to experiment with this stuff. That’s what I’m going to be showing you today. The important thing is that it’s more geared towards search page builders, and it also tracks what people do after they search for something. It allows you to track what we call click-throughs.

If somebody searches for DrupalCon, we can see what pieces of content people are actually selecting, so we can make informed decisions about how to configure our search and how to modify the relevancy.

What I’m going to do is click on modules, search toolkit, and enable Apache Solr Statistics. When I click on Apache Solr Search, now I have a new tab that says statistics.

What I want to do is enable the query log. This captures information about what searches are being executed. I also want to enable something called the event log. In order to enable this, you have to copy a file from the module to your Drupal root so that it can capture the information as users are clicking on results.

We’re also going to capture user data. By default, that’s off, but you can capture not only what people are searching for but who is searching for it. Based on your privacy policy, you can enable or disable that setting.

There is also a log retention policy and a logging backend. By default, it logs to the database, but for busier sites, again, there is going to be the ability to send that to different destinations.

I’m going to save the configuration, and I’m going to execute another search. If I search for DrupalCon and now click on DrupalCon Portland, then go to Reports and the Apache Solr statistics report, it gives me some interesting things. It gives me the top keywords, so it shows me what people are actually searching for. Equally important, top keywords with no results, so you can see what people are searching for and not getting any results for.

If people don’t find the content they’re looking for, they’re going to leave your site, so this is a really important metric. There are also top keywords with no click-throughs: if people are searching for things and getting results but not clicking on anything, then you probably need to make some changes to make sure they’re being shown the right results.

Here, we see the top keywords. We also have click-through reports. If I click on that, it will show me the pieces of content that people are selecting, and the count. As you start to gain more traffic on your site, this will give you some transparency in terms of what people are doing on your search page, and more importantly, what they’re doing after.

As we talked about the contextual computing, it’s really important that you monitor what people are looking for, and this is a great way to do it. Again, it’s what people are looking for in your site and what they are selecting, what they find relevant. The search page is a great tool to help you modify your experience and tailor it to your users.

We have a couple of slides to wrap up, but that’s really what I wanted to highlight: contextual computing is where the trend is heading, and there are some tools that you can employ now, and that are going to be improved upon in the future, to make sure that Drupal is the best search solution available for serving relevant content to your users. Search is really becoming a big data problem, and search is also becoming a solution to that problem.

Big data is capturing a lot of information and then making sense of it, doing something with it. As your sites begin to amass a lot of data, search is a great tool to help your users sift through that data and find the relevant content that they’re looking for, and that’s really where the trend of computing is going over the next five years, so definitely pay attention to search as a tool to help make sure your site is keeping up with the latest trends and desires of your end users as they look for engaging experiences.

I went over but we’ll take some more questions.

Speaker 1: Okay, great. Would you recommend using these modules on a Drupal 6 site using domain access?

Chris Pliakas: Domain access is a little bit tricky, especially with search. Some of these things are … let’s take a step back. The way domain access works is that it builds upon the Drupal node access system, and that adds some challenges in terms of search. Not only does a search solution have to be domain-access-aware, but everything around your site has to be domain-access-aware.

Theoretically, you can use your Drupal 6 site with domain access. It’s just that it gets a little bit tricky because your index is logically separated as opposed to physically separated, so there always is the chance of your content either lagging behind in terms of getting that access information or accidentally getting exposed to other sites when it shouldn’t be, so it can be done, but there has to be a lot of thought and a lot of careful planning to make sure that it’s implemented properly.

Speaker 1: The next question is does the schema.org also expose the extra info to search engine spiders?

Chris Pliakas: Yes, it does. That’s actually what the module is geared towards. It’s geared towards the external use case, and it works: it provides that metadata so that Google will pick up the images and the additional information. What the Rich Snippets module does is take that information and use it inwards. By default, the metadata is geared more outwards, but the work that’s being done right now is taking that and also applying it to your site search, so it’s a win-win.

Speaker 1: The next question is, what if the non-Drupal contents are dynamic pages, how do you import those contents? If not, is there a federated search solution?

Chris Pliakas: I think it’s important to first say that a federated search solution might not be exactly what’s being asked for. When we think of a federated search solution, we think of things like Kayak or other engines that actually query out to different data sources and compile the results together.
There are tools in Drupal that allow you to query different sources simultaneously. However, that’s probably not what you’re looking for. You’re probably looking for a unified search solution that displays results instantly.

In order to do that, you can leverage tools such as crawlers, such as Nutch, which will integrate with Solr. The key is again getting that data into a format that Drupal can recognize. But the trick is using those tools to crawl or expose your external data to get them into Drupal.

There are also ways that you can programmatically connect a third party data store and index that into Drupal using the APIs. But again, it’s more of a developer task and something that has to be coded.

But with Acquia Search, definitely look for an offering sooner rather than later to index external content and bring it into your Acquia Search Index.

Speaker 1: All right. We’ll take one more. How can you make information more important based on the statistics? What ways to set this up are available?

Chris Pliakas: Can you repeat that question one more time? Sorry.

Speaker 1: How can you make information more important based on the statistics? What ways to set this up are available?

Chris Pliakas: Sure. I’ll give one example from Acquia.com. We have an offering called Dev Desktop, which is a local stack installer for Drupal. A long time ago, it used to be called DAMP: Drupal, Apache, MySQL, that sort of thing. What we’ve noticed, based on our statistics, is that people still search for DAMP more than they do for Dev Desktop. We noticed that trend, and the way that we modified our search results was to take advantage of some of the things that Apache Solr has: we added a synonym so that when people search for DAMP, they’re actually getting the content that’s relevant to Dev Desktop, which is what the product is called now.
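As a concrete illustration of that technique: in a stock Solr setup, synonyms live in a synonyms.txt file that the text field type references through a SynonymFilterFactory in the schema. How you edit that file depends on how your Solr is hosted, so treat this as a sketch rather than the exact Acquia Search workflow. The mapping itself can be a single line:

  damp => dev desktop

When that filter is applied at query time, a search for “damp” is rewritten so that it matches content about Dev Desktop.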

This is what Google does, and it’s why Google results are so relevant. They have hundreds of full-time engineers analyzing their search and doing things like saying, “Okay, if you search for a FedEx tracking number, we’re going to show you the FedEx webpage.” Now it’s automated, but it started with analyzing the statistics, and those are the types of techniques that you can employ on your site based on what your users are actually looking to do.

Speaker 1: Okay, great. I think we’re going to have to end it here. Thank you, Chris, for the great presentation, and thank you everyone for participating and asking all these wonderful questions. Again, the slides and recording of the webinar will be posted to the Acquia.com website in the next 48 hours. Thank you.
