Using the AWS Ruby SDK to Put a Drupal Site's Assets Behind a CDN

As a demo for my local Drupal meetup, I offered to show how to use an origin pull CDN to serve the static assets of a Drupal 7 site installed on Acquia Cloud. If you are not familiar with basic CDN concepts and terminology, start by reading this article on key properties of a CDN by Wim Leers. The value of using a CDN for a site's static assets is primarily a faster total page load for the end user: once the main page HTML is loaded, the requests for images, CSS, JS, and other assets needed to fully render the page complete faster because the latency to the CDN edge server is lower.

Although Acquia Cloud customers use a variety of CDNs, CloudFront by Amazon Web Services (AWS) is an affordable CDN that supports both pull and push modes, which makes it a great fit for serving Drupal's static files and other content. Since I already had easy access to an AWS development account, I picked CloudFront as the CDN for the demo.

Most people who need to set up CloudFront for a single site will probably want to use the AWS Console as illustrated in the CloudFront Getting Started documentation.

However, I wanted to understand how to set this up in a scripted fashion, and the AWS Ruby SDK is a tool I've been using recently with EC2. So, this blog post is going to end up being as much of a tutorial on using the CloudFront API via the Ruby SDK as on anything else.

To get started, I signed up for an Acquia Cloud free site, and then installed Acquia Drupal 7 and added my SSH key so I could manage the code.

I did a git clone of the codebase and added the latest -dev versions of the Devel and CDN modules. I enabled Devel generate and CDN, and then generated some dummy content.

The AWS Ruby SDK seems to be the best option if you don't want to manipulate raw XML, so make sure you have it and some current version of Ruby (use rvm if not).

Install the latest 'aws-sdk' gem, make sure you have AWS credentials, and you are ready to start. I ran all the needed steps in irb.

$ irb
2.1.1 :001 > require 'aws-sdk'
=> true
2.1.1 :002 > AWS::VERSION
=> "1.36.1"
2.1.1 :003 > AWS.config(:access_key_id => 'BTICJ5JI43N2UCMJ3IMF',
                                    :secret_access_key => 'UOP87B7dCYqiYCGT24KZnotArEALkeyEalBMiT3fa7ai')
=> <AWS::Core::Configuration>
2.1.1 :004 > cf = AWS::CloudFront.new
=> <AWS::CloudFront>

See the top-level ruby SDK CloudFront docs.
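
If you would rather not paste keys into an irb session, a small helper can read them from the environment instead. This is only a sketch: the helper name is made up for illustration, and it assumes you have exported the conventional AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables.

```ruby
# Build the options Hash for AWS.config from environment variables.
# The helper name is hypothetical; `env` is injectable so the helper
# can be exercised without touching the real ENV.
def aws_credentials_from_env(env = ENV)
  {
    :access_key_id     => env.fetch('AWS_ACCESS_KEY_ID'),
    :secret_access_key => env.fetch('AWS_SECRET_ACCESS_KEY')
  }
end

# In irb, with the 'aws-sdk' gem loaded as above:
#   AWS.config(aws_credentials_from_env)
```

Using fetch means a missing variable raises a KeyError right away, rather than failing later with a less obvious authentication error.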

So far, it was easy. Looking at the CloudFront client docs was the next step. After experimenting to see if the SDK filled in any defaults, it became apparent that one needs to build up, piece by piece, a Hash with every required element. Again, this is the sort of work needed for scripting, but it is handled for you in the UI if you are using the AWS Console.

In terms of our goal of serving static assets from a Drupal site, we have to define our site as the origin server. In the data hash we define a single origin with the site domain name (I'm using "cdnmeetupdemo.example.com" in examples here). We give this origin an ID, and reference the origin's ID as the one to use in the default cache behavior entry. Here's an example of a Hash that seemed to be the minimum acceptable data for the API call (I've reformatted the big hashes so they are readable):

2.1.1 :030 > reference_string = Time.now.to_i.to_s
=> "1394678174"

2.1.1 :031 > dc = {:caller_reference=>reference_string,
:aliases=>{:quantity=>0},
:default_root_object=>"",
:price_class=>"PriceClass_100",
:enabled=>true,
:logging=>
  {:enabled=>false, :include_cookies=>false, :bucket=>"nop", :prefix=>"nop"},
:comment=>"demo distribution",
:origins=>
  {:quantity=>1,
   :items=>
    [{:id=>"dev",
      :domain_name=>"cdnmeetupdemo.example.com",
      :custom_origin_config=>
       {:http_port=>80,
        :https_port=>443,
        :origin_protocol_policy=>"http-only"}}]},
:default_cache_behavior=>
  {:target_origin_id=>"dev",
   :forwarded_values=>{:query_string=>true, :cookies=>{:forward=>"none"}},
   :trusted_signers=>{:items=>[], :enabled=>false, :quantity=>0},
   :viewer_protocol_policy=>"allow-all",
   :min_ttl=>3600,
  },
:viewer_certificate=>{:cloud_front_default_certificate=>true},
:cache_behaviors=>{:quantity=>0}}

2.1.1 :032 >   resp = cf.client.create_distribution(:distribution_config => dc).data
=>
{:id=>"E2WPA9OXGCG31D",
:status=>"InProgress",
:last_modified_time=>2014-03-13 02:38:41 UTC,
:in_progress_invalidation_batches=>0,
:domain_name=>"d11l5pg90vjhev.cloudfront.net",
:active_trusted_signers=>{:items=>[], :enabled=>false, :quantity=>0},
:distribution_config=>
  {:caller_reference=>"1394678174",
   :aliases=>{:items=>[], :quantity=>0},
   :default_root_object=>nil,
   :origins=>
    {:items=>
      [{:id=>"dev",
        :domain_name=>"cdnmeetupdemo.example.com",
        :custom_origin_config=>
         {:http_port=>80,
          :https_port=>443,
          :origin_protocol_policy=>"http-only"}}],
     :quantity=>1},
   :default_cache_behavior=>
    {:target_origin_id=>"dev",
     :forwarded_values=>{:query_string=>true, :cookies=>{:forward=>"none"}},
     :trusted_signers=>{:items=>[], :enabled=>false, :quantity=>0},
     :viewer_protocol_policy=>"allow-all",
     :min_ttl=>3600,
     :allowed_methods=>{:items=>["GET", "HEAD"], :quantity=>2},
     :smooth_streaming=>false},
   :cache_behaviors=>{:items=>[], :quantity=>0},
   :custom_error_responses=>{:items=>[], :quantity=>0},
   :comment=>"demo distribution",
   :logging=>
    {:enabled=>false, :include_cookies=>false, :bucket=>nil, :prefix=>nil},
   :price_class=>"PriceClass_100",
   :enabled=>true,
   :viewer_certificate=>{:cloud_front_default_certificate=>true},
   :restrictions=>
    {:geo_restriction=>{:items=>[], :restriction_type=>"none", :quantity=>0}}},
:request_id=>"967e4b4e-aa58-11e3-9a9d-9b9f4c7850ae",
:location=>
  "https://cloudfront.amazonaws.com/2014-01-31/distribution/E2WPA9OXGCG31D",
:etag=>"E121KPOFVHLWAN"}

Looking at the input data, you can see I picked only a single default caching behavior. It has a 1 hour (3600 second) TTL, and it passes query strings but not cookies to the Drupal site. This makes it a reasonable start for serving static assets (images, CSS, etc.), but it would not support logged-in users.
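
Since the goal was scripting, the same Hash can be wrapped in a method so that the origin domain, origin ID, and TTL become parameters. The following is a sketch rather than anything from the SDK: the method name and its keyword arguments are invented here, and the values are simply the ones used in the session above.

```ruby
# Build a minimal distribution config Hash for a single custom origin.
# The method name and keyword arguments are illustrative, not part of the SDK.
def build_distribution_config(origin_domain, origin_id: 'dev', min_ttl: 3600,
                              comment: 'demo distribution')
  origin = {
    :id          => origin_id,
    :domain_name => origin_domain,
    :custom_origin_config => {
      :http_port => 80,
      :https_port => 443,
      :origin_protocol_policy => 'http-only'
    }
  }

  {
    :caller_reference    => Time.now.to_i.to_s,
    :aliases             => { :quantity => 0 },
    :default_root_object => '',
    :price_class         => 'PriceClass_100',
    :enabled             => true,
    :logging             => { :enabled => false, :include_cookies => false,
                              :bucket => 'nop', :prefix => 'nop' },
    :comment             => comment,
    :origins             => { :quantity => 1, :items => [origin] },
    :default_cache_behavior => {
      # Must match the :id of one of the origins above.
      :target_origin_id       => origin_id,
      # Pass query strings through to Drupal, but do not forward cookies.
      :forwarded_values       => { :query_string => true,
                                   :cookies => { :forward => 'none' } },
      :trusted_signers        => { :items => [], :enabled => false, :quantity => 0 },
      :viewer_protocol_policy => 'allow-all',
      :min_ttl                => min_ttl
    },
    :viewer_certificate => { :cloud_front_default_certificate => true },
    :cache_behaviors    => { :quantity => 0 }
  }
end

dc = build_distribution_config('cdnmeetupdemo.example.com')
```

The resulting dc can be passed to create_distribution exactly as in the session above. Keeping the origin ID in one argument avoids the easy mistake of letting :target_origin_id drift out of sync with the origin's :id.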

The meaning of :price_class isn't explained in the SDK docs, so look at the pricing chart and you'll see that PriceClass_100 is the cheapest and has edge locations only in the U.S. and E.U. Some of the other settings, like :origin_protocol_policy, don't seem to be documented anywhere that I can find. The allowed values of "http-only" or "match-viewer" lead me to guess that the "http-only" setting makes an HTTP request to the origin even if the CDN receives an HTTPS request for the asset. For this demo, I don't have an SSL certificate for the site, so that seems like the right option. Most elements of the distribution configuration are explained in the CloudFront API documentation.

The one key piece of information we need from the result to configure the Drupal site is the CDN domain name: :domain_name=>"d11l5pg90vjhev.cloudfront.net".

As an aside, different CDN providers use different tricks to map your location to an appropriate set of edge servers. For CloudFront, looking up the CDN domain name returns multiple IP addresses, which a browser will use round-robin:

$ host d11l5pg90vjhev.cloudfront.net
d11l5pg90vjhev.cloudfront.net has address 54.230.205.168
d11l5pg90vjhev.cloudfront.net has address 54.230.205.117
d11l5pg90vjhev.cloudfront.net has address 54.230.204.247
...

If I run the same from a remote server, I get a different result, since a different set of edge servers is considered to be "closer".

# host d11l5pg90vjhev.cloudfront.net
d11l5pg90vjhev.cloudfront.net has address 54.230.16.140
d11l5pg90vjhev.cloudfront.net has address 54.230.16.179
d11l5pg90vjhev.cloudfront.net has address 54.230.16.185
...
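
The same lookup can be done from Ruby with the standard library's Resolv, which is handy if you want to compare results from several machines in a script. This is a rough sketch; the helper name is made up, and the addresses you get back depend on where you run it.

```ruby
require 'resolv'

# Return the sorted list of addresses a hostname resolves to.
# The helper name is hypothetical. `resolver` defaults to the stdlib
# Resolv module and is injectable so the helper can be tested offline.
def edge_addresses(hostname, resolver = Resolv)
  resolver.getaddresses(hostname).sort
end

# Example (needs network access; substitute your own distribution's
# domain name, and expect different results from different locations):
#   edge_addresses('d11l5pg90vjhev.cloudfront.net')
```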

So I went to the CDN module configuration section General tab and put it into Testing mode. Then I went to the Details tab, and while the default of origin pull was right, I was stuck here until taking the advice printed in the message to install and enable the Advanced Help module.

Advanced Help provides popups (see orange boxes below) that made it clear that to configure all files to be served via one CDN, I just needed to enter the CDN URL into the box. In this case, I entered http:// plus the CDN host name from above.

Going to the home page (still in Testing mode) as administrator, I can inspect the teaser images and see that they are now being served from the CDN. Success!

Before enabling the CDN, but with Drupal page caching enabled, I tested the total page load time as an anonymous user and saw about 350-500 ms on a full refresh. This required 20 HTTP requests in total, with the static files, but not the main HTML, being cached by Varnish. Going back and setting the CDN module mode to Enabled means that anonymous users also get the images, CSS, etc. from the CDN. Trying again, a full page refresh was much faster, in the range of 200-300 ms. In both cases, loading the main HTML content took 100-150 ms. Meaningfully benchmarking this from different endpoints would be a much bigger task, but with this simple setup, I was able to significantly improve the page load time for a stock Drupal 7 site by using the CloudFront CDN for static files.

You can see in the network tab of the browser's developer tools that the static files are served from cloudfront.net.

Inspecting the response headers from one of the images using curl or the developer tools in the browser also lets me see that the image is cached by CloudFront:

HTTP/1.1 200 OK
Content-Type: image/jpeg
Content-Length: 1951
Connection: keep-alive
Server: nginx
Last-Modified: Thu, 13 Mar 2014 17:02:56 GMT
Cache-Control: max-age=1209600
Expires: Thu, 27 Mar 2014 21:45:12 GMT
X-AH-Environment: dev
Accept-Ranges: bytes
Date: Thu, 13 Mar 2014 21:45:12 GMT
X-Varnish: 550805980
Age: 4
X-Cache: Hit from cloudfront
Via: 1.1 b64f1dc57c19be31a8989da60e442079.cloudfront.net (CloudFront)
X-Amz-Cf-Id: aZsTan2oUCJIXawH3dhD-ENlZqyQDroJpyR2tSmjSBnravGyBT95aA==
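
If you want to check this from a script rather than by eye, the headers are easy to parse. The sketch below works on a raw header block like the one above (trimmed to the relevant lines); the helper names are invented for illustration. It also confirms that the gap between Date and Expires matches the max-age of 1209600 seconds, i.e. 14 days.

```ruby
require 'time'

# Parse a raw HTTP response header block into a Hash keyed by
# lower-cased header name. The status line has no ": " and is skipped.
def parse_headers(raw)
  raw.lines.each_with_object({}) do |line, headers|
    name, value = line.chomp.split(': ', 2)
    headers[name.downcase] = value if value
  end
end

# True when the X-Cache header reports a CloudFront cache hit.
def cloudfront_hit?(headers)
  headers.fetch('x-cache', '').downcase.start_with?('hit from cloudfront')
end

# Seconds between the Date and Expires headers.
def advertised_lifetime(headers)
  (Time.httpdate(headers['expires']) - Time.httpdate(headers['date'])).to_i
end

raw = <<HEADERS
HTTP/1.1 200 OK
Cache-Control: max-age=1209600
Expires: Thu, 27 Mar 2014 21:45:12 GMT
Date: Thu, 13 Mar 2014 21:45:12 GMT
Age: 4
X-Cache: Hit from cloudfront
HEADERS

headers = parse_headers(raw)
puts cloudfront_hit?(headers)      # true
puts advertised_lifetime(headers)  # 1209600
```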

Finally, it was time to clean up. This part goes back to being an SDK tutorial, so quit here if you're not interested. You need to first disable, and then later delete, the distribution. This turned out to be harder than creating the distribution in the first place, because the update call requires a full, valid distribution configuration object (there is no shortcut call to toggle the enabled state). If you look at the response to the initial create call, you can see that there is actually more data than listed in the SDK docs, and also that empty strings came back as nil.

If you try to use the initial distribution configuration with :enabled set to false, the API rejects the call due to the other data elements being missing (like the allowed methods, custom error responses, and restrictions).

2.1.1 :116 > dc[:enabled] = false
=> false

2.1.1 :117 > cf.client.update_distribution(:id=>resp[:id], :if_match=>resp[:etag], :distribution_config => dc)
AWS::CloudFront::Errors::InvalidArgument: The parameter Allowed method settings is required.

So, the dc Hash was sufficient to create a distribution, but not to update it.
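
To see exactly which elements the API added on its side, you can diff the keys of the two Hashes. Here is a sketch of a recursive comparison; the method name is made up, and the sample data is a trimmed-down stand-in for dc and resp[:distribution_config], not the full structures.

```ruby
# Recursively list key paths that appear in `full` but not in `partial`.
# Only nested Hashes are descended into; Arrays are not compared.
def missing_keys(partial, full, path = [])
  full.flat_map do |key, value|
    here = path + [key]
    if !partial.key?(key)
      [here]
    elsif value.is_a?(Hash) && partial[key].is_a?(Hash)
      missing_keys(partial[key], value, here)
    else
      []
    end
  end
end

# Trimmed-down stand-ins for what was sent and what came back.
sent     = { :enabled => false,
             :default_cache_behavior => { :min_ttl => 3600 } }
returned = { :enabled => false,
             :default_cache_behavior => { :min_ttl => 3600,
                                          :allowed_methods => { :quantity => 2 },
                                          :smooth_streaming => false },
             :custom_error_responses => { :quantity => 0 },
             :restrictions => { :geo_restriction => { :quantity => 0 } } }

missing_keys(sent, returned).each { |key_path| puts key_path.join(' > ') }
# default_cache_behavior > allowed_methods
# default_cache_behavior > smooth_streaming
# custom_error_responses
# restrictions
```

In the irb session, the equivalent call would be missing_keys(dc, resp[:distribution_config]).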

On the other hand, if you take the response back from the API (which seems to have all the needed data) and try to use it to make an update after toggling enabled to false, it fails due to the nil values not being strings.

2.1.1 :118 > resp[:distribution_config][:enabled] = false
=> false

2.1.1 :119 > cf.client.update_distribution(:id=>resp[:id], :if_match=>resp[:etag], :distribution_config => resp[:distribution_config])
ArgumentError: expected string value for option :default_root_object

My solution (with apologies to anyone who is actually a Ruby expert) was to define a recursive function to walk through the data and convert nil to an empty string:

def recursive_nil_stringify!(item)
   if item.is_a?(Hash)
     indexes = item.keys
   elsif item.is_a?(Array)
     indexes = 0...item.size
   elsif item.nil?
     return ''
   else
     return item
   end

   indexes.each { |key|  item[key] = recursive_nil_stringify!(item[key]) }
   item
end
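
If you would rather keep the original response untouched, a non-mutating variant is only a little different. This is an alternative sketch, not what I ran in the session; the method name is made up, and the sample Hash is a small stand-in for the real distribution config.

```ruby
# Return a new structure with every nil replaced by an empty string.
# Hashes and Arrays are rebuilt; the input is not modified.
def deep_nil_to_empty(item)
  case item
  when Hash
    item.each_with_object({}) { |(key, value), out| out[key] = deep_nil_to_empty(value) }
  when Array
    item.map { |value| deep_nil_to_empty(value) }
  when nil
    ''
  else
    item
  end
end

config = { :default_root_object => nil,
           :logging => { :bucket => nil, :enabled => false },
           :aliases => { :items => [], :quantity => 0 } }

cleaned = deep_nil_to_empty(config)
puts cleaned[:default_root_object].inspect  # ""
puts cleaned[:logging][:enabled].inspect    # false
puts config[:default_root_object].inspect   # nil
```

Note that false is left alone; only nil is converted, which matters for flags like :enabled.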

Now I was able to update the distribution configuration with :enabled set to false, and after waiting a few minutes for that change to deploy, I was able to delete it. The results below are truncated for readability. Note that after updating the distribution configuration, the :etag value changes and you have to use that new value in the final deletion call after the distribution reaches the "Deployed" status again.

2.1.1 :134 > recursive_nil_stringify!(resp[:distribution_config])
=> { ... }

2.1.1 :135 > cf.client.update_distribution(:id=>resp[:id], :if_match=>resp[:etag], :distribution_config => resp[:distribution_config])
=> {:id=>"E2WPA9OXGCG31D", :status=>"InProgress", ..., :etag=>"E2PNQ1Z5X5G9P0"}

2.1.1 :136 > resp = cf.client.get_distribution(:id=>"E2WPA9OXGCG31D").data
=> {:id=>"E2WPA9OXGCG31D", :status=>"Deployed", ... , :etag=>"E2PNQ1Z5X5G9P0"}

2.1.1 :137 > cf.client.delete_distribution(:id => resp[:id], :if_match => resp[:etag])
=> {:request_id=>"3fc8ea8b-aaf8-11e3-bf83-6b90d5cf46e4"}
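
In a script, the "wait a few minutes" step becomes a polling loop. The sketch below assumes only what the session above shows, namely that the client's get_distribution call returns an object whose data includes :status and :etag. The method name, the 30 second interval, and the attempt limit are arbitrary choices of mine.

```ruby
# Poll until the distribution reports "Deployed", then return the latest
# response data (including the current :etag). `client` is anything that
# responds to get_distribution the way cf.client does in the session above.
# The method name, interval, and attempt limit are illustrative assumptions.
def wait_until_deployed(client, id, interval = 30, max_attempts = 60)
  max_attempts.times do
    data = client.get_distribution(:id => id).data
    return data if data[:status] == 'Deployed'
    sleep interval
  end
  raise "distribution #{id} not deployed after #{max_attempts} attempts"
end

# Usage in the irb session:
#   resp = wait_until_deployed(cf.client, 'E2WPA9OXGCG31D')
#   cf.client.delete_distribution(:id => resp[:id], :if_match => resp[:etag])
```

Returning the fresh data rather than true means the caller always has the current :etag on hand for the delete call.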

CloudFront continues to add new capabilities that will make it attractive to Drupal site owners, and I'm looking forward to using it more in the future.

Comments

Posted on by Alberto Mota (not verified).

Great post Peter!

This was one of the things we weren't sure about when moving to Acquia Cloud.

We've used (and frequently recommend) CDNs such as the Amazon Cloudfront one in quite a few deployments and it's really great how easily you can enable the CDN using the CDN module.

We've been wondering how that would work with AC, good to know that you guys have that handled.

Cheers,
Alberto

www.aglobalway.com
