5+ CDN Optimizations Worth Every Business’s Consideration
February 5, 2016 | Robert Gibb
This is the fourth post in a series that covers all 6 steps of the CDN Framework, a guide designed to help you acquire and maintain the best CDN solution possible. There are many different ways to optimize your CDN. But the only optimizations you should make are ones with a clear purpose and a measurable outcome. By identifying a purpose and a unit of measure, you’ll be able to tell whether the optimization is, in fact, an optimization. This approach to optimizing your CDN will help you answer two key questions:
- Did the optimization improve performance?
- How much did it improve performance by?
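One way to answer both questions is to compare the same metric before and after a change. The sketch below is only an illustration: the `improvement` function and the time-to-first-byte samples are hypothetical, and the median is just one reasonable unit of measure.

```python
from statistics import median

def improvement(before_ms, after_ms):
    """Return (milliseconds saved, percent saved) on the median; positive means faster."""
    before = median(before_ms)
    after = median(after_ms)
    saved = before - after
    return saved, 100.0 * saved / before

# Hypothetical time-to-first-byte samples in milliseconds, not real measurements.
before = [420, 390, 450, 410, 405]
after = [310, 295, 330, 300, 305]

saved_ms, saved_pct = improvement(before, after)
print(f"median improved by {saved_ms} ms ({saved_pct:.1f}%)")
```

If the percentage comes out negative, the change made things slower and is not an optimization for your setup.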
CDN Optimization: Not the Same for Everyone
There is a brilliant paper that shows how Canada’s largest news site tuned its CDN. In this paper, the authors document which CDN optimizations were made to improve performance and uptime on the news site. Because the site receives one million unique daily visitors and uses Akamai, the optimizations the authors made may not be the same ones you make. But, as the authors state in the paper’s abstract: “The lessons described are generally applicable to any infrastructure fronted by a CDN.”
The authors also accurately state that “CDN vendors do not provide a lot of guidance [in terms of optimization], primarily because the answers to which parameters to modify and what values to set vary greatly based on the customer’s workload, requirements, and business context.” Because of this, a good deal of research and experimentation is required. You need to deeply understand the unique configuration of your origin, then have your CDN complement it. While not every business will have the same CDN optimizations in place, there are a few optimizations considered common and useful.
Optimizations Worth Everyone’s Consideration
Many of these optimizations were implemented by the authors of the paper mentioned above. Others are widely considered best practice in terms of caching, performance, and cost-effectiveness.
Optimization #1: Setting a Default TTL
The right default time to live (TTL) for cached objects varies greatly depending on your application type. A frequently updated news site with a million users per day may have a default TTL of two minutes. On the other hand, a marketing website with general product information that stays much the same from day to day may have a default TTL of one week. Compare CBC, Canada’s largest news site, with our website (which we use largely for marketing purposes). The default TTL of these websites varies drastically for good reason. CBC’s website administrators set a default TTL of 196 seconds on their homepage. We set a default TTL of 604800 seconds (one week) on our homepage. The TTL is set on each website with the Cache-Control header:
CBC News Homepage
    curl -I http://www.cbc.ca/news
    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8
    Cache-Control: public, max-age=196
    Date: Tue, 02 Feb 2016 22:22:58 GMT
    Connection: keep-alive
MaxCDN Homepage
    curl -I https://www.maxcdn.com/
    HTTP/1.1 200 OK
    Content-Type: text/html; charset=UTF-8
    Cache-Control: max-age=604800
    Date: Tue, 02 Feb 2016 22:33:53 GMT
    Connection: keep-alive
The browser will cache a copy of the page for 196 seconds for CBC and 604,800 seconds for MaxCDN. This TTL for the browser is first determined by the origin, but some CDNs can override TTLs set by the origin. This feature comes in handy when you want the CDN TTL to be different from the browser TTL. According to this O’Reilly Radar post, ideally you want the browser cache time to be shorter than the CDN cache time. This way the user is able to load assets locally from the browser cache instead of returning to the CDN cache or origin server. If the browser TTL were longer than the CDN or origin TTL, the user would be at risk of receiving an outdated experience.
Assume we’re dealing with a standard marketing website that’s updated every month. Let’s also assume that we want to ignore origin TTL settings. MaxCDN users would ignore the Cache-Control header set by the origin by ticking the box (see below). They would then set a CDN cache time of one month using the first box, and override the CDN Cache-Control header to set a browser TTL of 7 days.
To figure out what TTL you should set for the CDN and browser cache, find out two things: 1) how often you update your website and 2) how often the average user visits your website. The answer to number one determines the value for Set Default Cache Time; the answer to number two determines the value for Override Cache-Control Header.
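As a rough sketch of the two ideas above, the Python below reads `max-age` out of a `Cache-Control` value and turns the two answers into TTLs in seconds. Both function names are hypothetical, and the mapping in `suggest_ttls` is only an assumed rule of thumb drawn from this section, not a MaxCDN feature.

```python
import re

DAY = 86400  # seconds in one day

def max_age_seconds(cache_control):
    """Extract max-age (in seconds) from a Cache-Control header value, or None."""
    match = re.search(r"(?:^|[,\s])max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None

def suggest_ttls(days_between_updates, days_between_visits):
    """Assumed rule of thumb: CDN TTL follows update frequency,
    browser TTL follows visit frequency and never exceeds the CDN TTL."""
    cdn_ttl = days_between_updates * DAY
    browser_ttl = min(days_between_visits * DAY, cdn_ttl)
    return cdn_ttl, browser_ttl

print(max_age_seconds("public, max-age=196"))  # header value from the CBC response
print(suggest_ttls(30, 7))                     # monthly updates, weekly visits
```

For the monthly-updated marketing site described above, this yields a CDN TTL of 30 days and a browser TTL of 7 days.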
Optimization #2: Strip Cookies for More Cache Hits (if Possible)
If a static object such as an image is setting cookies on the client side, it’s not acting like a static object. Since the CDN can’t tell what’s going on behind the curtain at the origin server, it needs to assume that objects with cookies are dynamic. They will therefore be a cache miss every time, when, as a CDN user, you want more cache hits than misses. To check if your origin is setting cookies, look for the Set-Cookie header in the HTTP response of an object. (You can easily inspect response headers with one of the request tools mentioned in this post.) In the following responses, notice how the Set-Cookie value changes on different requests for the same object.
Request for X from user A
    HTTP/1.1 200 OK
    Set-Cookie: __cfduid=dbb8af148848b88f6fd38603755f216031425593837
    Cache-Control: public, max-age=315360000
    X-Cache: MISS
Request for X from user B
    HTTP/1.1 200 OK
    Set-Cookie: __cfduid=ec02ac3ebf9f6ea54ba4e09b1dd197ea51425593838
    Cache-Control: public, max-age=315360000
    X-Cache: MISS
This can force the CDN to revalidate the asset on every request, resulting in a cache miss, which is something you probably don’t want. So if it isn’t possible to change how cookies are set on your origin server, check to see if your CDN can ignore cookie data from the origin server’s response. This will improve the CDN’s ability to cache an object.
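To make the idea concrete, here is a minimal Python sketch of what "ignoring cookie data" amounts to: detecting a `Set-Cookie` header and dropping it before the response is stored. The function names are hypothetical and the headers are modeled as a plain dictionary; a real CDN does this internally through its own settings.

```python
def sets_cookie(headers):
    """True if the response headers include Set-Cookie (header names are case-insensitive)."""
    return any(name.lower() == "set-cookie" for name in headers)

def strip_cookies(headers):
    """Return a copy of the response headers without Set-Cookie."""
    return {name: value for name, value in headers.items() if name.lower() != "set-cookie"}

# Response headers for user A, taken from the example above.
response = {
    "Set-Cookie": "__cfduid=dbb8af148848b88f6fd38603755f216031425593837",
    "Cache-Control": "public, max-age=315360000",
}

print(sets_cookie(response))                 # the object looks dynamic to the CDN
print(sets_cookie(strip_cookies(response)))  # after stripping, it looks static
```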
Related PDF: Improving Your CDN’s Cache Hit Ratio
Optimization #3: Ignore Query Strings for More Cache Hits (If Possible)
Query strings act much like cookies in terms of cacheability. When query strings are included in static object URLs, the same file can be mistaken for many unique objects, each requested from the origin server separately. This results in a decrease in cache hit ratio, which is why some CDNs make it easy to ignore or honor query strings in URLs.
If you enable this option (tick the box), files with query string parameters such as ?v=22 in their URLs will be treated as separate cacheable files. This technique is frequently used by developers to automatically invalidate cache and force CDNs to re-cache files when they are updated. If you disable this option (as in the screenshot above), all parameters will be ignored. This leads to a higher cache hit ratio, but query-string-based invalidation won’t be possible.
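The effect of ignoring or honoring query strings can be sketched as a cache-key function. This is an illustration only: `cache_key`, its parameters, and the `cdn.example.com` URLs are hypothetical, and the `keep_params` option stands in for the selective approach some CDNs offer.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def cache_key(url, ignore_query=True, keep_params=()):
    """Build the key a cache might use for a URL.

    ignore_query=True drops every parameter except those named in keep_params;
    ignore_query=False keeps them all, so each variation is cached separately.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if ignore_query:
        params = [(key, value) for key, value in params if key in keep_params]
    # Sort so that parameter order alone never creates a second cache entry.
    query = urlencode(sorted(params))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

url = "https://cdn.example.com/app.css?utm_source=mail&v=22"
print(cache_key(url))                      # all parameters ignored
print(cache_key(url, keep_params=("v",)))  # only the version parameter is honored
print(cache_key(url, ignore_query=False))  # every parameter is honored
```

Keeping only a version parameter is one way to get most of the hit-ratio benefit while still allowing `?v=22`-style invalidation.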
Related tutorial: How to Ignore Query Strings Selectively
Optimization #4: Leverage the If-Modified-Since Header
The If-Modified-Since header can be included in requests from the CDN to your origin. By default, when the TTL of a file expires on the CDN, the CDN will pull a new copy of the file from the origin. This happens even if the file hasn’t changed, resulting in unneeded origin strain and bandwidth usage. For many websites this isn’t a huge issue. But for larger websites like CBC (mentioned above), not using the If-Modified-Since header is simply unacceptable. This is because the origin is already doing so much and can’t afford to do more. (Plus, as a public broadcaster supported by taxpayers, CBC needs to do everything it can to cut down on origin infrastructure costs.)
In the paper, the administrators who optimized the news site’s CDN show that, with If-Modified-Since, roughly 70% of origin requests are answered with a 304 instead of a full response. According to the paper: “This simple configuration [is] extremely effective at achieving a high origin offload: approximately 70% of origin requests result in a HTTP/1.1 304 Not Modified.”
Image: Figure 4 from the paper, showing the distribution of typical HTTP access codes over a four-week period at CBC. The majority of user requests are serviced by 304 Not Modified responses thanks to the If-Modified-Since header in requests.
You can insert an If-Modified-Since header in CDN requests if your CDN provider offers custom caching rules. (Check out the CDN comparison charts in Step 2 to see if it’s offered.) Just keep in mind that you’ll have to properly configure your origin to send HTTP/1.1 304 Not Modified for objects that have not changed.
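On the origin side, the decision comes down to comparing the object's last-modified time with the date the CDN sends. The sketch below assumes a hypothetical `respond` function and models headers as a dictionary; most web servers already implement this logic for static files, so treat it as an explanation rather than something you need to write yourself.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def respond(request_headers, last_modified):
    """Return (status, headers) for a GET, honoring If-Modified-Since.

    last_modified must be a timezone-aware UTC datetime.
    """
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            if last_modified <= parsedate_to_datetime(since):
                return 304, headers  # unchanged: no body needs to be sent
        except (TypeError, ValueError):
            pass  # unparseable date: fall through and send the full response
    return 200, headers

modified = datetime(2016, 2, 2, 22, 0, 0, tzinfo=timezone.utc)
print(respond({"If-Modified-Since": "Tue, 02 Feb 2016 22:22:58 GMT"}, modified))
print(respond({}, modified))
```

The first call returns a 304 because the object has not changed since the date the CDN supplied; the second returns a 200 because no conditional header was sent.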
Related Blog Post: How to Get 304 Not Modified Header from CDN
Optimization #5: Choose Relevant Pricing Model
In the CDN industry, there are two general pricing models: per-gigabyte pricing and pipeline pricing. Generally speaking, per-gigabyte pricing is great for businesses with fluctuating traffic patterns while pipeline pricing is great for businesses with relatively consistent traffic patterns. With per-gigabyte pricing you get a set amount of bandwidth to use over the course of a month (example: 300 TB). With pipeline pricing (also known as 95/5 pricing or burstable billing) you get a set amount of bandwidth per second to use (example: 1 Gbps).
The benefit of per-gigabyte pricing is that you can use the bandwidth however you want, whenever you want. The benefit of pipeline pricing is that you get bandwidth at a cheaper price if you stay within your agreed bandwidth-per-second limit at least 95% of the time. Put simply: per-gigabyte pricing is less restrictive but bandwidth is slightly more expensive; pipeline pricing is more restrictive but bandwidth is less expensive.
Keep in mind: if you exceed the pipeline limit for more than 5% of the month, you could end up paying for a 2 Gbps pipeline instead of a 1 Gbps pipeline, for example. And that pricing difference is huge. This is why it’s important to understand your traffic and bandwidth patterns. CBC has a deep understanding of its traffic patterns and is able to save money with pipeline pricing. According to the paper, they use the “95th percentile-billing model for HTTP content to mitigate traffic spikes causing high one-time costs.”
Image: Figure 1 from the paper, showing an example of traffic patterns seen during a breaking news event. CBC won’t have to pay extra for this spike under the 95/5 pricing model because the spike is short-lived.
This doesn’t mean pipeline pricing is always better though. Again, if your traffic often fluctuates or you’re a business just starting to gain traction, per-gigabyte pricing may be better. (Per-gigabyte pricing is offered by every CDN but pipeline pricing is not.)
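The 95/5 rule can be sketched in a few lines of Python. This assumes the common approach of sampling bandwidth at regular intervals, discarding the top 5% of samples, and billing at the highest remaining one; the function name and the traffic figures are hypothetical, and your provider's exact method may differ.

```python
import math

def billable_mbps(samples_mbps, percentile=95):
    """Discard the top (100 - percentile)% of samples and return the highest one left."""
    ordered = sorted(samples_mbps)
    index = math.ceil(len(ordered) * percentile / 100) - 1
    return ordered[max(index, 0)]

# Hypothetical month reduced to 100 samples: steady 800 Mbps with spikes to 3000 Mbps.
short_spike = [800] * 96 + [3000] * 4   # spike lasts 4% of the time
long_spike = [800] * 90 + [3000] * 10   # spike lasts 10% of the time

print(billable_mbps(short_spike))  # the spike falls inside the discarded 5%
print(billable_mbps(long_spike))   # the spike is too long to be discarded
```

In the first case the bill is based on 800 Mbps; in the second it jumps to 3000 Mbps, which is the kind of pricing difference described above.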
Other CDN Optimizations Worth Consideration
- Enabling HTTP compression like gzip: According to Websiteoptimization.com: “Typical savings on compressed text files range from 60% to 85%. Webmasters who have deployed HTTP compression on their servers report savings of 30 to 50% off of their bandwidth bills.” Many CDNs let you compress content delivered by their edge servers.
- Proxying dynamic content through the CDN: eBay uses its CDN for TCP and SSL termination. As Steve Lerner, Senior Member of Technical Staff at eBay, explains: “We proxy dynamic and personalized experiences through CDNs to get the benefit of edge TCP connectivity and better routes.” Depending on your CDN, this may not be feasible. But if you serve up a lot of dynamic content, it’s worth looking into.
- Enabling HTTP/2: Provided you aren’t still using HTTP/1.x hacks like domain sharding, enabling HTTP/2 on your CDN is a good idea if you have an HTTPS website. HTTP/2 multiplexes many requests over a single connection, which cuts down on repeated TLS handshakes and helps you deliver content quickly and responsibly.
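To get a feel for the compression savings mentioned in the first item, you can gzip a sample of your own text assets and compare sizes. The sketch below uses Python's standard `gzip` module on a made-up, highly repetitive CSS payload, so the figure it prints will be more flattering than what real files achieve.

```python
import gzip

def gzip_savings(payload: bytes) -> float:
    """Return the percentage by which gzip shrinks the payload."""
    compressed = gzip.compress(payload)
    return 100.0 * (1 - len(compressed) / len(payload))

# Hypothetical stylesheet: one rule repeated many times, so it compresses unusually well.
css = b".button { color: #333; padding: 4px 8px; margin: 0 auto; }\n" * 200

print(f"{gzip_savings(css):.0f}% smaller after gzip")
```

Running the same function over your actual HTML, CSS, and JavaScript files gives a more honest estimate of what enabling compression at the edge would save.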