To begin I would like to thank all of you that have shown interest in this blog and posted comments. The site launched one week ago and has already generated over 13,000 unique visitors and 40,000 page views. As I am writing this article my previous post is on the front page of del.icio.us. I look forward to expanding and improving the site and ask that if you have any comments or suggestions please contact me.
Following up on my post last week about the eAccelerator PHP optimizer and cache manager I thought I would post a general introduction to caching. This article focuses on caching in an Apache/PHP environment but the principles remain the same for any platform/language you work with.
Browser Cache
Every browser has it’s own cache but the size and behavior of each varies. Fortunately we are not at the sole mercy of the browser to determine how to handle our data. Through the use of HTTP headers you can dictate to the browser when to request updated content and when to serve files from the local cache. I highly suggest downloading the Firefox plugins Firebug and YSlow to help you analyze your HTTP headers. Below you can see a screenshot of the headers tab available on any HTTP request through Firebug.
The difficulty of caching is determining how you want the browser to cache and then pulling it all together on the server-side. Different file types, for instance, might be generally more static than others and therefore safer to cache for long periods of time. On the other hand, some content may updated on a regular basis and not need to be cached for long, if at all. Since the browser is closest to the user and your server is potentially never involved in the serving of content the highest performance boost can come from effective browser caching.
Below is an example of setting a header property in PHP. This would tell the browser to cache the files for one year from the date of this post.
<?php
header('Expires: Fri, 25 Apr 2009 00:00:00 GMT');
?>
Here is an example of setting headers with Apache. This would set an expiration of one year from today on all images files.
<FilesMatch "\.(jpg|jpeg|png|gif)$">
Header set Expires "Fri, 25 Apr 2009 00:00:00 GMT"
</FilesMatch>
There are a number of header parameters that can be sent as part of your response to HTTP requests. Here is a breakdown of the ones you should be familiar with for caching purposes…
Last-modified: Fri, 25 Apr 2008 00:00:00 GMT
This header tells the browser when the file being requested was last altered. The browser “asks” your server if it has a file that has a more recent “Last-modified” timestamp than the version that is currently has stored. If a newer file exists on your server then the browser requests the updated file, else the existing file is served. Although the communication does have a little overhead, it is much more efficient than simply serving the same unmodified content over and over.
This header tells the browser when the file being requested was last altered. The browser “asks” your server if it has a file that has a more recent “Last-modified” timestamp than the version that is currently has stored. If a newer file exists on your server then the browser requests the updated file, else the existing file is served. Although the communication does have a little overhead, it is much more efficient than simply serving the same unmodified content over and over.
Etag: “28ff-44aee6630f900″
Etags are basically unique identifiers attached to your files that the browser can use to compare cached files against. This works much like the “Last-modified” tag and there has been quite a bit of debate as to whether one is better than the other or whether to include both. I personally suggest including both as there may be rare situations where the Etag would detect changes that are not effected in the timestamp.
Etags are basically unique identifiers attached to your files that the browser can use to compare cached files against. This works much like the “Last-modified” tag and there has been quite a bit of debate as to whether one is better than the other or whether to include both. I personally suggest including both as there may be rare situations where the Etag would detect changes that are not effected in the timestamp.
Expires: Fri, 25 Apr 2009 00:00:00 GMT
The expires header is ideal when you can plan on how long your content is safely cacheable. Why is it superior to the previous two cache controls? Using expires does not require a trip to the server to verify the freshness of your content. The browser simply serves the files from the local cache for the fastest user experience and zero server overhead.
The expires header is ideal when you can plan on how long your content is safely cacheable. Why is it superior to the previous two cache controls? Using expires does not require a trip to the server to verify the freshness of your content. The browser simply serves the files from the local cache for the fastest user experience and zero server overhead.
Cache-Control: max-age=86400
The max-age tag, much like the expires header, eliminates the need to check for updated content when the cached file is within the age limit specified. The value assigned to the max-age is the number of milliseconds the file will be considered fresh. During that time, the locally stored files will be served. It is important to note that HTTP/1.1 allows caching of anything unless overridden by the “Cache-Control” header.
The max-age tag, much like the expires header, eliminates the need to check for updated content when the cached file is within the age limit specified. The value assigned to the max-age is the number of milliseconds the file will be considered fresh. During that time, the locally stored files will be served. It is important to note that HTTP/1.1 allows caching of anything unless overridden by the “Cache-Control” header.
For static content I highly suggest serving files from a cookie-free subdomain (i.e. static.domain.com) and establishing a “never expires” policy. Many larger site have already taken advantage of this tactic but smaller sites can also benefit from the method. When you need to make a change to a file simply change the reference to the file to an incremented version (javascrtipt_1.js -> javascript_2.js) of itself and then the newer version will be downloaded and cached. There are even ways to automate the versioning process.
For dynamic files it is best to use the Cache-Control: no-cache header and for more static files the Last-modified header is appropriate. Another method to ensure content is refreshed is to append some unique querystring value to the URI.
Example of setting a “never expires” header for all static files…
<FilesMatch "\.(jpg|jpeg|png|gif|swf|css|js|ico|pdf)$">
Header add "Expires" "Mon, 01 Jan 2018 00:00:00 GMT"
Header add "Cache-Control" "max-age=31536000"
</FilesMatch>
or with PHP…
<?php
header('Expires: Mon, 01 Jan 2018 00:00:00 GMT');
header('Cache-Control: max-age=31536000');
?>
Server Caching
Server-side caching can have a huge impact on performance. Since the highest level of caching your PHP files should be set to is Last-modified, Apache will be serving these files most frequently. There are a number of third party caching/PHP optimizers that make caching a breeze such as XCache, eAccelerator, memcached and others. You can even store your compiled pages in memory with these optimizers for an even faster client/server transaction.
You are not limited to using one of these third parties for caching. You can manually cache files in PHP through the use of code like the following…
<?php
$current = $_SERVER["SCRIPT_NAME"];
$parts = Explode('/', $current);
$current = $parts[count($parts) - 1];
$store = 'cache/';
$page = $_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
$cache = $store.md5($page).'.cache';
if(file_exists($cache) && (filemtime($current) < filemtime($cache)) ) {
@readfile($cache);
exit();
}
ob_start();
// YOUR PHP SCRIPT //
$new = fopen($cache, 'w');
fwrite($new, ob_get_contents());
fclose($new);
ob_end_flush();
?>
Intermediary Caching
By intermediary caching I am referring to anything between the browser and the server that caches your content. Perhaps you are using a third party CDN (content delivery network) such as Akamai or edgecast to speed up HTTP request delivery. These types of services are tailored to high volume websites with widespread user-bases and are generally cost-prohibitive to small-medium sized websites.
There are also other caches you may not not even know exist. Many large corporations, educational institutions and even countries cache content coming into their network. Oftentimes the proxies function much like browsers in their respect for HTTP headers however they do not always abide by your rules so be sure and identify private content by defining unique querystring parameters lest sensitive information be spread to multiple recipients.