In This Tutorial
- Introduction
- What’s YSlow Got to Do With It?
- Part 1: Minimizing File Sizes
- Do not scale images in HTML
- Minify JavaScript
- Minify CSS
- Make favicon small and cacheable
- Compress components with gzip
- Configure entity tags (ETags)
Introduction
Anyone who has been in the web design business knows that there is virtually no end to upgrading your website. You have to learn the markup for a webpage, how to style it, how to make it cross-browser, how to use server-side programming, and so on and so forth. It’s a constant learning experience. After years of hassle, you finally have a fully-functional, Web 2.0 website! It works in all browsers and even passes validation! Satisfied? Unfortunately, you’re not done yet. There’s a final step for all aspiring web programming professionals. It separates the men from the boys, the experienced from the amateurs. Why is it important when your site is already functional for all viewers? Multiple reasons.
If you aren’t already aware, I’m talking about optimization. In a nutshell, the goal of optimization is to decrease page load times. Some steps may decrease loading times by noticeable seconds, providing for a more comfortable user experience; some steps may only decrease it by milliseconds. Nonetheless, every step is important.
“Why should I care about milliseconds,” you ask? It’s about more than just the time. The importance of each step will be described within its own section, but to pique your interest, I’ll provide you with a prime example. One goal of webpage optimization is to decrease the number of HTTP requests that your webpage makes. Each request takes approximately 50 milliseconds to make the connection to the server, excluding the time it takes to download the requested file. Fifty milliseconds by itself isn’t really important. The goal of decreasing requests is to reduce server stress and improve performance.
Your personal webpage probably receives too little traffic to show any noticeable difference from an extra 100 server calls per page load. However, if you intend on being a professional programmer, you’ll likely be working for someone who runs a website with a large enough member base that an extra 100 server calls per page load would make the server inaccessible for most users.
A server can only receive so many requests and send so many replies per second. If you want to work with the big dogs, you have to program like the big dogs, whether or not your current situation requires it. Your experience in webpage optimization is exactly what they’ll want to see on your résumé.
What’s YSlow Got to Do With It?
YSlow is a powerful Firebug extension that was made by Yahoo! Inc. Even though the company is long gone in the battle of the search engines, Yahoo! still provides the leading tool in webpage optimization. In Firefox (unfortunately, it is the only browser supported), install both the Firebug add-on and the YSlow extension.
Post-installation, just open Firebug and select the YSlow option to receive a grade on each optimization checkpoint. Are you wondering how to achieve good grades in every area? Well, that’s the point of this tutorial!
Part 1: Minimizing File Sizes
I’d contemplated separating the tutorial into sections based on your skill level in each language – things you can do with just HTML, things you can do with CSS/JavaScript, and things you can do with PHP. This, however, left me with a very lacking Part 1, as there are not many HTML-only optimizations. Besides that fact, if you only know HTML, you shouldn’t really be worried about optimizing a webpage for excessive traffic, as the few things you can accomplish will make negligible impact compared to the other topics this tutorial will cover.
Thus, this tutorial is split up into these three sections: Minimizing File Size, Minimizing Server Calls, and Minimizing Parse Time. If you haven’t noticed by now, this is Part 1 of the YSlow tutorial, and it’s going to cover something of which we can all easily understand the advantages: reduced file sizes. Not only does it decrease download time – and thus loading speeds – for the client, it also saves bandwidth – and thus money – for the provider.
This tutorial is going to cover five steps, in order of ease: HTML image scaling (the dos and don’ts), minifying JavaScript and CSS, making favicons small and cacheable, compressing components with gzip, and configuring entity tags (ETags).
Do not scale images in HTML
Scaling images? Don’t do it. It’s that simple. Some web programmers, when desiring a thumbnail image, will include the full-sized image and just use the image’s height and width attributes (via HTML or CSS) to resize the image. Saves time and space, since you don’t have to generate or store a thumbnail image! Great idea, right? Not at all. This practice comes with three major drawbacks that need to be considered.
First, any server administrator would be appalled to learn that they’re using the bandwidth cost of a full-sized image to display a thumbnail. Let’s take an extreme example of scaled-image thumbnailing. Say you run a wallpaper website. Obviously, you’ll want to display thumbnails of each wallpaper so that you can offer multiple wallpapers in a small area of space. You use the full wallpaper (1024×768; 200KiB) and just scale it down to, say, 133×100. A hypothetical 133×100 thumbnail would only take up maybe 5KiB. That’s a difference of 195KiB per wallpaper per page load. You save on server space (5KiB!), but you lose in bandwidth. Any good web developer would trade 5KiB space for 195KiB bandwidth – especially per wallpaper per page load – any day. Even with less extreme examples, such as half-sizing an image, the bandwidth cost will add up on high-traffic servers. The server space it would take up to thumbnail the image is worth the bandwidth saved.
Second, the client will not appreciate the download time. The full image may not be displayed, but the user still has to wait for it to download. Even a dial-up user would need only about a second to download an actual, non-image-scaled 133×100 thumbnail; but if you scale a full-size image, especially multiple images per page, there is a very noticeable difference in page loading times. A collection of 10 wallpapers on the page with legitimate thumbnails would take up only 50KiB – a near instantaneous download for a DSL user. The same wallpapers scaled in HTML would require a download of about 2MiB!
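To make the arithmetic from the wallpaper example concrete, here is a small sketch. The variable names are mine, and the sizes are the hypothetical figures used above, not measurements.

```php
<?php
// Figures from the wallpaper example above; all sizes are in KiB.
$fullSizeKiB   = 200; // 1024x768 wallpaper
$thumbnailKiB  = 5;   // hypothetical 133x100 thumbnail
$imagesPerPage = 10;

// Extra transfer per wallpaper when the full image is scaled in HTML.
$wastedPerImageKiB = $fullSizeKiB - $thumbnailKiB;

// Total image download per page load for each approach.
$pageWithThumbnailsKiB = $thumbnailKiB * $imagesPerPage;
$pageWithScalingKiB    = $fullSizeKiB * $imagesPerPage;

printf("Wasted per image: %d KiB\n", $wastedPerImageKiB);
printf("Page with real thumbnails: %d KiB\n", $pageWithThumbnailsKiB);
printf("Page with scaled images: %d KiB (about %.2f MiB)\n",
    $pageWithScalingKiB, $pageWithScalingKiB / 1024);
```

Multiply the last two numbers by your page views per day and the cost of scaling in HTML becomes hard to ignore.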
Third, webpage parsing time is also affected. The use of an image’s width and height attributes will be covered again in Part 3 of this tutorial, in a different light. When those attributes don’t match the image’s real dimensions, the browser must resize the larger image in order to create the thumbnail. These are precious milliseconds wasted on every image load for every viewer!
Oh, but the hassle… you’d have to open each wallpaper in your favorite image editor, scale it down, upload it, link to it, etc. Isn’t the time and effort saved worth the bandwidth cost? Well, dear reader, the answer is no. Thanks to modern server-side programming languages, this process can be automated – no image editors, no per-image labor, no uploading, no manual link additions. However, that’s another topic for another tutorial that I may write in the future. In the meantime, keep in mind the cost of image-scaling and make a habit of not doing it.
Minify JavaScript
A quick and easy way to save bandwidth and decrease download time is to shorten JavaScript and CSS file lengths. JavaScript, especially, is filled with redundancy and unnecessarily long variable names. Why reference myFavoriteVariable 500 times, when you can just call it x and save 17 bytes with every reference? Why display 100 tab indents when it will run just the same without them? Don’t get me wrong! I’m not advocating hideous code. For the love of all that is good, use yourFavoriteVariableName and tabular spacing. When programming, readability is important. When processing, readability is useless. How can we program legibly and still send a compressed file to the client? Using a compiler service!
Similar to thumbnails, you can automate a compiler service. For the sake of time, I won’t be doing that here. Like thumbnails, perhaps I’ll do it in a separate tutorial. I will, however, link you to the forever-useful Closure Compiler Service by Google Inc. This service will allow you to manually copy-paste your legible, commented, indented code into the compiler, then copy-paste its condensed, comment-free, small-bandwidth code into a file for you to use on your webpage. There are three options for this service: whitespace only, simple, and advanced.
Whitespace only is self-explanatory. The only compression it does is remove whitespace and comments. Simple is most likely what you’ll need. It doesn’t rename public variables that may be called outside of the script. For example, you copy-paste the contents of a JavaScript file that contains the function expandCollapse. This file is intended to be referenced externally (<script src>) so that said function can be called within the HTML document itself (such as when a link is clicked). The Simple compiler is the one you would want to use, since it will not rename the function, thus not affecting references to it from outside the script itself. The Advanced compiler, on the other hand, will treat the script as if it were the entirety of the program. It will compress all variables, so outside references to the script will likely be broken when the variables are renamed.
So that you get a first-hand understanding, I’ll provide example compressions for a sample script.
Unaltered Script: 301 bytes
var i_heart_you = function(my_custom_code)
{
    for (var x = 0; x < 10; x++)
    {
        document.write("I am going to remind you that ");
        // why did I even use two document.writes?
        document.write("my custom code is " +
            my_custom_code);
        x += 1;
    }
}
i_heart_you("copyright Charles Stover");
Whitespace only: 212 bytes
var i_heart_you=function(my_custom_code){for(var x=0;x<10;x++){document.write("I am going to remind you that ");document.write("my custom code is "+my_custom_code);x+=1}};i_heart_you("copyright Charles Stover");
Notes: It removed whitespace and comments. That is all.
Simple: 186 bytes
var i_heart_you=function(b){for(var a=0;a<10;a++){document.write("I am going to remind you that ");document.write("my custom code is "+b);a+=1}};i_heart_you("copyright Charles Stover");
Notes: It renamed all the private variables (the ones referenced within the function itself), but left the function name the same so that it can be referenced by other programs.
Advanced: 139 bytes
for(var a=0;a<10;a++){document.write("I am going to remind you that ");document.write("my custom code is copyright Charles Stover");a+=1};
Notes: It removed the i_heart_you function and my_custom_code variable entirely, since they weren’t necessary, having only been called once. I’m surprised it didn’t combine both document.writes, but I guess no compiler is perfect.
The imperfection in Advanced brings me to an important point. After compiling any code, verify that it still works. No compiler is perfect. That can’t be stressed enough. While I’ve had a 100% success rate with small scripts such as this, compressing thousand-line projects has been a more complicated issue. Don’t blindly trust that the compiler did a good job. If Simple and Advanced destroy functionality in your complex project, you should at least be able to rely on ol’ Whitespace-only to save you some bandwidth with no errors attached.
Minify CSS
While JavaScript compression will save a ton of bandwidth, let’s not forget about CSS! I’ve yet to find a CSS compression utility that absolutely trumps the rest, but the most reliable I’ve found is Clean CSS. The problem with CSS compression is that there is no one method better than the rest. It’s really a guess-and-check principle. It may result in a smaller file size to group CSS declarations by element (form { /* all properties of the form element */ }), by property (.all-elements, .that-are-red { color : #ff0000; }), or by multiple properties (.all-elements, .that-are-both, .red-and-large { color : #ff0000; font-size : 18pt; }), but there really is no way to know until you’ve tried, as each CSS file will benefit differently from each method of compression.
Due to the countless combinations of compression settings, you may have to try different options with your CSS compression utility until you find one that shows a noticeable difference. Unlike JavaScript compression, you aren’t as likely to save as much bandwidth, and it will take longer to find the appropriate setting, but bandwidth saved is bandwidth earned, to put a spin on an old phrase. Besides, you won’t have to compress your CSS files as often as your JavaScript files, since there generally aren’t as many CSS files on a website or webpage.
Make favicon small and cacheable
Your favicon is a very important and highly-accessed part of your website. Some browsers render it as 32×32, and I believe vBulletin has made it a recent practice to use a 32×32 favicon by default in their forum packages. Don’t let them fool you. Well over 90% of your viewers, if not 99%, are going to be viewing your favicon as 16×16. It is pointless to have the vast majority of your viewers downloading (remember: download time and bandwidth costs) an image with four times as many pixels as they’ll be able to see. Use a 16×16 favicon to decrease file size, and let the small minority who view 32×32 favicons view a likely-unnoticeably stretched icon.
Most browsers are good at caching favicons without server recommendations, since it’s a file that’s requested on every page load. YSlow insists that you manually make the favicon cacheable anyway, and it wouldn’t hurt to listen. On the off chance that there is some browser out there (probably a beta or mobile browser) that doesn’t cache favicons by default, it will save you bandwidth and save your clients loading time. Because the majority of browsers cache favicons by default and caching is covered in Part 2 of this tutorial, I’ll leave it out of this section. Don’t forget, when you learn to set permanent cache (or if you already know), to go back and set it for your favicon!
Compress components with gzip
Finally we get to server-side programming! Ah, the most complicated – but the most fun – area of web programming. gzip, if you aren’t aware, is a method of file compression. Unlike the JavaScript and CSS compression tools, it doesn’t just remove useless characters. It re-encodes the entire file in a compressed format – think ZIP files or RAR files. Modern web browsers support receiving gzipped content and are capable of automatically extracting and displaying it. This is the case for any file type. There are two ways to gzip a file: one for static files and one for dynamic files.
To determine if your viewer can accept gzipped files, just check the $_SERVER['HTTP_ACCEPT_ENCODING'] variable. An example HTTP_ACCEPT_ENCODING value is gzip,deflate,sdch.
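A minimal sketch of that check might look like the following. The accepts_gzip function is a hypothetical helper of my own, not something PHP provides. It assumes the header is a comma-separated list of codings, each optionally followed by a ;q= quality value, and treats q=0 as a refusal.

```php
<?php
// Hypothetical helper: returns true if the Accept-Encoding header value
// lists gzip as its own token and does not refuse it with q=0.
function accepts_gzip($acceptEncoding)
{
    foreach (explode(',', $acceptEncoding) as $entry) {
        // Split "gzip;q=0.5" into the coding name and its parameters.
        $parts  = explode(';', $entry);
        $coding = strtolower(trim($parts[0]));
        if ($coding !== 'gzip') {
            continue;
        }
        // "q=0" means the client explicitly refuses this coding.
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^\s*q\s*=\s*0(\.0{0,3})?\s*$/i', $param)) {
                return false;
            }
        }
        return true;
    }
    return false;
}

// A missing header is treated as "no gzip support".
$header = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
var_dump(accepts_gzip($header));
```

Comparing whole tokens rather than searching for the substring gzip avoids false matches on similarly named codings.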
Dynamic Files
For files that change often (such as a home page, topic list, counter, and pretty much any non-archival HTML page), it will be nearly impossible and pointless to create and save a gzipped copy every time the page changes, which is possibly every few seconds. If only there were a way to gzip a file in real-time. But wait! There is! PHP is capable of doing this using the lovely ob_start function. After the headers of a page have been sent (assuming you’re sending custom headers), just call ob_start with the ob_gzhandler parameter before the content of the page and ob_end_flush after it.
For those wondering, the ob in ob_start is an acronym for output buffer. It records all the output and manipulates it in whatever way you decide before sending it to the client. In this case, we’re going to gzip the output before actually sending it.
<?php
// get the headers out of the way
header("Content-Language: en-us");
header("Content-Type: text/html; charset=utf-8");
// gzip the following content
ob_start('ob_gzhandler');
// the content itself; everything in the source code goes here
// NOTHING but the header should go before or after the output buffer!
include('./this/is/where/I/store/my/template/header.html');
echo '<p>This is all the content on my homepage!</p>';
include('./this/is/where/I/store/my/template/footer.html');
// close the buffer and send the data to the client
ob_end_flush();
?>
Simple, right? Just put a line of code above the content (to tell it to start recording what to compress) and a line of code after the content (to tell it to compress and send the data).
“But, Charles!” you so rudely interrupt, “What about the browsers that don’t support gzip? Won’t they get errors when receiving the gzipped content?”
No, no, rude reader. PHP’s gzip-handling output buffer is kind enough to check the HTTP_ACCEPT_ENCODING for us! If the browser supports it, it will send the gzipped content. If the browser doesn’t support it, it will send unaltered content. Easy, huh? Just use the output buffer for all your dynamic, PHP-generated files, and let PHP do the rest!
Bandwidth saved.
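If you’re curious how much is saved, here is a rough way to measure it from the command line. This is only a sketch: the sample markup is made up, it assumes PHP’s zlib extension is available, and real pages will compress more or less depending on how repetitive they are.

```php
<?php
// Hypothetical sample markup; repetitive HTML like this compresses well.
$html = str_repeat("<li class=\"wallpaper\"><a href=\"#\">Wallpaper</a></li>\n", 200);

// gzencode() returns the data in gzip format (compression level 6 here).
$compressed = gzencode($html, 6);

printf("Original: %d bytes\n", strlen($html));
printf("Gzipped:  %d bytes\n", strlen($compressed));

// Decoding gives back the original bytes, which is what the browser does.
var_dump(gzdecode($compressed) === $html);
```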
Static Files
To save server resources, if you know a file is not going to change very often, you may want to store a gzipped copy on the server instead of having PHP automatically generate one every time the file is accessed. You can either use the gzip program provided by the gzip home page to compress every file manually, or you can just use PHP’s gzencode function to create any non-existent gzip files. To do this, have users access a PHP file which will determine whether or not their browser supports gzip. If their browser does, send them the gzipped file. If it does not, send them the uncompressed file. I’ve included an example of this, but you may feel free to create your own.
<?php
// e.g. download.php?file=path/to/file.jpg
// This would be a great time to use mod_rewrite.
// Note: stripping ".." is only a minimal safeguard against path tricks.
$file_to_get = str_replace('..', '', $_GET['file']);
$encoded_file_to_get = $file_to_get . '.gz';
// A missing Accept-Encoding header is treated as "no gzip support".
$accept_encoding = isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '';
// Continue only if gzip is part of the browser's accepted encoding and
// is either at the start of the list, in the middle of the list, or at the end of the list.
// i.e. gzip is not in the middle of the word, e.g. bugzipz, gzip2
// We'll want to prevent inaccurate matches of gzip in the middle of a word for future scenarios
// such as a release of gzip2 or not-yet-existent compression methods with similar names.
// If the browser supports bugzipz but not gzip, we don't want to send a gzip file!
// (^|,) and (,|;|$) are alternations; inside [...] the ^ and $ would not act as anchors.
if (preg_match('/(^|,)\s*gzip\s*(,|;|$)/i', $accept_encoding))
{
    // Check to see if a compressed copy exists,
    // and if it doesn't, create it!
    if (!file_exists($encoded_file_to_get))
    {
        // Create the file and prepare it for writing.
        $handle = fopen($encoded_file_to_get, 'x');
        // Get the uncompressed contents, compress them, and write them to the gzipped file.
        fwrite($handle, gzencode(file_get_contents($file_to_get)));
        fclose($handle);
    }
    // Tell the browser the content is encoded.
    header('Content-Encoding: gzip');
    // Send the encoded content!
    echo file_get_contents($encoded_file_to_get);
}
// gzip is not supported (not in the Accept-Encoding header), so just return the uncompressed file.
else
    echo file_get_contents($file_to_get);
?>
Hopefully you know why you should create the gzipped file instead of just gzencode’ing the contents every time. If you don’t, I’ll be brief. To gzip the contents, the server has to calculate the gzipped contents. This takes time, albeit less time than the user saves by not downloading as large a file, so even gzipping it on every page load is a step up from not gzipping it at all. By caching the gzipped copy, the server doesn’t have to recalculate it with every page load, saving precious milliseconds of time and CPU usage.
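One thing a cached copy needs is a way to notice that the original has changed. Here is a sketch of one approach, which compares modification times and rebuilds the copy when the original is newer. The ensure_gzip_copy function and the temp-file usage are my own illustration, and it assumes the zlib extension is available.

```php
<?php
// Hypothetical helper: returns the path of an up-to-date .gz copy of $path,
// creating or refreshing it when the original is newer than the copy.
function ensure_gzip_copy($path)
{
    // PHP caches stat results, so clear them before comparing times.
    clearstatcache();
    $gzPath = $path . '.gz';
    if (!file_exists($gzPath) || filemtime($gzPath) < filemtime($path)) {
        // LOCK_EX guards against two requests writing the copy at once.
        file_put_contents($gzPath, gzencode(file_get_contents($path)), LOCK_EX);
    }
    return $gzPath;
}

// Usage with a throwaway file in the system temp directory.
$source = tempnam(sys_get_temp_dir(), 'yslow');
file_put_contents($source, str_repeat("body { color: #ff0000; }\n", 100));
$gzPath = ensure_gzip_copy($source);
clearstatcache();
printf("%d bytes compressed to %d bytes\n", filesize($source), filesize($gzPath));
```

The compression only happens on the first request after a change; every other request pays for two cheap file-time lookups.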
Configure entity tags (ETags)
Last but not least, you’ll want to set up entity tag headers. There’s a very simple analogy that can be used to explain how ETags work.
Two days ago:
Client: I’d like to use yslow-tutorial.html, please.
Server: Sure thing. Here it is. This is version 12345678 (this is the ETag).
Today:
Client: I was just in here a few days ago, and I picked up a copy of yslow-tutorial.html. The one you gave me was version 12345678 (this is the If-None-Match request header, available in PHP as $_SERVER['HTTP_IF_NONE_MATCH']). Is there a newer copy?
Scenario 1:
Server: Nope, that’s the latest. (client uses the cached copy; server sends no data; no bandwidth used.)
Scenario 2:
Server: Why, yes there is! Here it is. This is version 12345690 (this is the new ETag).
ETags are a form of caching. Their only downside compared to a permanent cache is that the client still has to connect to the server in the first place; however, they will greatly reduce bandwidth. It’s a great idea to combine ETags with cache, but you don’t need to worry about cache until we come to that part of the tutorial. For now, you’ll get much use out of simply using ETags.
The easiest method to calculate an ETag is to just return the file’s filemtime for static content. When you update the file, the filemtime automatically changes; thus the next time the client requests the file, the server will know that the client’s copy is outdated. However, if you’re using a dynamic PHP file, filemtime won’t necessarily change just because the content changes. Take for example a simple file that contains <?php echo time(); ?>. Every time you view the page, the content will be different. However, the filemtime will always be the same, since the file itself hasn’t been modified; only its output has changed. In these cases, there are different methods you can use to determine an ETag, and it’s ultimately up to you to decide the best method. If you have content on the page and are capable of determining the last time the content was updated (e.g. the time of an article posting), you can use that to generate the ETag. Otherwise, you may have to resort to something such as generating an md5 hash of the output and using it as the ETag. If the output changes, the md5 hash changes, the ETag changes. It’s crude [to generate an md5 hash of a large output], but it’s a last resort.
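The md5 last resort can be sketched with the same output buffering used for gzip earlier. The etag_for_output function is a hypothetical helper of mine; the quotes around the hash are there because ETag values are sent as quoted strings.

```php
<?php
// Hypothetical helper: builds a quoted ETag from the rendered output.
function etag_for_output($output)
{
    return '"' . md5($output) . '"';
}

// Capture the page instead of sending it immediately.
ob_start();
echo '<p>This is all the content on my homepage!</p>';
$output = ob_get_clean();

// The hash changes whenever the output changes.
$etag = etag_for_output($output);

// headers_sent() is checked so the sketch also runs from the command line.
if (!headers_sent()) {
    header('ETag: ' . $etag);
}
echo $output;
```

Keep in mind that the server still has to build the whole page before it can hash it, so this saves bandwidth rather than processing time.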
Once you have determined how to calculate your ETag – simply any string that will change whenever the page changes – and have generated the string, you can send it using the ETag header.
header('ETag: "' . filemtime($_GET['file']) . '"');
Simple enough, right? Unfortunately, there’s more. The scenario where the server sends no data is not automated. When the client tells the server its version, it’s up to you to take that and prevent data from being sent. You can do this like so:
$file_to_get = str_replace('..', '', $_GET['file']);
// If the file doesn't exist, give it a "404" ETag.
// That way, if it exists in the future, the ETag will change.
// Until then, there's no need to keep resending the 404 HTML page.
// ETag values are quoted strings, so wrap the value in double quotes.
$etag = '"' . (file_exists($file_to_get) ? filemtime($file_to_get) : 404) . '"';
// If the client already has a copy and it is the same as the server copy...
// (PHP exposes the If-None-Match request header as HTTP_IF_NONE_MATCH.)
if (
    array_key_exists('HTTP_IF_NONE_MATCH', $_SERVER) &&
    trim($_SERVER['HTTP_IF_NONE_MATCH']) == $etag
)
{
    // Tell the client that there is no newer version, and don't send any data.
    header('HTTP/1.1 304 Not Modified');
    header('Connection: close');
    exit();
}
// Either the client has no previous copy, or it is not the latest copy.
// Note: else is not necessary here, since I used exit above;
// I'm just using it to help with the if-statement readability.
else
{
    // Send the entity tag (unique version ID).
    // header() takes the whole "Name: value" line as a single string.
    header('ETag: ' . $etag);
    // Send the data. (Don't forget to compress it!)
    echo file_get_contents($file_to_get);
}
Voilà! The client now has a copy of a file and a version number for future reference. Assuming the browser supports ETags (most, if not all, modern browsers do), you just saved yourself a ton of bandwidth in repeated file accesses.
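One wrinkle worth knowing: the If-None-Match header isn’t always a single value. A client may send several ETags separated by commas, a W/ prefix marking a weak tag, or a lone * meaning “any version.” Here is a sketch of a more forgiving comparison. The etag_matches function is a hypothetical helper of mine, and it assumes your ETag values contain no commas.

```php
<?php
// Hypothetical helper: returns true if the If-None-Match header value
// contains the given ETag. The "W/" weak prefix and "*" are both handled.
function etag_matches($ifNoneMatch, $etag)
{
    $ifNoneMatch = trim($ifNoneMatch);
    if ($ifNoneMatch === '') {
        return false;
    }
    if ($ifNoneMatch === '*') {
        return true;
    }
    // Weak comparison: ignore a leading W/ on either side.
    $wanted = preg_replace('/^W\//', '', trim($etag));
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = preg_replace('/^W\//', '', trim($candidate));
        if ($candidate === $wanted) {
            return true;
        }
    }
    return false;
}

$etag   = '"12345678"';
$header = isset($_SERVER['HTTP_IF_NONE_MATCH']) ? $_SERVER['HTTP_IF_NONE_MATCH'] : '';
var_dump(etag_matches($header, $etag));
```

When it returns true, you would send the 304 Not Modified response instead of the file.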