Slashdot
High Performance Web Sites
Posted by
samzenpus
on Wed Oct 10, 2007 01:45 PM
from the heavy-duty-net dept.
Michael J. Ross writes "Every Internet user's impression of a Web site is greatly affected by how quickly that site's pages are presented to the user, relative to their expectations, regardless of whether they have a broadband or narrowband connection. Web developers often assume that most page-loading performance problems originate on the back-end, and thus that developers have little control over performance on the front-end, i.e., directly in the visitor's browser. But Steve Souders, head of site performance at Yahoo, argues otherwise in his book, High Performance Web Sites: Essential Knowledge for Frontend Engineers." Read on for the rest of Michael's review.
| High Performance Web Sites | |
| author | Steve Souders |
| pages | 168 |
| publisher | O'Reilly Media |
| rating | 9/10 |
| reviewer | Michael J. Ross |
| ISBN | 0596529309 |
| summary | 14 rules for faster Web pages |
The typical Web developer — particularly one well-versed in database programming — might believe that the bulk of a Web page's response time is consumed in delivering the HTML document from the Web server, and in performing other back-end tasks, such as querying a database for the values presented in the page. But the author quantitatively demonstrates that — at least for what are arguably the top 10 sites — less than 20 percent of the total response time is consumed by downloading the HTML document. Consequently, more than 80 percent of the response time is spent on front-end processing — specifically, downloading all of the components other than the HTML document itself. In turn, cutting that front-end load in half would improve the total response time by more than 40 percent. At first glance, this may seem insignificant, given how few seconds or even deciseconds it takes for the typical Web page to appear using broadband. But any delays, even a fraction of a second, accumulate in reducing the satisfaction of the user. Likewise, improved site performance not only benefits the site visitor, in terms of faster page loading, but also the site owner, with reduced bandwidth costs and happier site visitors.
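The 40 percent figure is simple proportion, and a minimal sketch makes the arithmetic explicit. The function name and the 2-second page are illustrative assumptions, not taken from the book; the 0.8 share is the lower bound the review quotes.

```python
def total_time_saved(frontend_share, frontend_reduction):
    """Fraction of total response time saved when the front-end
    portion shrinks by the given fraction (both values in 0..1)."""
    return frontend_share * frontend_reduction


if __name__ == "__main__":
    # Assumption: exactly 80 percent of the time is front-end work.
    saved = total_time_saved(frontend_share=0.8, frontend_reduction=0.5)
    page_seconds = 2.0  # hypothetical page load time
    print(f"saved {saved:.0%} of the total, "
          f"or {page_seconds * saved:.1f} s of a {page_seconds:.1f} s load")
```

Because the front-end share on the sites studied is reported as more than 80 percent, the saving from halving it is correspondingly more than 40 percent.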
Creators and maintainers of Web sites of all sizes should thus take a strong interest in the advice provided by "Chief Performance Yahoo!," in the 14 rules for improving Web site performance that he has learned in the trenches. High Performance Web Sites was published on 11 September 2007, by O'Reilly Media, under the ISBNs 0596529309 and 978-0596529307. As with all of their other titles, the publisher provides a page for the book, where visitors can purchase or register a copy of the book, or read online versions of its table of contents, index, and a sample chapter, "Rule 4: Gzip Components" (Chapter 4), as a PDF file. In addition, visitors can read or contribute reviews of the book, as well as errata — of which there are none, as of this writing. O'Reilly's site also hosts a video titled "High Performance Web Sites: 14 Rules for Faster Pages," in which the author talks about his site performance best practices.
The bulk of the book's information is contained in 14 chapters, with each one corresponding to one of the performance rules. Preceding this material are two chapters on the importance of front-end performance, and an overview of HTTP. Together these form a well-chosen springboard for launching into the performance rules. In an additional and last chapter, "Deconstructing 10 Top Sites," the author analyzes the performance of 10 major Web sites, including his own, Yahoo, to provide real-world examples of how the implementation of his performance rules could make a dramatic difference in the response times of those sites. These test results and his analysis are preceded by a discussion of page weight, response times, YSlow grading, and details on how he performed the testing. Naturally, if and when a reader peruses those sites, checking their performance at the time, the owners of those sites may have fixed most if not all of the performance problems pointed out by Steve Souders. If they have not, then they have no excuse, if only because of the publication of this book.
Each chapter begins with a brief introduction to whatever particular performance problem is addressed by that chapter's rule. Subsequent sections provide more technical detail, including the extent of the problem found on the previously mentioned 10 top Web sites. The author then explains how the rule in question solves the problem, with test results to back up the claims. For some of the rules, alternative solutions are presented, as well as the pros and cons of implementing his suggestions. For instance, in his coverage of JavaScript minification, he examines the potential downsides to this practice, including increased code maintenance costs. Every chapter ends with a restatement of the rule.
The book is a quick read compared to most technical books, owing not just to its relatively small size (168 pages) but also to its writing style. Admittedly, this may be partly the result of O'Reilly's in-house and perhaps outsourced editors, oftentimes the unsung heroes of publishing enterprises. This book is also valuable in that it offers the candid perspective of a Web performance expert, who never loses sight of the importance of the end-user experience. (My favorite phrase in the book, on page 38, is: "...the HTML page is the progress indicator.")
The ease of implementing the rules varies greatly. Most developers would have no difficulty putting into practice the admonition to make CSS and JavaScript files external, but would likely find it far more challenging, for instance, to use a content delivery network, if their budget puts it out of reach. In fact, differences in difficulty levels will be most apparent to the reader when he or she finishes Chapter 1 (on making fewer HTTP requests, which is straightforward) and begins reading Chapter 2 (content delivery networks).
In the book's final chapter, Steve Souders critiques the top 10 sites used as examples throughout the book, evaluating them for performance and specifically how they could improve that through the implementation of his 14 rules. In critiquing the Web site of his employer, he apparently pulls no punches — though few are needed, because the site ranks high in performance versus the others, as does Google. Such objectivity is appreciated.
For Web developers who would like to test the performance of the Web sites for which they are responsible, the author mentions in his final chapter the five primary tools that he used for evaluating the top 10 Web sites for the book, and, presumably, used for the work that he and his team do at Yahoo. These include YSlow, a tool that he created himself. Also, in Chapter 5, he briefly mentions another of his tools, sleep.cgi, a freely available Perl script that tests how delayed components affect Web pages.
As with any work, this book is not perfect. In Chapter 1, the author could make clearer the distinction between function and file modularization, as otherwise his discussion could confuse inexperienced programmers. In Chapter 10, the author explores the gains to be made from minifying JavaScript code, but fails to do the same for HTML files, or even explain the absence of this coverage, though he does briefly discuss minifying CSS. Lastly, the redundant restatement of the rule at the end of every chapter could be eliminated, if only in keeping with the spirit of improving performance and efficiency by reducing reader workload.
Yet these weaknesses are inconsequential and easily fixable. The author's core ideas are clearly explained; the performance improvements are demonstrated; the book's production is excellent. High Performance Web Sites is highly recommended to all Web developers seriously interested in improving their site visitors' experiences.
Michael J. Ross is a Web developer, freelance writer, and the editor of PristinePlanet.com's free newsletter.
You can purchase High Performance Web Sites from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Is it just me... (Score:2)
Rule #34: Don't be the first Java site of the day (Score:4, Funny)
Rule #34: Don't be the first Java site your users visit during the day. (Unfortunately, this pretty much turned into "don't use Java applets" unless you could find a hidden way to load a throwaway applet in another frame, etc.)
Re:Rule #34: Don't be the first Java site of the d (Score:4, Interesting)
Interesting (Score:1)
All my sites load fast (Score:2, Interesting)
(http://www.geometricvisions.com/ | Last Journal: Monday May 02 2005, @05:35PM)
All my pages are static HTML. Not a web application in site, not even PHP. Yes, it's a drag when I need to do some kind of sitewide update, like adding a navigation item.
I also have less to worry about with security: as long as my hosting service keeps their patches up to date, I know I haven't introduced any holes myself.
Also, for the most part, my pages are very light on graphics, with most of the graphics present being repeated on every page such as my site's logo, which gets cached.
Finally, all my pages are XHTML 1.0 Strict with CSS, with the CSS being provided by a single sitewide stylesheet. This means less HTML text to transfer compared to formatting with HTML tags.
Re:All my sites load fast (Score:5, Funny)
(http://www.nickfitz.co.uk/)
You forgot to link to your site... [amish.org]
It's easy to find parking space for my car (Score:5, Funny)
It's a bicycle!!1
Solution (Score:5, Interesting)
This is a great point, but here is my anecdotal experience:
Years ago, I tested static HTML vs. PHP by simply benchmarking a simple document (I used the GPL license). On the particular box, I was able to serve over 400 pages per second with static HTML but only about 12 pages per second with PHP. I was blown away. I went one step further and used PHP to fetch the data from Oracle (OCI8, IIRC) and that went down to 3 requests/sec. You can see that caching does help, but not a whole lot.
So, rather than whine about it, what is the solution?
AJAX, done properly, will solve the problem. Basically, instead of serving dynamic pages with PHP, JSP, ASP or whatever... just serve an AJAX client (which is served in a speedy manner with no server side processing to bog things down). This client loads in the browser and fetches a static XML document from the server and then uses the viewer's browser to generate the page - so everything thrown down by the server is static and all processing is done on the client side.
Now, to facilitate a dynamic website (e.g. - message board, journal, or whatever), you have to generate the XML file upon insert (which are generally a small fraction of the read load) using a trigger or embedded in the code.
Voilà! Static performance with dynamic content using browser-side processing.
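The server-side half of this scheme, regenerating a static XML file whenever something is inserted, could look roughly like the sketch below. It is only an illustration under stated assumptions: the `posts` list stands in for the database, and `publish_posts`, `add_post`, and the XML layout are hypothetical names, not anything the poster described.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def publish_posts(posts, path):
    """Regenerate the static XML file that the browser-side client fetches."""
    root = ET.Element("posts")
    for post in posts:
        item = ET.SubElement(root, "post", id=str(post["id"]))
        ET.SubElement(item, "author").text = post["author"]
        ET.SubElement(item, "body").text = post["body"]
    # Write to a temporary file, then rename it into place, so a reader
    # never sees a half-written document.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        ET.ElementTree(root).write(handle, encoding="utf-8",
                                   xml_declaration=True)
    os.replace(tmp_path, path)


def add_post(posts, author, body, path):
    """Hypothetical write path: store the post, then republish the file."""
    posts.append({"id": len(posts) + 1, "author": author, "body": body})
    publish_posts(posts, path)
```

Reads then cost the server only a static file transfer, while each write pays for a full regeneration, which matches the poster's assumption that inserts are a small fraction of the read load.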
Re:Solution (Score:5, Informative)
(about:mozilla | Last Journal: Thursday November 24 2005, @11:09AM)
I find that sites built with the method you describe are the asshole sites that fuck with browser history, disable the back button, try to disable the context menu, and those dumb ass tricks to get around the fact they don't know how to write proper server side code.
There's no reason you can't make a fast serverside site (with ajax too, that works without the stupid tricks I described above). If you can't, I suggest you educate yourself, or don't use a Walmart PC for production use.
I've personally written many J2EE webapps (no EJB BS, spring & struts & jsp/velocity) that were very fast. With proper coding you can let the browser cache stuff so it doesn't constantly have to refetch crap. When you do this, all you push down to the client is the HTML to render, which browsers are really good at doing quickly.
Re:All my sites load fast (Score:5, Funny)
In other words, it's a smalltime hobby site, and you're not a web developer. That's fine, and I agree that it's quite nice and reassuring to simplify like this where possible. However...
Go on out into the job market advertising your incredible "static page" skills, and what lightning-fast load times you'll bring to your employer. Offer to convert their entire 20GB of online content to static XHTML 1.0 Strict to obtain the peace of mind that comes with knowing you haven't introduced any holes yourself. Hell, I'm going to go right now and submit a patch to MediaWiki that generates static versions of every article and then deletes all the PHP from the entire web root! I'm sure as soon as I tell them about the performance boost, they'll be right on board!
Re:All my sites load fast (Score:4, Insightful)
(http://www.cheapcheap.biz/)
Umm... there are plenty of content management systems (say, Cascade [hannonhill.com]) that manage content and publish it out to HTML. Even Dreamweaver's templating system will do this. Just because you use pure HTML, doesn't mean you have to lose out on sitewide management control.
gzip (Score:1)
Re:gzip (Score:4, Insightful)
(http://www.daemonology.net/)
The book would be a lot more believable... (Score:5, Insightful)
Doh! Test yer pages! (Score:4, Insightful)
(http://pages.sbcglobal.net/redelm)
Then every added feature has to be justified -- perceived added value versus cost-to-load. Sure, the artsies won't like you. But it isn't your decision or theirs. Management must decide.
For greater sophistication, you can measure your dl rates by file to see how much is in users caches. And decide whether these are also not a cause of slowness!
Interesting Points... (Score:2, Insightful)
(Last Journal: Thursday June 14, @11:03PM)
Excellent Subject (Score:1)
(http://freejavalectures.googlepages.com/)
It's my general experience (Score:2)
Where is the rule "Avoid Ad-Networks"? (Score:5, Interesting)
(http://blog.fairies-unlimited.net/)
I guess I am not alone in noticing that often the ads on a page drag the load time way down. I find it interesting, that there is no rule about minimizing content dragged in from other servers you have no or little control over. Blind spot because of Yahoo's business, I guess.
The book is about speed, not performance. (Score:4, Interesting)
(http://zesty.ca/)
"Performance" is not a general-purpose synonym for "speed." "Performance" is a much more general term; it can refer to memory utilization, fault tolerance, uptime, accuracy, low error rate, user productivity, user satisfaction, throughput, and many other things. A lot of people like to say "performance" just because it's a longer word and it makes them sound smart. But this habit just makes them sound fake -- and more importantly, it encourages people to ignore all the other factors that make up the bigger picture. This book is all about speed, and the title should reflect that.
So, I beg you: resist the pull of unnecessary jargon. The next time you are about to call something "performance," stop and think; if there's a simpler or more precise word for what you really mean, use it.
Thanks for listening!
Odd Summary (Score:4, Insightful)
(http://thedevilsadvocate.org/)
Let's correct this summary a little bit. First, it's NOVICE Web developers who would think this. Any web developer worth their weight knows the basic idea that java, flash, and other things like it make a PC work hard. The website sends code, but the PC has to execute the code, rather than the website pushing static or dynamic HTML and having it simply render. We bitch and moan enough here on slashdot about flash/java heavy pages, I feel this summary is misdirected as if web developers here didn't know this.
Secondly, there's no argument, so Steve doesn't have to argue with anyone. It's a commonly accepted principle. If someone didn't learn it yet, they simply haven't learned it yet.
Now, I welcome a book like this because #1 it's a great tool for novices to understand the principle of optimization on both the server and the PC, and #2 because it hopefully has tips that even the above average admin will learn from. But I scratch my head when the summary makes it sound like it's a new concept.
Pardon me for nitpicking.
Advertising slows everything down (Score:3, Insightful)
(Last Journal: Tuesday December 30 2003, @09:46PM)
Putting an adblocker of some sort or Mozilla Adblock Plus is a great way to speed up any page (from the user's point of view, of course).
wish it did focus on the backend (Score:2)
Web development is especially bad at optimization. This thread demonstrates the problem:
http://forums.devnetwork.net/viewtopic.php?t=74613 [devnetwork.net]
People there are actually recommending you wait until your server fails before you look to optimize.
Head of site performance at Yahoo, huh? (Score:2, Funny)
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.yahoo.com%2F&charset=(detect+automatically)&doctype=Inline&group=0 [w3.org]
Double WTF:
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.slashdot.org&charset=(detect+automatically)&doctype=Inline&group=0 [w3.org]
Web 2.0 performance costs (Score:5, Insightful)
- Spike
Freeware OpenGL arcade game SOL, competitor in the 2008 Independent Games Festival: http://www.mounthamill.com/sol.html [mounthamill.com]
I guess he's not used the new Yahoo Mail interface (Score:4, Interesting)
(Last Journal: Thursday April 19 2007, @10:15PM)
Still the same with web masters. (Score:3, Insightful)
(http://slashdot.org/ | Last Journal: Thursday February 21 2002, @04:37PM)
Those Web designers should be called "Unemployed"
Outsource everything (Score:2)
So not only do they now outsource the web page designers, they are outsourcing the technical writers?
What's next? Outsource the audience?
ISBN redundancy (Score:3, Informative)
There's no need to list both the ISBN 10 and the ISBN 13. ISBN 13 is a superset of ISBN10. Notice that both numbers contain the exact same 9 data digits:
0596529309
9780596529307
The only difference is the 978 "bookland" region has been prepended, and the check digit has been recalculated (using the EAN/UPC algorithm, instead of ISBN's old algo). You can just give the ISBN 10, or just the ISBN 13. You can trivially calculate one from the other. All software that deals with ISBNs should do this for you. e.g., if you search either the ISBN13 or ISBN10 on amazon, you'll end up at the exact same page.
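A minimal sketch of that conversion, using the two numbers quoted above as the worked example. The function names are illustrative, and the reverse direction only applies to ISBN-13s that carry the 978 prefix.

```python
def isbn10_to_isbn13(isbn10):
    """Prepend 978 and recompute the check digit with the EAN-13 rule
    (weights alternate 1, 3 across the first twelve digits)."""
    digits = isbn10.replace("-", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    core = "978" + digits[:9]
    total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                for i, ch in enumerate(core))
    return core + str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13):
    """Drop the 978 prefix and recompute the mod-11 check digit
    (weights 10 down to 2; a check value of 10 is written as X)."""
    digits = isbn13.replace("-", "")
    if len(digits) != 13 or not digits.startswith("978"):
        raise ValueError("only 978-prefixed ISBN-13s map to an ISBN-10")
    core = digits[3:12]
    total = sum(int(ch) * weight
                for ch, weight in zip(core, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))


if __name__ == "__main__":
    print(isbn10_to_isbn13("0596529309"))     # prints 9780596529307
    print(isbn13_to_isbn10("9780596529307"))  # prints 0596529309
```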
My advice to speed up your website (Score:1)
It's designed from the ground up as an HTTP accelerator. It's extremely fast, in most cases way faster than Squid. However, if you rely a lot on cookies you should look somewhere else.
Ad-Networks (Score:1)
(http://www.linuxdsl.co.uk/)
I have lost countless hours of my life waiting for pages to render while they suck down banner ads from overloaded delivery networks (e.g. Falkag).
Have read, mixed feelings (Score:4, Insightful)
I have looked at the book again now, and there seem to have been some changes. For example, there were only 13 rules when I was reviewing those before. Now there are 14. As one example, ETags were advised to not be used at all (IIRC, my biggest WTF about the book - if used correctly, ETags are marvellous things and complement 'expires' very nicely), instead of the current 'only use if done correctly'. Some other things are nigh impossible to do correctly crossbrowser (think ETag + GZIP combo in IE6, AJAX caching in IE7, etc). To be honest, I found pretty much all of this stuff to be Web Development 101. If you're not at the level that you should be able to figure most of these things out for yourself, you probably won't be able to put them into practise anyway, and you should not be in a place where you are responsible for these things.
I might pick up this book just to read it again, see about the changes and read the full chapters, just to hear the 'other side of the story', but IMHO this book isn't worth it. In all honesty, the only thing I got out of it so far that I didn't know is the performance toll CSS expressions take (all expressions are literally re-evaluated at every mouse move), but I hardly used those anyways (only to fix IE6 bugs), and in response have written a jQuery plugin that does the required work at only the wanted times (and I've told you this now, so no need to buy the book).
My conclusion, based solely on the fairly large number of excerpts I've read is: if you're a beginner, keep this book off for a while. If you're past the beginner stage but your pages are strangely sluggish, this book is for you. If you've been around, you already know all this stuff.
Language Nazi Note (Score:2)
(http://picknit.com/ | Last Journal: Saturday July 29 2006, @03:58PM)
Flash (Score:2)
Some think Flash is essential to the web browsing experience and a site without Flash is not worth the bother.
Others think that a site with Flash is sure evidence of a triumph of style over content, and guarantees it's not worth waiting for it to load.
Since Adobe chooses not to support FreeBSD, it's fairly clear that FreeBSD users all fall in the second category. You will have to do other analyses yourself.
HP's Website consistently has CRAPPY performance! (Score:1, Offtopic)
(http://slashdot.org/)
No useless intro in flash (Score:1)
(http://www.ofep.be/ | Last Journal: Wednesday June 07 2006, @12:03PM)
Huh? What's that? (Score:2, Funny)
(http://sugarmtnfarm.com/blog/)
Hmm... (Score:1)
(http://www.tuxera.be/)
Errata for sample chapter: gzip vs. deflate (Score:2)
(http://blogs.gnome.org/raphael | Last Journal: Friday September 14 2001, @11:09AM)
I started reading the first chapter and I was surprised when I read the following paragraph:
Unfortunately, it looks like the author does not know what he is talking about, since gzip is based on the deflate algorithm. It is likely that the author did not even look at RFC 1952 [ietf.org] because this is stated clearly in the abstract:
If you look at the HTTP/1.1 definition, you will see that section 3.5 [w3.org] specifies the meanings of the compression types "gzip" and "deflate":
In fact, it is unfortunate that the HTTP/1.1 RFC used the name "deflate" for this content coding, because it would have been more appropriate to name it "zlib". And both "gzip" and "zlib" are based on the deflate algorithm, as you can easily see if you take a quick look at RFC 1952 (gzip) and RFC 1950 (zlib). Both of them have a flag CM for the compression method, and the only one that is defined is CM=8 for the deflate method (RFC 1951). So the HTTP/1.1 RFC is a bit confusing, but a quick look at the corresponding RFCs for gzip and zlib can clear up that confusion easily.
The differences between gzip and zlib (deflate) are minimal. The gzip file header can be a few bytes longer because it includes the file modification time, optional extra fields, the original file name and a file comment. Note that none of these fields are useful for HTTP transfers, because the same information is already included in the HTTP headers (except for the optional comment, which is ignored anyway).
So the statement in the sample chapter saying that deflate (zlib) "is slightly less effective" than gzip is just wrong. It is actually the opposite: the zlib header would be a few bytes shorter than the gzip header. No, the main difference between both formats is that gzip (the format) is more popular because gzip (the program) is popular. And you can easily use the gzip program to pre-compress the static contents on your server so that it does not have to do it on-the-fly all the time. It would be much better if the author would re-write that paragraph and not make it sound like the formats are significantly different when they are based on the same algorithm. Also, the reference to the effectiveness of one or the other is misleading.
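The size claim is easy to check with Python's standard `zlib` module, which can wrap the same deflate stream in either container depending on `wbits`. This is a rough sketch, not a benchmark: `wrapped_sizes` and the sample payload are illustrative, and the gzip figure assumes a minimal header with no file name, comment, or extra fields.

```python
import zlib


def wrapped_sizes(data, level=9):
    """Compress data with identical deflate settings and report the size
    of the raw stream, the zlib wrapper (RFC 1950), and the gzip wrapper
    (RFC 1952)."""
    sizes = {}
    for name, wbits in (("raw deflate", -15), ("zlib", 15), ("gzip", 31)):
        compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
        sizes[name] = len(compressor.compress(data) + compressor.flush())
    return sizes


if __name__ == "__main__":
    sample = b"<p>High Performance Web Sites</p>\n" * 200
    for name, size in wrapped_sizes(sample).items():
        print(f"{name:12s} {size} bytes")
```

With these settings the zlib wrapper adds 6 bytes to the raw stream (2-byte header, 4-byte Adler-32) and the minimal gzip wrapper adds 18 (10-byte header, 8-byte CRC-32 and length trailer), so the two differ by a constant 12 bytes regardless of payload size.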
In video form... (Score:1)
http://video.yahoo.com/video/play?vid=1040890 [yahoo.com] (Flash Video)
http://us.dl1.yimg.com/download.yahoo.com/dl/ydn/yui/theater/souders-performance.m4v [yimg.com] (M4V)
definitely worth a watch
Re:First post!!!!111one (Score:1)
Bert