Category Archives: scalability

Zendcon, ACL talk, conferences and other stuff

ACL talk (Zend Webinar)
Remember I promised to post the code of my ACL Webinar somewhere in August ? That didn’t really happen, partly because of a lack of time, partly because after my initial hard drive crash (which made me lose my slides and code), I had another crash in August and then my boot SSD crashed in September. Dell was kind enough to replace the power supply, motherboard and 1 disk, but my data was lost (unless I pay over 500 EUR to have it recovered, which is a bit too pricey for my liking).
All those setbacks seriously delayed my promise. Nevertheless, I presented the talk again at the Zendcon Unconference, also mentioning the plans I have in store for version 2. But of course, I’ll have to release version 1 first. Currently I’m using DHTMLX Treeview for the backend treeview interface, but I’m not allowed to redistribute the commercial version I bought. So as soon as I can replace the backend treeview interface with a free one, I’ll release the entire code, including instructions on how to set it up. And since development for version 2 is already underway, I want to make sure I make a good choice there 😉

Zendcon 2011
Last week I spent a few days at Zendcon in Santa Clara, CA. I saw lots of interesting sessions there and presented 2 sessions during the Zendcon Unconference (the community-style version of the main conference) :

  • Creating dynamic ACLs in Zend Framework : the Zend Webinar I presented in August
  • Scaling dynamic sites like static sites : a first glimpse of a new Nginx module we’re building to make dynamic sites behave more like static sites in terms of scalability, without losing their dynamic nature

I received some encouraging comments, so I’m looking forward to presenting more on these topics in the next few months. The Nginx talk should also have some real-world benchmarks the next time I present it.
If you saw either of my talks, please rate it at Joind.in.

Next talks
I’m scheduled to talk at 2 more conferences this year :

  • T-Dose (Technical Dutch Open Source Event) in Eindhoven, The Netherlands on Nov 5-6, 2011 :
    • Nov 5 @ 11:00 : Caching and Tuning fun for high scalability – the talk I presented at phpBenelux, Dyncon and FrOSCon, this time in a condensed 50 minute version – this talk discusses the techniques you can use to keep your site running when it goes from 5 to 5 million visitors/day
    • Nov 6 @ 12:00 : Beyond the code : it’s not (just) about the code ! – a brand-new talk that’s aimed at 80% of developers. Short summary : “Most PHP developers focus on writing code. But creating Web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.”
  • PHP Tour in Lille, France on Nov 24-25, 2011 : Caching and Tuning fun for high scalability – the same talk as at T-Dose, in a condensed version

Working on…
In the next few weeks, expect a few new posts about :

  • A cool new IPv6 project we’ll launch soon
  • Some Zend Framework 2 news and stuff
  • More news on the Nginx module we’re building. At Zendcon, there were 3 large PHP hosters who were interested in testing the solution, so you can expect more posts on that.

Talks done and talks to come

I haven’t really been updating my blog lately because of time constraints. I do have a few interesting topics to write about, although some of those have already been converted to talks I want to give at conferences over the next few months (if I get accepted of course).

About those talks : I gave a tutorial called ‘Caching and tuning fun for high scalability’ at phpBenelux in January. It was my first conference talk and I enjoyed it a lot.
So much in fact, that I submitted the same talk for Dyncon2011 in Stockholm in March, where I gave a slightly shorter, but probably much better talk (learned from some of the mistakes I made in the first talk). Slides of this talk can be found at Slideshare although waiting for the video recording might be better, since there’s a lot of stuff not being shown on the slides (such as the live benchmarks).
If you were there, feel free to rate my talk at Joind.in : phpBenelux or Dyncon.

Mid-June I gave a Zend Webinar titled ‘Creating fast, dynamic ACLs in Zend Framework’, where I discussed alternatives to Zend_Acl using reflection and caching. I was happy to see more than 80 people watching the live webinar and got lots of interesting questions. The video is available on the Zend Webinar site. Any feedback is most welcome via SpeakerRate.
I promised to publish the code for the ACL implementation and that’s still on my todo list. I hope to be able to do this by the end of July (don’t want to make unrealistic promises…).

In the coming months, I’ll be giving the caching and tuning tutorial at FrOSCon (with a new bonus feature near the end… a first glimpse of a new feature for Nginx) near Bonn, Germany in August and at PHP Tour in Lille, France in November.

I’m not entirely sure if I’ll attend Zendcon 2011 yet. Although I promised Michelangelo Van Dam to help out with the Unconference last year, I had to cancel my trip due to my grandmother passing away on the morning of departure. Maybe this year I can actually keep that promise (if I can find the budget).

How a bad favicon.ico can cause a lot of trouble

Favicon.ico is a nice thing, but it can cause a whole lot of trouble when missing or not used properly…

 

What’s favicon.ico ?
Favicon on google.com

Favicon.ico is a Microsoft-invented icon that shows the logo for the Website in the browser’s address bar and next to the site name in the browser’s bookmarks. It was first supported in Internet Explorer 5 in 1999 and has since been adopted by all browsers.

Since tabbed browsing was introduced, it’s used as the icon for the tabs as well.

 

So where is the file ?
A browser will, by default, look for it in the site’s root directory. So for http://www.google.com, that’s http://www.google.com/favicon.ico
However, its location can also be specified within the XHTML (of each page) by using one of the following :

(Last 3 not supported in Internet Explorer)
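
As a rough illustration of the lookup described above, here is a minimal Python sketch : it prefers an icon declared in a link tag and falls back to /favicon.ico in the site’s root. The names IconLinkParser and favicon_candidates are made up for this example, and it only looks at rel values containing the token icon (which covers both icon and shortcut icon) ; real browsers apply more rules than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class IconLinkParser(HTMLParser):
    """Collects the href of every <link> tag whose rel contains 'icon'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated token list, e.g. "shortcut icon"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "icon" in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def favicon_candidates(page_url, html):
    """Return the declared icon URLs, or the default root location if none."""
    parser = IconLinkParser()
    parser.feed(html)
    declared = [urljoin(page_url, href) for href in parser.hrefs]
    parts = urlsplit(page_url)
    default = f"{parts.scheme}://{parts.netloc}/favicon.ico"
    return declared or [default]
```

For a page without any link tag, favicon_candidates("http://www.google.com/", "") falls back to the root location mentioned above.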

 

The not-so-catastrophic problems

There are a number of problems associated with favicon.ico – the not-so-catastrophic ones are :

  • Some favicon.ico files are located on a different URL and use redirects. This means the browser has to make multiple requests to get to the right location. It also means your server gets multiple hits.
    Example : www.wordpress.com/favicon.ico redirects to wordpress.com/favicon.ico, which redirects to en.wordpress.com/favicon.ico, which redirects to www.gravatar.com/blavatar/4e21d703d81809d215ceaabbf07efbc6?s=16&d=http://s2.wp.com/i/favicon.ico, which finally serves the icon – that’s 4 connections and 4 requests for an icon file
  • Some sites don’t send the correct mime type when sending the icon. The acceptable mime types are image/x-icon, image/vnd.microsoft.icon, image/png and image/gif. However some just send application/octet-stream or even text/plain. Most browsers seem to have no problem with this, because they use the extension to attempt to parse the type, but it goes against best practices.
    Examples :
    – wordpress.org and thepiratebay.org send an application/octet-stream content-type header
    – ups.com sends a text/plain content-type header, but sends an icon file along – very bad practice !
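
Both checks above are easy to automate. The following Python sketch is only an offline model : is_acceptable_icon_type and follow_redirects are hypothetical helper names, and the redirect chain is passed in as a plain dictionary instead of being fetched over the network.

```python
# The four mime types listed above as acceptable for an icon.
ACCEPTABLE_ICON_TYPES = {
    "image/x-icon",
    "image/vnd.microsoft.icon",
    "image/png",
    "image/gif",
}


def is_acceptable_icon_type(content_type_header):
    """True if the Content-Type header names one of the accepted icon types."""
    # Drop parameters such as "; charset=..." and normalise the case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTABLE_ICON_TYPES


def follow_redirects(start_url, redirect_map, limit=10):
    """Walk a {url: next_url} map and return (final_url, request_count)."""
    url, requests_made = start_url, 1
    while url in redirect_map and requests_made <= limit:
        url = redirect_map[url]
        requests_made += 1
    return url, requests_made
```

Fed with the WordPress chain from the example (three redirects), follow_redirects reports 4 requests for a single icon, and is_acceptable_icon_type rejects both application/octet-stream and text/plain.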

 

The really bad ones

  • Some sites use real icon files, but they’re extremely large, although there’s really no good reason for it.
    Examples :
    – The biggest icon file among sites in the Alexa top 20,000 belongs to www.marketingsherpa.com, which serves a 554KByte file… Based on the fact they get about 2.7M pageviews per month (Alexa estimate), we can guesstimate they’ll be sending out quite a few GBytes (50 ? 100 ?) of data every month !
    – Flickr.com (Alexa #33) sends a 90KByte .ico file (still over 1100 times larger than the smallest possible icon)
    – WordPress.com has an 11KByte .ico file
  • By far the most common problem is a missing favicon.ico file. Although that might not seem like a big problem, it can actually cause massive issues on a high-traffic site.
    Imagine this : if you get 10 pageviews/sec on your server (which is not that much) and your favicon.ico file doesn’t exist, your server will generate a 404 error for every first request. Luckily, browsers such as Firefox 3+ keep a list of which favicons are missing and don’t re-request them, but not all browsers follow this behaviour, meaning if those 404 pages aren’t cached, the icon is requested again on every pageview.
  • Let’s make it worse : if you’re using a framework like Zend Framework and you’re redirecting all requests to your framework bootstrap, you might be sending all 404 errors to the bootstrap, so you can show a fancy “We’re sorry, that page doesn’t exist” or even a page with “Did you mean …” where you do a search query for potential matches. So what happens when favicon.ico doesn’t exist and hits that search on every request to your site ? Exactly : you get 2 pageviews for every real pageview… and each pageview launches your entire framework bootstrap and, in case you’re doing the search thing, it launches a search on your backend DB… ouch !
    Example :
    go.com favicon 404
    go.com sends a 22KByte page with “Oops! We’re sorry, but we’re having technical problems.” – luckily most subdomains (such as disney.go.com) do have a favicon.ico – otherwise the 46th largest Website in the world would have had quite a bit of traffic and load because of a missing file
  • Some sites use png or gif files, often the site’s main logo. Although using png or gif is supported by most browsers and in fact using png will produce the smallest possible icon files (see below), it’s not supported by Internet Explorer. Also, using your company’s main logo image file isn’t the right thing to do, since those files are usually quite large, which means the browser needs to resize the image to a 16x16px or 32x32px image. This doesn’t just use processing power, but it also means the image being sent is a lot larger than required.
  • Some sites will use all of the XHTML link tags, causing the browser to download the icon multiple times, especially when each tag refers to a different location (e.g. on a CDN).
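
To get a feel for the numbers in this list, here is a small back-of-the-envelope sketch in Python. The function names are invented for this example, and fetch_ratio and refetch_ratio are pure assumptions (the share of pageviews that actually triggers an icon download) ; nobody outside those sites knows the real values.

```python
def monthly_icon_traffic_gb(icon_bytes, pageviews_per_month, fetch_ratio):
    """Estimated icon traffic per month in GB (1 GB = 10**9 bytes).

    fetch_ratio is an assumed fraction of pageviews that downloads the icon
    (first visits, expired caches, browsers that don't cache it, ...).
    """
    return icon_bytes * pageviews_per_month * fetch_ratio / 1e9


def requests_per_second(pageviews_per_second, refetch_ratio):
    """Total requests/sec when a missing icon is re-requested.

    refetch_ratio is the assumed fraction of pageviews that triggers another
    favicon.ico request ; 1.0 means the 404 is never remembered by the browser.
    """
    return pageviews_per_second * (1 + refetch_ratio)


if __name__ == "__main__":
    # 554KByte icon, 2.7M pageviews/month, assuming 5% of pageviews fetch it
    print(monthly_icon_traffic_gb(554_000, 2_700_000, 0.05))
    # 10 pageviews/sec where every pageview also requests the missing icon
    print(requests_per_second(10, 1.0))
```

With an assumed 5% fetch ratio the first estimate lands at about 75 GB per month, inside the guesstimated range, and the second case doubles the load from 10 to 20 requests per second.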

 

Who’s doing it wrong ?
To give you some idea of other big sites doing it wrong :

  • hp.com returns an application/octet-stream
  • aws.amazon.com uses the link tag implementation, but uses a malformed URL
  • citibank.com (and citi.com and many other Citibank domains) displays a 404 page, adding 15KByte. And since they’re using quite a few subdomains, the icon is requested a lot of times. (Note : online.citibank.com does have an icon, so why not copy it to the other subdomains ?)
  • apc.com (the uninterruptible power supply brand) shows a 404

 

Some are doing it right

  • facebook.com : 152 bytes with 0 redirects
  • yahoo.com : 318 bytes with 0 redirects
  • ibm.com : 318 bytes with 0 redirects

 

Some of the big shots can do better

It’s actually remarkable to see that sites like Google, Live, Twitter, LinkedIn, AOL, Adobe and Myspace (to name just a few) send out a 1150 byte icon.
Given that Google has tried everything to slim down its main page (including removing </body> and </html> tags), it’s odd they didn’t save the 239 bytes by creating a PNG file and providing that PNG to all non-IE clients (multiply it by 100 million or so hits/day and you get a nice 23.9 GBytes per day, or roughly 8.7 TBytes per year…).
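
The multiplication is easy to check. A minimal Python sketch, using the 239 bytes and the 100 million hits/day from the paragraph above (the hit count is a rough guess, and the function name is made up for this example) :

```python
def daily_savings_bytes(bytes_saved_per_hit, hits_per_day):
    """Bytes saved per day when every hit transfers a smaller icon."""
    return bytes_saved_per_hit * hits_per_day


if __name__ == "__main__":
    per_day = daily_savings_bytes(239, 100_000_000)
    print(per_day / 1e9, "GB per day")          # decimal gigabytes
    print(per_day * 365 / 1e12, "TB per year")  # decimal terabytes
```

That works out to 23.9 GB per day, which adds up to a bit under 9 TB over a full year.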

 

A word of advice

It’s quite simple actually : use a favicon.ico file on all your subdomains. If you don’t have a properly created icon for your site, put an empty icon or even just a plain empty 0 bytes file (be careful though : some browsers don’t like this and will request it over and over again).
In case you’re looking for a small blank icon file, I created a 79 bytes favicon.ico file (actually a PNG, so it won’t work on Internet Explorer) : here you go – they don’t get smaller than this !

In case you want an IE-compatible one : smallest favicon.ico
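
If you can’t drop a file on every subdomain, the same effect can be had in the application itself, as long as the icon request is answered before the framework bootstrap runs. The sketch below shows the idea as a Python WSGI middleware, purely as an illustration : the class name FaviconShortCircuit and the one-day max_age are assumptions, and the same approach could just as well be a Webserver rule or an early check in a PHP front controller.

```python
class FaviconShortCircuit:
    """WSGI middleware that answers /favicon.ico before the application runs."""

    def __init__(self, app, icon_bytes, content_type="image/x-icon", max_age=86400):
        self.app = app
        self.icon_bytes = icon_bytes
        self.headers = [
            ("Content-Type", content_type),
            ("Content-Length", str(len(icon_bytes))),
            # Let browsers and proxies cache the icon (max_age is an assumption).
            ("Cache-Control", f"public, max-age={max_age}"),
        ]

    def __call__(self, environ, start_response):
        if environ.get("PATH_INFO") == "/favicon.ico":
            # Serve the icon straight away : no routing, no 404 page, no DB search.
            start_response("200 OK", list(self.headers))
            return [self.icon_bytes]
        # Everything else goes to the wrapped application as usual.
        return self.app(environ, start_response)
```

Wrapping an application is a one-liner, for example app = FaviconShortCircuit(app, open("favicon.ico", "rb").read()).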

phpBenelux : conference done – slides up – webcast coming soon

phpBenelux 2011 was a huge success. After last year’s one-day conference, the phpBenelux team decided to add a half day of conference talks and a half day of tutorials as well. I wasn’t able to attend many of the talks, but I heard a lot of good things about the talks, the food, the atmosphere, etc.

On Friday, I presented a tutorial called ‘Caching and Tuning fun for high scalability’ at the phpBenelux Conference. The slides are now available here (Slideshare). If you were at my talk, please rate it here (Joind.in). Since it was the first time I gave this talk, any feedback would be most welcome.

Sadly, because of issues with servers (2 disk failures and a burnt-out CPU), I was unable to present the planned live benchmark, so I will do a webcast in March in which I will go through the step-by-step process of getting a site from having no performance and no scalability to the point where it can scale way beyond Slashdot-effect handling levels. That way, people who follow the webcast can see for themselves what effect each of the changes (adding the different kinds of caching, distributing the cache, adding Nginx, adding reverse proxies, tuning the DB, tuning the Webserver and of course tuning the frontend) has on the performance and the scalability of the Website. It will definitely be an eye-opener for a lot of people !

For an exact date, check back here in February or follow me on Twitter @wimgtr