All posts by wimg

How a bad favicon.ico can cause a lot of trouble

Favicon.ico is a nice thing, but it can cause a whole lot of trouble when missing or not used properly…

 

What’s favicon.ico ?
Favicon on google.com

Favicon.ico is a Microsoft-invented icon that shows the logo for the Website in the browser’s address bar and next to the site name in the browser’s bookmarks. It was first added to Internet Explorer 4 in 1997 and has since been adopted by all browsers.

Since tabbed browsing was introduced, it’s used as the icon for the tabs as well.

 

So where is the file ?
A browser will, by default, look for it in the site’s root directory. So for http://www.google.com, that’s http://www.google.com/favicon.ico
However, its location can also be specified within the XHTML (of each page) by using one of the following :

(Last 3 not supported in Internet Explorer)

 

The not-so-catastrophic problems

There’s a number of problems associated with favicon.ico – the not-so-catastropic ones are :

  • Some favicon.ico files are located on a different URL and use redirects. This means the browser has to make multiple requests to get to the right location. It also means your server gets multiple hits.
    Example : www.wordpress.com/favicon.ico redirects to wordpress.com/favicon.ico, which redirects to en.wordpress.com/favicon.ico, which redirects to www.gravatar.com/blavatar/4e21d703d81809d215ceaabbf07efbc6?s=16&d=http://s2.wp.com/i/favicon.ico, which finally serves the icon – that’s 4 connections and 4 requests for an icon file
  • Some sites don’t send the correct mime type when sending the icon. The acceptable mime types are image/x-icon, image/vnd.microsoft.icon, image/png and image/gif. However some just send application/octet-stream or even text-plain. Most browsers seem to have no problem with this, because they use the extension to attempt to parse the type, but it goes against best practices.
    Examples :
    – wordpress.org and thepiratebay.org send an application-octet-stream header
    – ups.com sends a text/plain content-type header, but sends an icon file along – very bad practice !

 

The really bad ones

  • Some sites use real icon files, but they’re extremely large, although there’s really no good reason for it.
    Examples :
    – The biggest icon file for sites in the Alexa top 20.000 is www.marketingsherpa.com, providing a 554KByte file… Based on the fact they get about 2.7M pageviews per month (Alexa estimate), we can guestimate they’ll be sending out quite a few GBytes (50 ? 100 ?) of data every month !
    – Flickr.com (Alexa #33) sends a 90KByte .ico file (still over 1100 times larger than the smallest possible icon)
    – WordPress.com has an 11KByte .ico file
  • By far the most common problem is a missing favicon.ico file. Although that might not seem like a big problem, it can actually cause massive issues on a high-traffic site.
    Imagine this : if you get 10 pageviews/sec on your server (which is not that much) and your favicon.ico file doesn’t exist, your server will generate a 404 error for every first request. Luckily, browsers such as Firefox 3+ keep a list of which favicons are missing and don’t re-request them, but not all browsers follow this behaviour, meaning if those 404 pages aren’t cached, the icon is requested again on every pageview.
  • Let’s make it worse : if you’re using a framework like Zend Framework and you’re redirecting all requests to your framework bootstrap, you might be sending all 404 errors to the bootstrap, so you can show a fancy We’re sorry, that page doesn’t exist or even a page with Did you mean … where you do a search query for potential matches. So what happens when favicon.ico doesn’t exist and hits that search on every request to your site ? Exactly : you get 2 pageviews for every real pageview… and each pageview launches your entire framework bootstrap and in case you’re doing the search thing, it launches a search on your backend DB… ouch !
    Example :
    go.com favicon 404
    go.com sends a 22Kbyte page with Oops! We’re sorry, but we’re having technical problems. – luckily most subdomains (such as disney.go.com) do have a favicon.ico – otherwise the 46th largest Website in the world would have had quite a bit of traffic and load because of a missing file
  • Some sites use png or gif files, often the site’s main logo. Although using png or gif is supported by most browsers and in fact using png will produce the smallest possible icon files (see below), it’s not supported by Internet Explorer. Also, using your company’s main logo image file isn’t the right thing to do, since those files are usually quite large, which means the browser needs to resize the image to a 16x16px or 32x32px image. This doesn’t just use processing power, but it also means the image being sent is a lot larger than required.
  • Some sites will use all of the XHTML link tags, causing the browser to download the icon multiple times, especially when each tag refers to a different location (i.e. on a CDN network).

 

Who’s doing it wrong ?
To give you some idea of other big sites doing it wrong :

  • hp.com returns an application/octet-stream
  • aws.amazon.com uses the link tag implementation, but uses a malformed URL
  • citibank.com (and citi.com and many other Citibank domains) displays a 404 page, adding 15KByte. And since they’re using quite a few subdomains, the icon is requested a lot of times. (Note : online.citibank.com does have an icon, so why not copy it to the other subdomains ?)
  • apc.com (the UPS brand) shows a 404

 

Some are doing it right

  • facebook.com : 152 bytes with 0 redirects
  • yahoo.com : 318 bytes with 0 redirects
  • ibm.com : 318 bytes with 0 redirects

 

Some of the big shots can do better

It’s actually remarkable to see that sites like Google, Live, Twitter, LinkedIn, AOL, Adobe and Myspace (to name just a few) send out a 1150 byte icon.
Given that Google has tried everything to skim down its main page (including removing </body> and </html> tags, it’s odd they didn’t save the 239 bytes by creating a PNG file and providing that PNG to all non-IE clients (multiply it by 100 million or so hits/day and you get a nice 23TBytes…).

 

A word of advice

It’s quite simple actually : use a favicon.ico file on all your subdomains. If you don’t have an properly created icon for your site, put an empty icon or even just a plain empty 0 bytes file (be careful though, not all browsers like this and will request it over and over again).
In case you’re looking for a small blank icon file, I created a 79 bytes favicon.ico file (actually a PNG, so it won’t work on Internet Explorer) : here you go – they don’t get smaller than this !

In case you want an IE-compatible one : smallest favicon.ico

phpBenelux : conference done – slides up – webcast coming soon

phpBenelux 2011 was a huge success. After last year’s one-day conference the phpBenelux team decided to add a half day of conference and add a half day of tutorials as well. I wasn’t able to attend many of the talks, but heard a lot of good things about the talks, the food, the atmosphere, etc.

On Friday, I presented a tutorial called ‘Caching and Tuning fun for high scalability’ at the phpBenelux Conference. The slides are now available here (Slideshare). If you were at my talk, please rate it here (Joind.in). Since it was the first time I gave this talk, any feedback would be most welcome.

Sadly, because of issues with servers (2 disk failures and a burnt out CPU), I was unable to present the planned live benchmark, so I will do a webcast in March in which I will go through the step-by-step process of getting a site from having no performance and no scalability to the point where it can scale way beyond Slashdot-effect handling levels. That way, people who follow the webcast can see for themselves what effect each of the changes (adding the different kinds of caching, distributing the cache, adding Nginx, adding reverse proxies, tuning the DB, tuning the Webserver and ofcourse tuning the frontend) has on the performance and the scalability of the Website. It will definitely be an eye-opener for a lot of people !

For an exact date, check back here in Feburary or follow me on Twitter @wimgtr

Presenting “Caching and Tuning Fun for High Scalability” at phpBenelux

On Jan 28-29 the second phpBenelux conference will be held in Edegem, a small town near Antwerp, Belgium. phpBenelux is the largest php usergroup for the Benelux (Belgium-Netherlands-Luxemburg) area.

I’ve been given the opportunity to give a 3-hour tutorial on the 28th. Here’s a quick rundown of what I will be diving into during that tutorial :
– General introduction into caching (what it is, what types of caching techniques there are, etc.)
– Some common caching implementations (the good, the bad and the ugly)
– How to keep your site scalable by caching
– How to bring down your site (how NOT to cache !) – i.e. commonly used techniques that lead to disaster
– Caching outside the box
– Spread, don’t duplicate
– Scalable alternatives to LAMMP (no, not a typo !)
– Applying Varnish
– Backend done, let’s go frontend !
– Tuning quickies
– Finally, a live demo with all of the described techniques applied while performing a stress-test – things might (will) most likely go horribly wrong here 😉

I’m looking forward to presenting this tutorial. If you want to be there, visit the phpBenelux Conference site or order your ticket while you still can !

See you all on the 28th !

Using PHPUnit to verify parameter types (revisited)

(This is an update on a blog post I wrote last year about parameter type checking)

PHP is dynamically typed

PHP is a dynamically typed language. What this means is that it allows you to do things like :

$a = 5;
$a = 'test';
$a = false;

The reason this works, is because PHP enforces type rules during execution, not at compile-time.

In many other languages this is impossible, since you need to define a type for the variable at compile-time. Languages such as Java, C, C++, C# and VB.Net are good examples of statically typed languages.

Problems with dynamic typing

Although dynamic typing is considered to be one of PHP’s strong suites, it does pose some problems. Let me illustrate with an example :

Suppose we have a piece of code that processes the amount of money each employee must be paid. Employees can file expense notes that are paid back in cash or when their monthly wages are paid. Our code will make the calculation for the pay check.

The data for our 4 employees is located in a CSV-file, made available from an external source :

employeeId, firstname, lastname, wage, expenses, processexpenses
1, Claire, Clarckson, 2000, 212, 0
2, Tom, Whitney, 1910, 111, 0
4, Jules, verne, 1932, 98, 1
5, Gregory, Jameson, 2131, 241, 0

If the last field is true, the expenses must be added to wage amount on the paycheck. So our code might look like this (don’t pay attention to code quality, it’s an example of ‘the average piece of code you will find’) :

class Wages
{
/**
* Process the wages
* @return boolean
*/
public function processWages()
{
$handle = fopen('some-file.csv', 'r');
while (($data = fgetcsv($handle, ',')) !== false) {
if (is_numeric($data[0])) {
$result = $this->processLine($data[2], $data[3], $data[4]);
$this->sendPaycheck($data[0], $result);
}
}
return true;
}

/**
* Calculate wages based on processExpenses parameter
*
* @param float $wage
* @param float $expenses
* @param boolean $processExpenses
* @return float
*/
private function processLine($wage, $expenses, $processExpenses)
{
if ($processExpenses) {
return $wage + $expenses;
} else {
return $wage;
}
}

/**
* Pay the employee
*
* @param int $id
* @param float $amount
*/
private function sendPaycheck($id, $amount)
{
echo 'Paycheck for id ' . $id . ' for the amount of : ' . $amount ."\n";
}
}

Everything works fine, until the external source decides (probably unknowingly) to modify the data format to :

employeeId, firstname, lastname, wage, expenses, processexpenses
1, Claire, Clarckson, 2000, 212, false
2, Tom, Whitney, 1910, 111, false
4, Jules, verne, 1932, 98, true
5, Gregory, Jameson, 2131, 241, false

When we run the code, everyone will be paid their expenses, even those that have ‘false’ in the last field. The reason ? The last field of each line might look like a boolean, but is in fact a string. The “false” is read as a string and is boolean true.

You might say that we didn’t follow best coding practices in our :

if ($processExpenses) {

which should have been

if ($processExpenses === true) {

but that would only have reversed the effect : nobody would have been paid.

Similar non-boolean situations cause the same problem. There’s a huge list of problems that might be caused by passing incorrect types to a function.

Granted, we should have put a type check in place, but as I said this was the average type of code you will find. And it’s exactly this type of code I wanted to use for this demo.

So what’s the solution ?

Since we don’t want to give up on our dynamic typing, we need a way to verify that parameters being passed to a function/method are of the type that we intend them to be. That way, anyone who wishes to use our function/method will be forced to pass the right parameter.

One solution would be to use type safe objects like the ones described by Sebastian Bergman (author of PHPUnit) in his Type-Safe Objects in PHP presentation. However, this is unusable for existing projects as it requires a massive rewrite. Furthermore, as Sebastian indicates, it poses a lot of new problems. And finally, it slows things down quite a bit, since it uses reflection to verify types during execution…

Another solution would be to have type hinting for all types, including scalar types, in PHP. Although proposed and agreed to by many, the current concensus (as of today at least) is to not include it PHP 5.4. It might end up in a branch for future use or might end up as a PHP extension down the line, but for now it seems to be off the table for PHP 5.4.

So should or shouldn’t we check the type of a parameter before using it ?

Another big dillema : should you check each parameter’s type in each single function at runtime using is_int, is_bool, etc. ? Some would say it’s the safest way and the only way to be absolutely sure.

I believe there’s a different and better approach : if you can integrate type checking in your unit testing and have a high code coverage percentage, there’s no need to explicitely check the type during runtime, except when handling external data.

So how do we make sure types are checked ?

What system is better suited for the job than the most popular testing PHP framework, PHPUnit ?

Since PHPUnit run can be repeated over and over again and introducing additional checks will not cause performance issues on the actual production environment, this is a good place to add these checks.

Upon execution of each PHPUnit Testcase, this patch will verify parameters for each of the called functions/methods. You can define the depth of calls using a PHPUnit command line parameter (the default is 2, which means functions called from the testcase itself and functions called from those functions).

How does the system know what types are expected if we don’t have type hints ?

The system presumes that if you’re using PHPUnit, you clearly know proper development methods. This also means you’ll be using docblocks to comment your functions.

So, since there are no type hints to rely on, it will instead rely on the types you specify in the docblock

It will analyze the docblock of each function/method and compare each parameter type with the expected parameter type. If it finds an inconsistency, it will produce a PHPUnit warning.

So does it support…

– Classes : yes

– Interfaces / Abstract classes / parent classes : yes, in fact if you specify an interface or abstract/parent class in the docblock and pass a class implementing/extending them, it will detect it as a valid type

– Multiple types definitions in the docblock : yes – separate them by a pipe (|)

– Return value types : yes – needs Xdebug patch, see below

Is it perfect ?

Nothing is… there’s a few problems at this point :

– If you want to analyze return values, you’ll need a patch for Xdebug I wrote last week. You can download that patch here : XDebug bug #416 patch

– It still needs a bit of tuning… it’s a work in progress !

The MySQL problem : something most people don’t know

Data from any external source might cause problems. MySQL is the best example : any data being returned from MySQL using the non-binary protocol is a string, even if the column is defined as int, decimal, bool (tinyint), …

MySQL’s protocol returns all data as a string and the PHP mysql and mysqli extensions don’t convert it into the expected datatype. The result is that any data from MySQL will be passed as a string, which can cause havoc when doing type checks. The only exception is when using mysqlnd and prepared statements (see the second example of Scalar type hints in PHP trunk).

There are 3 solutions to this problem :

  • Cast everything : not really fun, since you’ll need to change all your code. It might also be bad for performance, although it does set the types right ofcourse. Might be the best option for new code.
  • Use Propel or some other database layer, which does the casting for you… same performance problem ofcourse.
  • Wait until someone makes changes to the way MySQL and PHP communicate…

In the meantime, if you want to use the type checking, but you have some problems with MySQL, you can use a docblock tag to disable type checking for some functions : @phpunit-no-type-check

How to run it

After applying the patch to PHPUnit (and Xdebug if you want return type checking), run it like this :

php [path to phpunit.php] --check-param-types [TestCase.php|TestSuite.php]

Optional parameter :

–check-param-type-depth= sets the depth to which it needs to check parameter types. Your test is depth 0, any called function within your test is 1, etc. – default is 2 although 3 might be handy too

The output

This is the kind of output you can expect :

2) ATest::testMultiply

Invalid type calling A->multiply : parameter 2 ($factor2) should be of type int but got string(1) instead in C:\development\Test\ATest.php:42

Note that if you use the ‘–check-param-type-depth’ parameter and set it to a high number, you might see errors about libraries you use. Ofcourse, that might be the right moment to notify the library author (or contribute a fix yourself !)

Advantages

  • No need to use is_int, is_bool, etc. in a function that was type-checked
  • Consistent use of types
  • Developers will learn to focus not just on the content of a variable, but on the type as well. In time, it will become second nature to use the correct types from the start. Code hinting in most IDEs should in fact already help out with this, but now there’s a way to verify this too.
  • Verification of each function/method call, not just in terms of functionality (PHPUnit’s job), but also in terms of data types
  • Forces developers to keep the docblock up-to-date (!)

Using type checking basically brings the best of the dynamically and statically typed worlds together : you still have the flexibility of dynamic typing, but assurance that functions are called with the parameter types they were designed to be called with (as well as return the correct types). It’s the perfect middle-of-the-road approach for teams with a mix of ‘strict’ and ‘not-so-strict’ developers.

Where to get it

The PHPUnit modification can be downloaded through a Github-fork : http://github.com/wimg/phpunit

Possible extension

There’s plenty of things that could be added.

One of them is “non-strict” option, which ignores type conversion between types as listed on http://wiki.php.net/rfc/typecheckingstrictandweak (option 2 section)

Feedback

As always, feedback is much appreciated, as a comment in this blog or via e-mail.

Automated PHP 5.3 compatibility testing for your (old) code

Update (Mar 5, 2012) : check here for the latest version, supporting PHP 5.4 as well.

Update (Dec 22, 2010) : code has seen some minor modifications to ensure compatibility with the latest PHP_CodeSniffer release (1.3.0RC1) – thanks to Sebastian Bergmann. Also updated the instructions below.

Note (Dec 22, 2010) : this compatibility test will also test all testable cases for 5.0, 5.1 and 5.2

So you or your team has built anywhere between 5 and 500 projects in PHP 4, 5.1 and 5.2 over the past 5 years. And now PHP 5.3 is there, offering a lot of very interesting features, including namespace support, late static binding (finally !), closures, nested exceptions and a bunch more (see the new feature list).

So naturally, you’d like to upgrade. But doing so might break some old code. Why ? Because of some backward incompatibilities :

  • New reserved keywords (goto, namespace)
  • Deprecated functions that will throw an error (the brand new E_DEPRECATED error in fact !)
  • Call-by-value on functions with by-reference parameters will now raise a fatal error
  • and again… many more (see the list)

So how do you ensure your code is PHP 5.3 ready ? Well, there’s a few options.

Option 1 : run your unit tests

You just knew I was going to say that, didn’t you ? Yes, unit tests are still the best way to test the inner working of your code. Although even 100% code coverage will not guarantee a bugfree system ofcourse (some bugs are by design, others by neglect, others…)

Option 2 : test your application

Seems logical, doesn’t it ? Install PHP 5.3 on a separate environment or on your test environment and test the entire application. Ofcourse there are issues for some :

  • Old projects often don’t have any budget allocated for this kind of tedious testing
  • Testing an old project is not easy if the original developers aren’t around anymore… so the testing is best done by the actual user… but you don’t want users to see how their application breaks “because of old code”
  • If you have a lot of projects, testing them one-by-one could take a while… maybe 5.4 will be out by then :p

Option 3 : automate your PHP 5.x compatibility tests

Although the first 2 options are really required to ensure your code is PHP 5.3 ready, using automated tests can get you a long way in detecting deprecated functions, unsupported extensions, etc.

Because I’m in charge of moving about 50 projects to PHP 5.3 in the next few weeks/months, I decided to make at least part of this tedious task a little smoother (and faster).

To use the system, all you need is PHP_CodeSniffer and a new sniff standard I created. You will see errors and warnings popping up if part of the code is not PHP 5.3 compatible.

What’s being tested

  • Deprecated functions
  • Deprecated php.ini directives set via ini_set() or retrieved via ini_get()
  • Deprecated assigning of the return value of new() by reference
  • Prohibited names in function name, class name, namespace name and constant name
  • Magic methods can no longer be private, protected or static
  • Removed, unsupported or deprecated extensions
  • All of the above is being tested for PHP 5.0, 5.1, 5.2 and 5.3 compatibility issues

How to download and install

~ > git clone git://github.com/wimg/PHP53Compat_CodeSniffer.git PHP53Compatibility

  • Copy the PHP53Compatibility directory to {your pear path}/PHP/CodeSniffer/Standards

How to run

Start PHP_CodeSniffer like this :

phpcs --standard=PHP53Compatibility

Sample output

FILE: C:\temp\bla.php
--------------------------------------------------------------------------------

FOUND 15 ERROR(S) AND 2 WARNING(S) AFFECTING 12 LINE(S)
--------------------------------------------------------------------------------

4 | ERROR | Function name, class name, namespace name or constant name can
| | not be reserved keyword 'goto' (since version 5.3)
6 | ERROR | Extension 'dbase' is not available in PHP 5.3 anymore
12 | ERROR | Function name, class name, namespace name or constant name can
| | not be reserved keyword 'const' (since version all)
12 | ERROR | Function name, class name, namespace name or constant name can
| | not be reserved keyword 'const' (since version all)
12 | ERROR | Function name, class name, namespace name or constant name can
| | not be reserved keyword 'const' (since version all)
12 | ERROR | Function name, class name, namespace name or constant name can
| | not be reserved keyword 'const' (since version all)
12 | ERROR | Function name, class name, namespace name or constant name can
| | not be reserved keyword 'const' (since version all)
14 | ERROR | Function name, class name, namespace name or constant name can
| | not be reserved keyword 'do' (since version all)
16 | ERROR | Function name, class name, namespace name or constant name can
| | not be reserved keyword 'goto' (since version 5.3)
18 | ERROR | Function name, class name, namespace name or constant name can
| | not be reserved keyword 'namespace' (since version 5.3)
20 | ERROR | Assigning the return value of new by reference is deprecated in
| | PHP 5.3
31 | ERROR | Magic methods must be public (since PHP 5.3) !
31 | ERROR | Magic methods can not be static (since PHP 5.3) !
36 | ERROR | Extension 'mhash' is not available in PHP 5.3 - use the 'hash'
| | extension instead
42 | ERROR | Extension 'msql' is not available in PHP 5.3 anymore
48 | WARNING | The use of function magic_quotes_runtime() is discouraged
50 | WARNING | The use of ini directive 'safe_mode' is discouraged
--------------------------------------------------------------------------------

Some important notes about the system

  • The system checks for deprecated functions, new reserved keywords and other changes from PHP 5.0 to 5.3. However, it doesn’t check for every incompatibility, only a subset that was easily testable using PHP_CodeSniffer. So you still need to check your application manually to see if it runs properly. However, at least part of the job has been made a little easier.
  • You need to run the tests on a system with PHP 5.3 installed (sounds logical, but seemed like a good idea to mention it…)
  • The tests were written on a sunny afternoon with lots of interruptions, so they’re all but perfect. Please let me know if you find bugs, things missing or just want to flame me 😉

As always, no guarantees that it will do the job… but if you feel it’s of use to you, let me know in the comments !