Getting uniques from an array (with speed in mind)

Important: the following technique relies on array_flip(), which can only flip integer and string values. It doesn't work for arrays where the values are, for example:

boolean (true/false)
null
objects
resources
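To make the warning above concrete, here is a minimal sketch of what happens when such values are present. The sample data is made up for illustration, and the @ operator is only there to silence the warnings array_flip() emits for each skipped entry.

```php
<?php
// Hypothetical sample data mixing supported and unsupported value types.
$values = array('horse', true, null, 'pig', 'horse', 42);

// array_flip() skips every value that is not an integer or a string
// (and emits a warning for each one, silenced here with @).
$flipped = @array_flip($values);
$uniques = array_keys($flipped);

// true and null have silently disappeared from the result.
var_export($uniques);
```

So the problem is not an error you would notice right away: the unsupported values are simply dropped from the result.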

Suppose you get data from some source (an XML file, a CSV file, …) and you put it into an array. Now suppose this data is full of duplicates. For example, you have:

array( 0 => 'horse', 1 => 'pig', 2 => 'pig', 3 => 'cow', 4 => 'horse', ...

How can you get the unique values from this array?

The standard way would be:

$a = array_unique($a);

This works fine, except that it can be very slow for large arrays.

A faster way would be:

$a = array_keys(array_flip($a));

But marginally faster is:

$a = array_flip(array_flip($a));
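The three approaches return the same values, but not the same keys, which matters if the rest of your code depends on them. The sketch below uses only the five elements shown in the example array above (the original list is truncated, so treat this input as an assumption).

```php
<?php
$a = array(0 => 'horse', 1 => 'pig', 2 => 'pig', 3 => 'cow', 4 => 'horse');

// Keeps the key of the FIRST occurrence of each value.
$viaUnique = array_unique($a);

// Throws the original keys away and reindexes from 0.
$viaKeys = array_keys(array_flip($a));

// Keeps the key of the LAST occurrence of each value.
$viaDoubleFlip = array_flip(array_flip($a));

print_r($viaUnique);     // keys 0, 1, 3
print_r($viaKeys);       // keys 0, 1, 2
print_r($viaDoubleFlip); // keys 4, 2, 3
```

If you only care about the values, wrapping any of the results in array_values() gives you the same list in all three cases.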

So how big is the speed difference? For large arrays, a double array_flip can easily be 20 times faster than array_unique.

For reference, here's a small benchmark:

// Build an array of 1,000,000 random integers between 0 and 1000.
$a = array();
for ($x = 0; $x < 1000000; $x++) {
    $a[] = rand(0, 1000);
}

// 1. array_unique
$starttime = microtime(true);
$b = array_unique($a);
echo (microtime(true) - $starttime) . "\n";

// 2. array_keys + array_flip
$starttime = microtime(true);
$b = array_keys(array_flip($a));
echo (microtime(true) - $starttime) . "\n";

// 3. double array_flip
$starttime = microtime(true);
$b = array_flip(array_flip($a));
echo (microtime(true) - $starttime) . "\n";

The result, in seconds (array_unique first, then array_keys with array_flip, then the double array_flip):

2.06489086151
0.101167201996
0.0999970436096
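If you want the speed without having to remember the restriction on value types, one option is a small wrapper that takes the fast path only when it is safe. This is just a sketch: unique_values() is a hypothetical helper name, not a PHP built-in, and the fallback is only meant for scalars and null (objects, resources and nested arrays are not handled). Note also that on the fast path numeric strings such as '10' come back as integers, because PHP casts such array keys to int.

```php
<?php
/**
 * Hypothetical helper: returns the unique values of $values, reindexed from 0.
 * Uses the array_flip() trick when every value is an integer or a string,
 * and falls back to the slower array_unique() otherwise.
 */
function unique_values(array $values)
{
    foreach ($values as $value) {
        if (!is_int($value) && !is_string($value)) {
            // array_flip() would skip this entry, so use the built-in instead.
            return array_values(array_unique($values));
        }
    }

    return array_keys(array_flip($values));
}

print_r(unique_values(array('horse', 'pig', 'pig', 'cow', 'horse')));
print_r(unique_values(array(true, 'pig', true, 'pig')));
```

The type check costs one extra pass over the array, so it eats into the gain a little; whether that trade-off is worth it depends on how well you know your data.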

Wim Godden professional blog
