Getting uniques from an array (with speed in mind)

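Important: the following technique only works when every value is an integer or a string, because array_flip() uses the values as array keys. It doesn’t work for arrays where the values are:

  • boolean (true/false)
  • null
  • floats
  • arrays
  • objects
  • resources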
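
If the input might contain such values, one option is to check the values first and fall back to array_unique(). The following is only a sketch: fast_unique() is a hypothetical helper name, not a PHP built-in, and the extra loop costs some of the speed the flip trick gains.

```php
<?php
// Hypothetical helper (not part of PHP): use the double flip only when
// every value is an int or a string, otherwise fall back to array_unique().
function fast_unique(array $a): array
{
    foreach ($a as $value) {
        if (!is_int($value) && !is_string($value)) {
            // array_flip() would skip this value and emit a warning,
            // so use the slower but more general built-in instead.
            return array_unique($a);
        }
    }

    // Safe to flip: all values are valid array keys.
    return array_flip(array_flip($a));
}

$animals = fast_unique(array('horse', 'pig', 'pig', 'cow', 'horse'));
$mixed   = fast_unique(array(1.5, 1.5, 2.5));
```

Note that the two branches do not return the same keys: the flip keeps the index of the last occurrence of each value, while array_unique() keeps the first.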

Suppose you get data from some source (an XML file, a CSV file, …) and you put it into an array. Now suppose this data is full of duplicates. For example, you have:

array(
    0 => 'horse',
    1 => 'pig',
    2 => 'pig',
    3 => 'cow',
    4 => 'horse',
    ...

How can you get the unique values from this array?

The standard way would be:

$a = array_unique($a);

This works fine, except that it can be extremely slow for large arrays.

A faster way would be:

$a = array_keys(array_flip($a));

Marginally faster still is:

$a = array_flip(array_flip($a));
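
To see why both variants work, it may help to look at the intermediate arrays. A small sketch using the sample animals from above (the variable names are only for illustration):

```php
<?php
$a = array('horse', 'pig', 'pig', 'cow', 'horse');

// First flip: values become keys. Keys are unique, so duplicates collapse,
// and each value ends up with the index of its last occurrence.
$flipped = array_flip($a);
// array('horse' => 4, 'pig' => 2, 'cow' => 3)

// array_keys() returns the unique values with fresh 0-based indexes.
$viaKeys = array_keys($flipped);
// array(0 => 'horse', 1 => 'pig', 2 => 'cow')

// A second flip returns the unique values under their last original index.
$viaFlip = array_flip($flipped);
// array(4 => 'horse', 2 => 'pig', 3 => 'cow')
```

So the two variants return the same values but different keys, and neither matches array_unique(), which keeps the first index of each value. Also keep in mind that PHP casts numeric string keys to integers, so a value like '42' comes back as the integer 42.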

So how big is the speed difference? For large arrays, a double array_flip can easily be 20 times faster.

For reference, here’s a small benchmark :

// Build a large array with many duplicates
$a = array();
for ($x = 0; $x < 1000000; $x++) {
    $a[] = rand(0, 1000);
}

// 1. array_unique
$starttime = microtime(true);
$b = array_unique($a);
echo (microtime(true) - $starttime) . "\n";

// 2. array_keys + array_flip
$starttime = microtime(true);
$b = array_keys(array_flip($a));
echo (microtime(true) - $starttime) . "\n";

// 3. double array_flip
$starttime = microtime(true);
$b = array_flip(array_flip($a));
echo (microtime(true) - $starttime) . "\n";

The result, in seconds, for array_unique, array_keys(array_flip) and the double array_flip respectively:

2.06489086151
0.101167201996
0.0999970436096
