Getting unique values from an array (with speed in mind)

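Important: the following technique only works when every value in the array is an integer or a string, because the values are temporarily used as array keys. It doesn't work for arrays where the values are:

• booleans (true/false)
• null
• floats
• arrays
• objects
• resources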

Suppose you read data from some source (an XML file, a CSV file, …) into an array, and that data is full of duplicates. For example, you have:

array(
0 => 'horse',
1 => 'pig',
2 => 'pig',
3 => 'cow',
4 => 'horse',
...

How can you get the unique values from this array?

The standard way would be:

$a = array_unique($a);

This works fine, except that it's very slow for large arrays.

A faster way would be:

$a = array_keys(array_flip($a));

But marginally faster still is:

$a = array_flip(array_flip($a));
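
The three calls do not return identical arrays, which matters if you rely on the keys afterwards. Here is a minimal sketch on the sample data from above (the variable names are only for illustration):

```php
<?php
$a = ['horse', 'pig', 'pig', 'cow', 'horse'];

// array_unique keeps the key of the FIRST occurrence of each value.
$viaUnique = array_unique($a);            // [0 => 'horse', 1 => 'pig', 3 => 'cow']

// array_keys(array_flip()) returns a freshly indexed list (0, 1, 2, ...).
$viaKeys = array_keys(array_flip($a));    // ['horse', 'pig', 'cow']

// A double array_flip keeps the key of the LAST occurrence of each value.
$viaFlip = array_flip(array_flip($a));    // [4 => 'horse', 2 => 'pig', 3 => 'cow']

// Numeric strings such as '10' come back as integers,
// because PHP normalizes integer-like string keys.
$numeric = array_keys(array_flip(['10', '10', '7']));   // [10, 7]
```

If you need a clean 0-based list after the double flip, wrapping the result in array_values() is one option.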

So how big is the speed difference? For large arrays, a double array_flip can easily be 20 times faster than array_unique.

For reference, here's a small benchmark:

// Build 1,000,000 random integers between 0 and 1000 (lots of duplicates).
$a = array();
for ($x = 0; $x < 1000000; $x++) {
    $a[] = rand(0, 1000);
}

// 1. array_unique
$starttime = microtime(true);
$b = array_unique($a);
echo (microtime(true) - $starttime) . "\n";

// 2. array_keys + array_flip
$starttime = microtime(true);
$b = array_keys(array_flip($a));
echo (microtime(true) - $starttime) . "\n";

// 3. double array_flip
$starttime = microtime(true);
$b = array_flip(array_flip($a));
echo (microtime(true) - $starttime) . "\n";

The result (in seconds, in the same order as above):

2.06489086151
0.101167201996
0.0999970436096
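
Because the flip trick only handles integers and strings, you may want to guard it. The sketch below uses a hypothetical helper, fast_unique() (the name and the fallback strategy are assumptions, not part of the benchmark above): it takes the fast path only when every value is an int or a string, and otherwise falls back to array_unique.

```php
<?php
// Hypothetical helper: use the flip trick only when it is safe to do so.
function fast_unique(array $values): array
{
    foreach ($values as $value) {
        if (!is_int($value) && !is_string($value)) {
            // array_flip() would skip this value with a warning,
            // so fall back to the slower but more general array_unique().
            return array_values(array_unique($values));
        }
    }

    // Every value is a valid array key: flip once and read the keys back.
    return array_keys(array_flip($values));
}

$animals = fast_unique(['horse', 'pig', 'pig', 'cow', 'horse']);
$floats  = fast_unique([1.5, 1.5, 2.5]);
```

Note that the type check adds an extra pass over the array, so it is worth timing on your own data before relying on it.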
