PHP SPL data structure: SplFixedArray

PHP 5.3 introduced some new data structures. The talk of Jurriën Stutterheim on PFCongres 2011 on SPL structures and their performance triggered me to have a closer look at the performance of these structures. I was kind of fooled by two comments on the PHP.net page, so it was time to find out myself.

For people familiar with Java the SplFixedArray is not a strange structure as they are common structures in this language. PHP only used arrays in the past.

The common array structure may contain all kind of keys (one can use strings, integers and even combine them) and under the hood PHP uses a hashing algorithm to create a unique array index for these keys. So actually under the hood these arrays are comparable to Java HashMaps. Like a database can create indexes on keys, PHP creates an index on the hashed indexes to speed up item retrieval from the array. However: hashing does not guarantee a unique output for every unique input. It may well be that two completely different keys result in a same hashed value. A hashing algorithm can sort this out in its own way (there are various way to do this, but I think this is out of the scope of this post right now), so there's another level of complexity here.

The new SplFixedArray has a pre-defined size and can only contain integer keys. As the size is limited this saves memory and the indexing is done more efficient. It does away with all hashing related issues which saves time. Now the question is: how much time does it save me?

Hey: we're running PHP, not Java.. I didn't have to bother with memory usage, why would I do that now all of a sudden?

You don't have to if you don't like to, but there may be a lot to gain for you. Especially when you are using a lot of arrays of which you know the size beforehand as well as the positions of items. In an environment which often is under heavy load the benefits of SplFixedArray may come in handy for you. (And updating software is cheaper than updating hardware..)

Numbers

As I said I was fooled at first by two comments on the PHP.net manual: this one and this one.

The first one tests the speed of insertions in the regular array and the speed of insertions in the SplFixedArray and returns all positives for the SplFixedArray.

The latter one claims to be more realistic, but results in a fatal error directly because the author is trying to insert items on positions outside the range of the SplFixedArray (index out of bounds exception, also common in Java). If this is a realistic example I would reconsider using PHP ;-)

I've compiled a simple script to test the speeds and memory sizes of array vs SplFixedArray. To (kind of) prevent small background processes influencing the results these tests are done multiple times and averaged. The results are shown below. Please note that these results may vary every time you execute the script. However: the larger the size of the array, the less variance occurs and the more reliable the numbers are.

script:

<?php

$maxSize = (int) $_GET['size'];
$times = (int) $_GET['times'];

set_time_limit(0);

echo "<h2>Number of repeated tests: " . $times . "</h2>";
echo "<table border='1'>";
echo "<thead><tr><th>Items</th><th>Time Array</th><th>Memory Array</th><th>SplFixedArray</th><th>Memory SplFixedArray</th><th>Array/SplFixedArray ratio</th><th>Speed increase by SplFixedArray</th><th>Memory reduction by SplFixedArray</th></tr></thead>";
echo "<tbody>";

for($size = 1000; $size < $maxSize; $size *= 2) {
	
	echo "<tr><td align='right'>" . $size . "</td>";
	
	$arrTotal = 0;
	$arrMemUsage = 0;
	for($time = 0; $time < $times; $time++){
		$mStart = memory_get_usage();
		$container1 = array();
		for($s = microtime(true), $i = 0; $i < $size; $i++) {
			$container1[$i] = 1;
		}
		
		$arrMemUsage += (memory_get_usage() - $mStart);
		$arrTime = (microtime(true) - $s);
		$arrTotal += $arrTime;
	}
	
	$avgArrMem = ($arrMemUsage / $times);
	$avgArr = ($arrTotal / $times);
	echo "<td align='right'>" . $avgArr  . "</td>";
	echo "<td align='right'>" . $avgArrMem . "</td>";
	// Cleanup to REDUCE the influence of memory blocks on the result
	unset($arrTotal);
	unset($arrMemUsage);
	unset($container1);
	
	$splFixedArrTotal = 0;
	$splFixedArrMemUsage = 0;
	for($time = 0; $time < $times; $time++){
		$mStart = memory_get_usage();
		$container2 = new SplFixedArray($size);
		for($s = microtime(true), $i = 0; $i < $size; $i++) {
			$container2[$i] = 1;
		}
		
		$splFixedArrMemUsage += (memory_get_usage() - $mStart);
		$splFixedArrTime = (microtime(true) - $s);
		$splFixedArrTotal += $splFixedArrTime;
	}
	
	$avgSplFixedArrMem = ($splFixedArrMemUsage / $times);
	$avgSplFixedArr = ($splFixedArrTotal / $times);
	echo "<td align='right'>" . $avgSplFixedArr . "</td>";
	echo "<td align='right'>" . $avgSplFixedArrMem . "</td>";
	// Cleanup to REDUCE the influence of memory blocks on the result
	unset($splFixedArrTotal);
	unset($splFixedArrMemUsage);
	unset($container2);
	
	// Calculate ratio
	echo "<td align='right'>" . ($avgArr / $avgSplFixedArr) . "</td>";
	// Calculate speed increase percentage
	echo "<td align='right'>" . number_format(((($avgSplFixedArr - $avgArr) / $avgArr) * -100), 4) . " %</td>";
	// Calculated memory reduction percentage
	if ($avgArrMem == 0){
		echo "<td align='right'>NaN</td></tr>";
	}
	else {
		echo "<td align='right'>" . number_format(((($avgSplFixedArrMem - $avgArrMem) / $avgArrMem) * -100), 4) . " %</td></tr>";
	}
	
	// Cleanup to REDUCE the influence of memory blocks on the result
	unset($avgArr);
	unset($avgSplFixedArr);

}

echo "</tbody></table>";

?>

results:

Results (regular table didn't fit the page)

So what does it say?

Well regarding the memory the usage has been decreased by around 58% in this case. This, of course, is due to the fact that the SplFixedArray has a limited size and therefor a (pre-defined) limited space in the memory reserved. There is some gain in speed as well.

So is it better to use?

That really depends. The SplFixedArray has some advantages, but also some drawbacks compared to the common array. It should be used where it fits: if you need an array of a size which can be pre-defined and where you need integer keys. It's also a good (at least, I think.. but I started with Java..) habit to use the appropriate data structures.

Be Sociable, Share!

2 comments

  1. var_dump(isAssoc(array(1 => ‘string’))); // Should return false, but returns true.

  2. Hmm I think I should rewrite this some time. This time with unit tests and all..

Leave a Reply

Your email address will not be published. Required fields are marked *


five + 7 =