<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Patrick&#039;s playground &#187; comparison</title>
	<atom:link href="http://www.vankouteren.eu/blog/tag/comparison/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.vankouteren.eu/blog</link>
	<description>Random thoughts, problems and solutions</description>
	<lastBuildDate>Sun, 29 Jan 2012 07:53:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>PHP SPL data structure: SplFixedArray</title>
		<link>http://www.vankouteren.eu/blog/2011/09/php-spl-data-structure-splfixedarray/</link>
		<comments>http://www.vankouteren.eu/blog/2011/09/php-spl-data-structure-splfixedarray/#comments</comments>
		<pubDate>Thu, 29 Sep 2011 14:51:31 +0000</pubDate>
		<dc:creator>Patrick van Kouteren</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[array]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[memory usage]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[SplFixedArray]]></category>

		<guid isPermaLink="false">http://www.vankouteren.eu/blog/?p=267</guid>
		<description><![CDATA[PHP 5.3 introduced some new data structures. Jurriën Stutterheim's talk at PFCongres 2011 on SPL structures and their performance prompted me to take a closer look at the performance of these structures. Two comments on the PHP.net page had misled me at first, so it was time to find out for myself. For [...]]]></description>
			<content:encoded><![CDATA[            <script type="text/javascript" src="http://www.vankouteren.eu/blog/wp-content/plugins/wordpress-code-snippet/scripts/shBrushPhp.js"></script>
<p>PHP 5.3 introduced some new data structures. Jurriën Stutterheim's talk at <a title="PFCongres" href="http://www.pfcongres.nl/">PFCongres 2011</a> on SPL structures and their performance prompted me to take a closer look at the performance of these structures. Two comments on the <a title="SplFixedArray PHP.net page" href="http://www.php.net/manual/en/class.splfixedarray.php">PHP.net page</a> had misled me at first, so it was time to find out for myself.</p>
<p><span id="more-267"></span>For people familiar with Java, SplFixedArray is nothing strange: fixed-size arrays are a common structure in that language. In the past, PHP only offered its regular, all-purpose array.</p>
<p>The common array structure accepts all kinds of keys (strings, integers, or a mix of both). Under the hood, PHP stores it as a hash table: every key is mapped to a hash value that determines where its item is stored. So under the hood these arrays are comparable to Java HashMaps. Much like an index in a database, the hash lets PHP find an item by its key quickly, without walking through the whole array. However, hashing does not guarantee a unique output for every unique input. Two completely different keys may well result in the same hash value (a collision). A hash table has to resolve such collisions in its own way (there are various ways to do this, but that is out of the scope of this post), so there is another level of complexity here.</p>
<p>The new SplFixedArray has a pre-defined size, which you pass to its constructor. It only accepts integer keys, from 0 up to the size minus one. Because the size is fixed and a key maps directly onto a position, it saves memory and indexing is more efficient. It also does away with all hashing-related overhead, which saves time. Now the question is: how much time (and memory) does it save me?</p>
<p><strong>Hey, we're running PHP, not Java! I never had to bother with memory usage, so why would I do that now all of a sudden?</strong></p>
<p>You don't have to if you don't want to, but there may be a lot to gain. This is especially true when you use a lot of arrays whose size and item positions you know beforehand. In an environment that is often under heavy load, the benefits of SplFixedArray may come in handy. (And updating software is cheaper than upgrading hardware...)</p>
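<p>To make the collision problem a bit more concrete, here is a toy hash table with separate chaining, sketched in Python. In this scheme every slot holds a small list of key/value pairs. It only illustrates the general technique under my own simplifications, and PHP's internal hash table is considerably more involved. The class name ChainedHashTable and the tiny table of 8 slots are made up for this example.</p>

```python
class ChainedHashTable:
    """Toy hash table: each slot holds a list ("chain") of (key, value) pairs."""

    def __init__(self, slots=8):
        # A deliberately small number of slots makes collisions likely
        self.slots = [[] for _ in range(slots)]

    def _slot(self, key):
        # Map the key's hash value onto one of the slots
        return self.slots[hash(key) % len(self.slots)]

    def set(self, key, value):
        chain = self._slot(key)
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))  # new key, possibly colliding with others

    def get(self, key):
        # Walk the chain and compare the keys one by one
        for k, v in self._slot(key):
            if k == key:
                return v
        raise KeyError(key)


table = ChainedHashTable()
table.set(1, 'one')
table.set(9, 'nine')  # in CPython hash(9) % 8 == hash(1) % 8 == 1: a collision
print(table.get(1), table.get(9))
```

<p>Every lookup has to compute a hash, find the slot and then compare keys along the chain. A structure that skips hashing altogether can avoid that extra work.</p>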
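<p>The behaviour described above can also be sketched in Python. This is an illustration under my own assumptions, not PHP code. All slots are reserved up front, only integer indexes from 0 to size - 1 are accepted, and anything else raises an error. The class FixedArray is hypothetical. Where this sketch raises an IndexError, PHP's SplFixedArray throws a RuntimeException.</p>

```python
class FixedArray:
    """Sketch of a fixed-size array: only integer indexes 0 .. size - 1 are valid."""

    def __init__(self, size):
        # All slots are reserved up front (filled with None)
        self._items = [None] * size

    def _check(self, index):
        if not isinstance(index, int):
            raise TypeError("only integer indexes are allowed")
        # range() membership is a cheap bounds check; it also rejects negative
        # indexes, which a plain Python list would count from the end
        if index not in range(len(self._items)):
            raise IndexError("index %d is out of range" % index)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        self._check(index)
        return self._items[index]

    def __setitem__(self, index, value):
        self._check(index)
        self._items[index] = value


arr = FixedArray(3)
arr[0] = 1
arr[2] = 1
try:
    arr[3] = 1  # one past the end
except IndexError as exc:
    print(exc)  # index 3 is out of range
```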
<p><strong>Numbers</strong></p>
<p>As I said, two comments in the <a title="PHP.net SplFixedArray" href="http://www.php.net/manual/en/class.splfixedarray.php">PHP.net manual</a> misled me at first: <a title="SplFixedArray test 1" href="http://www.php.net/manual/en/class.splfixedarray.php#92214">this one</a> and <a title="SplFixedArray test 2" href="http://www.php.net/manual/en/class.splfixedarray.php#94179">this one</a>.</p>
<p>The first one compares the speed of insertions into a regular array with the speed of insertions into an SplFixedArray, and every result comes out in favour of the SplFixedArray.</p>
<p>The second one claims to be more realistic, but immediately results in a fatal error. Its author tries to insert items at positions outside the range of the SplFixedArray. SplFixedArray throws a RuntimeException for an out-of-range index, much like the index-out-of-bounds exception Java developers know, and the uncaught exception stops the script. If this is a realistic example, I would reconsider using PHP <img src='http://www.vankouteren.eu/blog/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>I've written a simple script to compare the speed and memory usage of a regular array and an SplFixedArray. It takes two query-string parameters. The first, size, is the upper limit for the number of items: the test starts at 1,000 items and doubles the count each round while it stays below that limit. The second, times, is the number of repetitions per size. Each test is repeated that many times and the results are averaged, to reduce the influence of small background processes. The results are shown below. Please note that the results may vary every time you execute the script. However, the larger the array, the smaller the variance and the more reliable the numbers.</p>
<p><em>script:</em></p>
<p><pre class="brush: php">&lt;?php

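// Read the parameters from the query string; without them no test is run
$maxSize = isset($_GET['size']) ? (int) $_GET['size'] : 0;
// At least one run, so the averages below never divide by zero
$times = isset($_GET['times']) ? max(1, (int) $_GET['times']) : 1;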

set_time_limit(0);

echo &quot;&lt;h2&gt;Number of repeated tests: &quot; . $times . &quot;&lt;/h2&gt;&quot;;
echo &quot;&lt;table border='1'&gt;&quot;;
echo &quot;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Items&lt;/th&gt;&lt;th&gt;Time Array (s)&lt;/th&gt;&lt;th&gt;Memory Array (bytes)&lt;/th&gt;&lt;th&gt;Time SplFixedArray (s)&lt;/th&gt;&lt;th&gt;Memory SplFixedArray (bytes)&lt;/th&gt;&lt;th&gt;Time ratio Array/SplFixedArray&lt;/th&gt;&lt;th&gt;Speed increase by SplFixedArray&lt;/th&gt;&lt;th&gt;Memory reduction by SplFixedArray&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&quot;;
echo &quot;&lt;tbody&gt;&quot;;

for($size = 1000; $size &lt; $maxSize; $size *= 2) {
	
	echo &quot;&lt;tr&gt;&lt;td align='right'&gt;&quot; . $size . &quot;&lt;/td&gt;&quot;;
	
	$arrTotal = 0;
	$arrMemUsage = 0;
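	for($time = 0; $time &lt; $times; $time++){
		$mStart = memory_get_usage();
		$container1 = array();
		for($s = microtime(true), $i = 0; $i &lt; $size; $i++) {
			$container1[$i] = 1;
		}
		// Stop the clock first so memory_get_usage() isn't included in the time
		$arrTime = (microtime(true) - $s);
		$arrTotal += $arrTime;
		$arrMemUsage += (memory_get_usage() - $mStart);
		// Free the array before the next run; otherwise the next baseline
		// ($mStart) still includes it and that run appears to use (almost) no memory
		unset($container1);
	}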
	
	$avgArrMem = ($arrMemUsage / $times);
	$avgArr = ($arrTotal / $times);
	echo &quot;&lt;td align='right'&gt;&quot; . $avgArr  . &quot;&lt;/td&gt;&quot;;
	echo &quot;&lt;td align='right'&gt;&quot; . $avgArrMem . &quot;&lt;/td&gt;&quot;;
	// Cleanup to REDUCE the influence of memory blocks on the result
	unset($arrTotal);
	unset($arrMemUsage);
	unset($container1);
	
	$splFixedArrTotal = 0;
	$splFixedArrMemUsage = 0;
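	for($time = 0; $time &lt; $times; $time++){
		$mStart = memory_get_usage();
		$container2 = new SplFixedArray($size);
		for($s = microtime(true), $i = 0; $i &lt; $size; $i++) {
			$container2[$i] = 1;
		}
		// Stop the clock first so memory_get_usage() isn't included in the time
		$splFixedArrTime = (microtime(true) - $s);
		$splFixedArrTotal += $splFixedArrTime;
		$splFixedArrMemUsage += (memory_get_usage() - $mStart);
		// Free the SplFixedArray before the next run; otherwise the next
		// baseline ($mStart) still includes it and that run appears to use (almost) no memory
		unset($container2);
	}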
	
	$avgSplFixedArrMem = ($splFixedArrMemUsage / $times);
	$avgSplFixedArr = ($splFixedArrTotal / $times);
	echo &quot;&lt;td align='right'&gt;&quot; . $avgSplFixedArr . &quot;&lt;/td&gt;&quot;;
	echo &quot;&lt;td align='right'&gt;&quot; . $avgSplFixedArrMem . &quot;&lt;/td&gt;&quot;;
	// Cleanup to REDUCE the influence of memory blocks on the result
	unset($splFixedArrTotal);
	unset($splFixedArrMemUsage);
	unset($container2);
	
	// Calculate ratio
	echo &quot;&lt;td align='right'&gt;&quot; . ($avgArr / $avgSplFixedArr) . &quot;&lt;/td&gt;&quot;;
	// Calculate speed increase percentage
	echo &quot;&lt;td align='right'&gt;&quot; . number_format(((($avgSplFixedArr - $avgArr) / $avgArr) * -100), 4) . &quot; %&lt;/td&gt;&quot;;
	// Calculated memory reduction percentage
	if ($avgArrMem == 0){
		echo &quot;&lt;td align='right'&gt;NaN&lt;/td&gt;&lt;/tr&gt;&quot;;
	}
	else {
		echo &quot;&lt;td align='right'&gt;&quot; . number_format(((($avgSplFixedArrMem - $avgArrMem) / $avgArrMem) * -100), 4) . &quot; %&lt;/td&gt;&lt;/tr&gt;&quot;;
	}
	
	// Cleanup to REDUCE the influence of memory blocks on the result
	unset($avgArr);
	unset($avgSplFixedArr);

}

echo &quot;&lt;/tbody&gt;&lt;/table&gt;&quot;;

?&gt;</pre></p>
<p><em>results:</em></p>
<p><a href="http://www.vankouteren.eu/blog/wp-content/uploads/2011/09/arrayvssplfixedarrayresult.jpg">Results (as an image, because the regular table didn't fit on the page)</a></p>
<p><strong>So what does it say?</strong></p>
<p>Well, memory usage decreased by around 58% in this case. This is because an SplFixedArray has a fixed size, so only a limited, pre-defined amount of memory is reserved for it, without the bookkeeping a hash table needs. There is some gain in speed as well.</p>
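<p>If you want to try the same measuring approach outside PHP, here is a rough analogy sketched in Python. It rests on my own assumptions and is not a port of the results above. A dict filled with integer keys stands in for PHP's hash-based array, and a pre-allocated list stands in for SplFixedArray. Like the PHP script, it repeats each test, averages time and memory, and computes the memory reduction as (array - fixed) / array * 100. Memory is measured with the standard tracemalloc module in a separate run, because tracing slows the code down. The names measure, fill_dict and fill_list are made up for this sketch, and your numbers will differ per machine and Python version.</p>

```python
import time
import tracemalloc


def measure(build, size, times):
    """Average time (s) and memory (bytes) of build(size) over `times` runs."""
    total_time = 0.0
    total_mem = 0
    for _ in range(times):
        # Time the run without tracemalloc, which would slow it down
        start = time.perf_counter()
        container = build(size)
        total_time += time.perf_counter() - start
        del container  # free it so the next run starts from a clean baseline

        # Build it again under tracemalloc to see how much memory it keeps
        tracemalloc.start()
        container = build(size)
        current, _peak = tracemalloc.get_traced_memory()
        total_mem += current
        tracemalloc.stop()  # stopping also clears the collected traces
        del container
    return total_time / times, total_mem / times


def fill_dict(size):
    # Hash-based container with integer keys (stand-in for a PHP array)
    d = {}
    for i in range(size):
        d[i] = 1
    return d


def fill_list(size):
    # Pre-allocated container indexed by position (stand-in for SplFixedArray)
    items = [None] * size
    for i in range(size):
        items[i] = 1
    return items


for size in (1000, 2000, 4000, 8000, 16000):
    dict_time, dict_mem = measure(fill_dict, size, times=5)
    list_time, list_mem = measure(fill_list, size, times=5)
    reduction = (dict_mem - list_mem) / dict_mem * 100
    print("%6d items: dict %.6f s, %d bytes; list %.6f s, %d bytes; memory reduction %.1f %%"
          % (size, dict_time, dict_mem, list_time, list_mem, reduction))
```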
<p><strong>So is it better to use?</strong></p>
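<p>That really depends. SplFixedArray has some advantages, but also some drawbacks compared to the regular array. It only accepts integer keys within its range, and you have to manage its size yourself (setSize() changes it). Because it is an object rather than an array, most array functions, such as sort(), don't accept it directly; you would convert it with toArray() first. It should be used where it fits: when you need an array whose size can be determined in advance and integer keys are all you need. It's also a good habit (at least, I think so... but then, I started with Java) to use the appropriate data structure for the job.</p>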
]]></content:encoded>
			<wfw:commentRss>http://www.vankouteren.eu/blog/2011/09/php-spl-data-structure-splfixedarray/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python: check for substring speed</title>
		<link>http://www.vankouteren.eu/blog/2009/06/python-check-for-substring-speed/</link>
		<comments>http://www.vankouteren.eu/blog/2009/06/python-check-for-substring-speed/#comments</comments>
		<pubDate>Mon, 29 Jun 2009 15:07:23 +0000</pubDate>
		<dc:creator>Patrick van Kouteren</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[method speed]]></category>
		<category><![CDATA[substring]]></category>

		<guid isPermaLink="false">http://www.vankouteren.eu/blog/?p=110</guid>
		<description><![CDATA[I was looking for options on how to check if a certain substring (in my case ' FROM ') is present in a SQL query string when I found this blog entry. Just for fun I decided to have a look at how fast these checks would be compared to each other. I was dealing [...]]]></description>
			<content:encoded><![CDATA[
<p>I was looking for ways to check whether a certain substring (in my case ' FROM ') is present in an SQL query string when I found <a title="Python check for substring" href="http://bka-bonn.de/wordpress/index.php/2008/12/26/python-trick-check-for-substring/" target="_blank">this</a> blog entry. Just for fun, I decided to compare how fast these checks are.</p>
<p><span id="more-110"></span></p>
<p>I was dealing with two queries, namely:</p>
<pre class="sql"><span style="color: #993333; font-weight: bold;">SELECT</span> oid,typname,typlen,typelem,typdefault,typbasetype,typnotnull,typtype
<span style="color: #993333; font-weight: bold;">FROM</span> pg_type;</pre>
<p>And</p>
<pre class="sql"><span style="color: #993333; font-weight: bold;">SELECT</span> attname,attnum,atttypid,attndims,attnotnull,atthasdef,
pg_get_expr<span style="color: #66cc66;">&#40;</span>adbin,adrelid<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> adbin
<span style="color: #993333; font-weight: bold;">FROM</span> pg_attribute <span style="color: #993333; font-weight: bold;">LEFT</span> <span style="color: #993333; font-weight: bold;">JOIN</span> pg_attrdef <span style="color: #993333; font-weight: bold;">ON</span> attrelid = adrelid <span style="color: #993333; font-weight: bold;">AND</span> attnum = adnum
<span style="color: #993333; font-weight: bold;">WHERE</span> attisdropped = false <span style="color: #993333; font-weight: bold;">AND</span> attnum &gt; <span style="color: #cc66cc;">0</span> <span style="color: #993333; font-weight: bold;">AND</span> attrelid <span style="color: #993333; font-weight: bold;">IN</span>
<span style="color: #66cc66;">&#40;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> oid <span style="color: #993333; font-weight: bold;">FROM</span> pg_class <span style="color: #993333; font-weight: bold;">WHERE</span> relname=%s <span style="color: #993333; font-weight: bold;">AND</span> relkind=<span style="color: #ff0000;">'r'</span> <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> attnum<span style="color: #66cc66;">&#41;</span>;</pre>
<p>The code is pretty simple:</p>
<pre class="python">import time

def check(query):
    # Time the membership test (when found, the print call is inside the timed region)
    t1 = time.time()
    if ' FROM ' in query:
        print('IN found it!')
    t2 = time.time()
    print("Took me " + str(t2 - t1) + " sec.")

    # Time str.find(), which returns -1 when the substring is absent
    t3 = time.time()
    if query.find(' FROM ') != -1:
        print('FIND found it!')
    t4 = time.time()
    print("Took me " + str(t4 - t3) + " sec.")

# Hypothetical call with a shortened version of the first query above
check("SELECT oid,typname,typlen FROM pg_type;")</pre>
<p>The results look as follows:</p>
<pre>IN found it!
Took me 4.72068786621e-05 sec.
FIND found it!
Took me 1.09672546387e-05 sec.
IN found it!
Took me 4.19616699219e-05 sec.
FIND found it!
Took me 1.12056732178e-05 sec.
IN found it!
Took me 3.48091125488e-05 sec.
FIND found it!
Took me 9.05990600586e-06 sec.
Took me 1.90734863281e-06 sec.
Took me 5.00679016113e-06 sec.
IN found it!
Took me 2.59876251221e-05 sec.
FIND found it!
Took me 1.19209289551e-05 sec.
Took me 9.53674316406e-07 sec.
Took me 1.90734863281e-06 sec.
IN found it!
Took me 0.00103211402893 sec.
FIND found it!
Took me 2.50339508057e-05 sec.
Took me 9.53674316406e-07 sec.
Took me 4.05311584473e-06 sec.</pre>
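<p>As we can see, both checks stop at the first occurrence of ' FROM '. The if-in test only tells us whether the substring is present, and the find method returns the index of the first match (or -1 if there is none). Neither retrieves all instances when ' FROM ' occurs more than once in the string. When the substring was present, the find method was faster in every run above. The pairs of "Took me" lines without a "found it" message come from queries that don't contain ' FROM ', and in those runs the if-in test was faster. Keep in mind that each number is a single measurement of a few microseconds, which also includes the print call when the substring is found. These results are therefore rough indications at best.<br />
For short strings like these the difference hardly matters. It may become noticeable when you search large strings or pieces of text, or run the check very often.<br />
So much for my little experiment. Knowing the answer, I can sleep well again tonight <img src='http://www.vankouteren.eu/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>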
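<p>To check both points for yourself, here is a small sketch written for Python 3. The query string and the helper name find_all are made up for this example. It shows that the in test and the find method both stop at the first match. It also shows how to collect every occurrence by calling find again from just past the previous match. Finally, it times the two checks with the standard timeit module, which repeats each check many times instead of measuring a single run.</p>

```python
import timeit

# Hypothetical test input: a shortened query containing ' FROM ' twice
query = "SELECT oid FROM pg_attribute WHERE attrelid IN ( SELECT oid FROM pg_class )"

# Both checks stop at the first match
print(' FROM ' in query)     # True
print(query.find(' FROM '))  # 10: the index of the first match only


def find_all(text, sub):
    """Return the start index of every non-overlapping occurrence of sub."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching just past the previous match
        start = text.find(sub, start + len(sub))
    return positions


print(find_all(query, ' FROM '))

# Repeat each check many times and keep the best of several rounds
in_time = min(timeit.repeat(lambda: ' FROM ' in query, number=100000, repeat=5))
find_time = min(timeit.repeat(lambda: query.find(' FROM ') != -1, number=100000, repeat=5))
print("in:   %.3g s per 100000 checks" % in_time)
print("find: %.3g s per 100000 checks" % find_time)
```

<p>Taking the minimum of several rounds filters out some of the noise caused by other processes, which is the approach the timeit documentation recommends.</p>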
]]></content:encoded>
			<wfw:commentRss>http://www.vankouteren.eu/blog/2009/06/python-check-for-substring-speed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

