<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Patrick&#039;s playground &#187; Python</title>
	<atom:link href="http://www.vankouteren.eu/blog/category/programming/programming-python/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.vankouteren.eu/blog</link>
	<description>Random thoughts, problems and solutions</description>
	<lastBuildDate>Sun, 29 Jan 2012 07:53:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Python: check for substring speed</title>
		<link>http://www.vankouteren.eu/blog/2009/06/python-check-for-substring-speed/</link>
		<comments>http://www.vankouteren.eu/blog/2009/06/python-check-for-substring-speed/#comments</comments>
		<pubDate>Mon, 29 Jun 2009 15:07:23 +0000</pubDate>
		<dc:creator>Patrick van Kouteren</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[method speed]]></category>
		<category><![CDATA[substring]]></category>

		<guid isPermaLink="false">http://www.vankouteren.eu/blog/?p=110</guid>
		<description><![CDATA[I was looking for options on how to check if a certain substring (in my case ' FROM ') is present in a SQL query string when I found this blog entry. Just for fun I decided to have a look at how fast these checks would be compared to each other. I was dealing [...]]]></description>
			<content:encoded><![CDATA[<p>I was looking for options on how to check if a certain substring (in my case ' FROM ') is present in a SQL query string when I found <a title="Python check for substring" href="http://bka-bonn.de/wordpress/index.php/2008/12/26/python-trick-check-for-substring/" target="_blank">this</a> blog entry. Just for fun I decided to have a look at how fast these checks would be compared to each other.</p>
<p><span id="more-110"></span></p>
<p>I was dealing with two queries, namely:</p>
<pre class="sql"><span style="color: #993333; font-weight: bold;">SELECT</span> oid,typname,typlen,typelem,typdefault,typbasetype,typnotnull,typtype
<span style="color: #993333; font-weight: bold;">FROM</span> pg_type;</pre>
<p>And</p>
<pre class="sql"><span style="color: #993333; font-weight: bold;">SELECT</span> attname,attnum,atttypid,attndims,attnotnull,atthasdef,
pg_get_expr<span style="color: #66cc66;">&#40;</span>adbin,adrelid<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> adbin
<span style="color: #993333; font-weight: bold;">FROM</span> pg_attribute <span style="color: #993333; font-weight: bold;">LEFT</span> <span style="color: #993333; font-weight: bold;">JOIN</span> pg_attrdef <span style="color: #993333; font-weight: bold;">ON</span> attrelid = adrelid <span style="color: #993333; font-weight: bold;">AND</span> attnum = adnum
<span style="color: #993333; font-weight: bold;">WHERE</span> attisdropped = false <span style="color: #993333; font-weight: bold;">AND</span> attnum &gt; <span style="color: #cc66cc;">0</span> <span style="color: #993333; font-weight: bold;">AND</span> attrelid <span style="color: #993333; font-weight: bold;">IN</span>
<span style="color: #66cc66;">&#40;</span> <span style="color: #993333; font-weight: bold;">SELECT</span> oid <span style="color: #993333; font-weight: bold;">FROM</span> pg_class <span style="color: #993333; font-weight: bold;">WHERE</span> relname=%s <span style="color: #993333; font-weight: bold;">AND</span> relkind=<span style="color: #ff0000;">'r'</span> <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> attnum<span style="color: #66cc66;">&#41;</span>;</pre>
<p>The code is pretty simple:</p>
<pre class="python"><span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">time</span>
&nbsp;
<span style="color: #808080; font-style: italic;"># query holds one of the SQL strings shown above</span>
t1 = <span style="color: #dc143c;">time</span>.<span style="color: #dc143c;">time</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #483d8b;">' FROM '</span> <span style="color: #ff7700;font-weight:bold;">in</span> query:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'IN found it!'</span>
t2 = <span style="color: #dc143c;">time</span>.<span style="color: #dc143c;">time</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;Took me &quot;</span> + <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>t2-t1<span style="color: black;">&#41;</span> + <span style="color: #483d8b;">&quot; sec.&quot;</span>
&nbsp;
t3 = <span style="color: #dc143c;">time</span>.<span style="color: #dc143c;">time</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">if</span> query.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">' FROM '</span><span style="color: black;">&#41;</span> != <span style="color: #ff4500;">-1</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'FIND found it!'</span>
t4 = <span style="color: #dc143c;">time</span>.<span style="color: #dc143c;">time</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;Took me &quot;</span> + <span style="color: #008000;">str</span><span style="color: black;">&#40;</span>t4-t3<span style="color: black;">&#41;</span> + <span style="color: #483d8b;">&quot; sec.&quot;</span></pre>
<p>The results look as follows:</p>
<pre>IN found it!
Took me 4.72068786621e-05 sec.
FIND found it!
Took me 1.09672546387e-05 sec.
IN found it!
Took me 4.19616699219e-05 sec.
FIND found it!
Took me 1.12056732178e-05 sec.
IN found it!
Took me 3.48091125488e-05 sec.
FIND found it!
Took me 9.05990600586e-06 sec.
Took me 1.90734863281e-06 sec.
Took me 5.00679016113e-06 sec.
IN found it!
Took me 2.59876251221e-05 sec.
FIND found it!
Took me 1.19209289551e-05 sec.
Took me 9.53674316406e-07 sec.
Took me 1.90734863281e-06 sec.
IN found it!
Took me 0.00103211402893 sec.
FIND found it!
Took me 2.50339508057e-05 sec.
Took me 9.53674316406e-07 sec.
Took me 4.05311584473e-06 sec.</pre>
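<p>A note on reading these numbers: both checks only tell us about the first occurrence of ' FROM '. The if-in test returns True or False, and find returns the index of the first match (or -1 if there is none); neither retrieves every instance. If you need all occurrences, use something like query.count(' FROM ') or call find repeatedly with a start offset. The paired "Took me" lines without a "found it!" message come from queries in which ' FROM ' was not found. In the runs where the substring was found, find measured faster; in the runs where it was not found, the if-in test was faster. Keep in mind that each measurement is a single run and, when there is a match, also includes the time spent in the print statement, so the differences may well be noise.<br />
It doesn't matter much for small strings, but if you need to check for a substring many times, or in a large string or piece of text, it can make a difference.<br />
So much for my little experiment. Knowing the answer, I can sleep well again tonight <img src='http://www.vankouteren.eu/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>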
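<p>For a steadier comparison, the two checks can be repeated many times with the standard timeit module, leaving the print statements out of the timed part. The sketch below is written for Python 3; the query string, the helper names and the repeat count are assumptions made for illustration and are not the ones used in the measurements above.</p>

```python
import timeit

# Hypothetical stand-in for the queries in the post; any string will do.
QUERY = "SELECT oid, typname FROM pg_type WHERE typname = 'int4'"
NEEDLE = " FROM "


def contains_with_in(haystack, needle):
    """Membership test: returns True or False."""
    return needle in haystack


def contains_with_find(haystack, needle):
    """str.find returns the index of the first match, or -1 if absent."""
    return haystack.find(needle) != -1


def time_check(check, haystack, needle, number=100000):
    """Total seconds for `number` calls of check(haystack, needle)."""
    return timeit.timeit(lambda: check(haystack, needle), number=number)


if __name__ == "__main__":
    for name, check in (("in", contains_with_in), ("find", contains_with_find)):
        seconds = time_check(check, QUERY, NEEDLE)
        print("%-4s %.4f sec for 100000 calls" % (name, seconds))
```

<p>The absolute numbers depend on the machine and the Python version, so it is best to compare the two lines from the same run only.</p>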
]]></content:encoded>
			<wfw:commentRss>http://www.vankouteren.eu/blog/2009/06/python-check-for-substring-speed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python ParserFactory</title>
		<link>http://www.vankouteren.eu/blog/2009/05/python-parserfactory/</link>
		<comments>http://www.vankouteren.eu/blog/2009/05/python-parserfactory/#comments</comments>
		<pubDate>Tue, 05 May 2009 05:13:32 +0000</pubDate>
		<dc:creator>Patrick van Kouteren</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Factory model]]></category>
		<category><![CDATA[ParserFactory]]></category>

		<guid isPermaLink="false">http://www.vankouteren.eu/blog/?p=62</guid>
		<description><![CDATA[Yesterday I had an idea about how to manage parsers in a dynamic way. My idea was this: a managing class can access the parsers folder and check which parsers (files) are available. A parser describes which file formats (and which versions of them) it can parse. The result should be that parser classes are [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>Yesterday I had an idea about how to manage parsers in a dynamic way. My idea was this:</p>
<ul>
<li>A managing class can access the parsers folder and check which parsers (files) are available</li>
<li>A parser describes which file formats (and which versions of them) it can parse</li>
</ul>
</div>
<div>The result should be that parser classes implement a parser interface and a ParserFactory can get information from the available parsers. I have worked with Java, PHP, Haskell and some other languages but am rather new to Python, so this is an interesting problem for getting to know Python a little better.</div>
<div><span id="more-62"></span>With an inheritance structure (formal abstract base classes only arrived with the abc module in Python 2.6, as far as I know) we can ensure that every parser implementing the parser interface provides the functions which the ParserFactory calls to obtain information about that particular parser.</div>
<div>Basically, what I would like to see is this:</div>
<table border="1">
<tbody>
<tr>
<td>In a script I have a file <em>f</em></td>
</tr>
<tr>
<td>I call <em>pf = ParserFactory()</em></td>
</tr>
<tr>
<td><p>Upon calling the ParserFactory constructor, the ParserFactory class gathers information from all available parsers. Say they are in a folder <em>parsers</em>. The ParserFactory will then get the contents of the folder <em>parsers</em> and, for each class file, construct an object from which it can request information.</p>
<p>So for example, <em>psimi.py</em> will give the ParserFactory a PSIMIParser object. The ParserFactory can then call a method like <em>getSupportedFileFormats()</em>, which returns the file formats that this PSIMIParser can parse.</p></td>
</tr>
</tbody>
</table>
<div>It would be great if this were possible, so that people can create their own parser by implementing a parser interface without having to worry about anything else, as the ParserFactory takes care of the rest.</div>
<div>I currently have this code:</div>
<p>Testfile:</p>
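<pre class="python"><span style="color: #ff7700;font-weight:bold;">import</span> parsers.<span style="color: black;">parserfactory</span>
&nbsp;
pf = parsers.<span style="color: black;">parserfactory</span>.<span style="color: black;">ParserFactory</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>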
pf.<span style="color: black;">getAvailableParsers</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></pre>
<p>Then for a single parser I have a GenericParser interface which every parser should implement:</p>
<pre class="python"><span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
This GenericParser is an abstract superclass. It defines methods which subclasses should override. This guarantees us that
certain functions exist.
@author: Patrick van Kouteren
&nbsp;
@version: 0.1
&quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> GenericParser:
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    self.fileformats will be a dictionary in the form 'extension : version '. E.g. 'xml : 1.0'
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">NotImplementedError</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This is a GenericParser. Please define a dictionary 'self.fileformats' here with the extensions which your parser can parse!&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> getSupportedFileFormats<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">NotImplementedError</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This is a GenericParser. Please return the dictionary 'self.fileformats' here, which should be defined in __init__&quot;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> parse<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, filename<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">NotImplementedError</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;This is a GenericParser. Please implement a proper parse function&quot;</span><span style="color: black;">&#41;</span></pre>
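<p>As a point of comparison, the same interface can be written with the standard abc module, which makes Python itself refuse to instantiate a parser that forgot to override a method. This is a minimal sketch in Python 3 syntax and not the code used in this project; PSIMIParser here is a hypothetical example class and can_instantiate is a helper added only to demonstrate the behaviour.</p>

```python
from abc import ABC, abstractmethod


class GenericParser(ABC):
    """Abstract parser interface; subclasses must override both methods."""

    @abstractmethod
    def getSupportedFileFormats(self):
        """Return a dict of the form {extension: version}."""

    @abstractmethod
    def parse(self, filename):
        """Parse the given file and return the parsed data."""


class PSIMIParser(GenericParser):
    """Hypothetical concrete parser, used only for illustration."""

    def __init__(self):
        self.fileformats = {"xml": "1.0"}

    def getSupportedFileFormats(self):
        return self.fileformats

    def parse(self, filename):
        raise NotImplementedError("parsing is out of scope for this sketch")


def can_instantiate(cls):
    """True if cls() succeeds, False if abstract methods are still missing."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

<p>With this variant the error shows up when the object is created, rather than later when one of the missing methods is called.</p>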
<p>The crux is in this getSupportedFileFormats function. I would like to call it on every parser file. Each file contains a parser class, so basically I want to create an object from a file without knowing in advance the name of the class it defines.</p>
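<p>One possible way to get from a file to an object without knowing the class name is to import the file as a module and inspect the classes it defines. The sketch below (Python 3) is an illustration under assumptions, not the approach of my ParserFactory: it recognises a parser by the presence of a getSupportedFileFormats method instead of checking the base class, and load_module, find_parser_classes, demo and the temporary psimi.py file are all hypothetical names made up for the example.</p>

```python
import importlib.util
import inspect
import os
import tempfile


def load_module(path):
    """Import the Python file at `path` as a module named after the file."""
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def find_parser_classes(module):
    """Classes defined in `module` that offer getSupportedFileFormats()."""
    return [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        if cls.__module__ == module.__name__
        and callable(getattr(cls, "getSupportedFileFormats", None))
    ]


# Hypothetical parser file, written to a temporary folder for the demo.
PSIMI_SOURCE = '''
class PSIMIParser:
    def getSupportedFileFormats(self):
        return {"xml": "1.0"}
'''


def demo():
    """Write psimi.py, import it and ask each parser class for its formats."""
    with tempfile.TemporaryDirectory() as parserdir:
        path = os.path.join(parserdir, "psimi.py")
        with open(path, "w") as handle:
            handle.write(PSIMI_SOURCE)
        module = load_module(path)
        return {
            cls.__name__: cls().getSupportedFileFormats()
            for cls in find_parser_classes(module)
        }
```

<p>Calling demo() returns a dictionary that maps each discovered class name to the formats it reports. Note that importing a file executes it, so this only makes sense for parser files you trust.</p>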
<p>Currently my ParserFactory looks as follows (note that I'm still working on it, so not all is finished yet!):</p>
<pre class="python"><span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
This ParserFactory contains knowledge about how to parse files. It can be fed a file and return the parsed data.
It uses the parsers in this parsers folder, but abstracts away various operations.
@author: Patrick van Kouteren
&nbsp;
@version: 0.1
&quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
<span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">os</span>, <span style="color: #dc143c;">types</span>, <span style="color: #dc143c;">sys</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> ParserFactory:
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Gather info about the contents of the database which are important for parsing
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #008000;">self</span>.<span style="color: black;">importedDatabases</span> = <span style="color: #008000;">self</span>.<span style="color: black;">checkImportedDatabases</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Check which databases (and which versions of them) are present in the database
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> checkImportedDatabases<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        databases = <span style="color: black;">&#123;</span><span style="color: black;">&#125;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> databases 
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Return a list of databases and their version which are present (imported) in IBIDAS
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> getImportedDatabases<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">return</span> <span style="color: #008000;">self</span>.<span style="color: black;">importedDatabases</span>
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Check the parser directory for files which import the GenericParser class. If a file does so, it is guaranteed
    that we can call certain methods to request properties
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> getAvailableParsers<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
       parserdir = <span style="color: #dc143c;">sys</span>.<span style="color: black;">path</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span> + <span style="color: #483d8b;">&quot;/parsers&quot;</span>
       classnames = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
       parserfiles = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
       <span style="color: #ff7700;font-weight:bold;">for</span> subdir, dirs, files <span style="color: #ff7700;font-weight:bold;">in</span> <span style="color: #dc143c;">os</span>.<span style="color: black;">walk</span><span style="color: black;">&#40;</span>parserdir<span style="color: black;">&#41;</span>:
           <span style="color: #ff7700;font-weight:bold;">for</span> <span style="color: #008000;">file</span> <span style="color: #ff7700;font-weight:bold;">in</span> files:
               <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #008000;">file</span>.<span style="color: black;">endswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;.py&quot;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">file</span>.<span style="color: black;">startswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;__&quot;</span><span style="color: black;">&#41;</span>:
                   parserfiles.<span style="color: black;">append</span><span style="color: black;">&#40;</span>parserdir + <span style="color: #483d8b;">&quot;/&quot;</span> + <span style="color: #008000;">file</span><span style="color: black;">&#41;</span>
       <span style="color: #ff7700;font-weight:bold;">for</span> parserfile <span style="color: #ff7700;font-weight:bold;">in</span> parserfiles:
           fileobject = <span style="color: #008000;">open</span><span style="color: black;">&#40;</span>parserfile<span style="color: black;">&#41;</span>
           content = fileobject.<span style="color: black;">read</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
           importfound = <span style="color: #ff4500;">0</span>
           <span style="color: #ff7700;font-weight:bold;">for</span> l <span style="color: #ff7700;font-weight:bold;">in</span> content.<span style="color: black;">splitlines</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
               <span style="color: #ff7700;font-weight:bold;">if</span> l.<span style="color: black;">startswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;import&quot;</span><span style="color: black;">&#41;</span>:
                   <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot; If this line contains genericparser, we know that this file is interesting &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
                   <span style="color: #ff7700;font-weight:bold;">if</span> l.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;genericparser&quot;</span><span style="color: black;">&#41;</span> &gt; <span style="color: #ff4500;">0</span>:
                       importfound += <span style="color: #ff4500;">1</span>
               <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
               If we:
                * find a class definition
                * have found an import of generic parser
                * find a genericparser argument
               Then we know that this parser is a subclass of genericparser
               &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
               <span style="color: #ff7700;font-weight:bold;">if</span> l.<span style="color: black;">startswith</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;class&quot;</span><span style="color: black;">&#41;</span> <span style="color: #ff7700;font-weight:bold;">and</span> importfound &gt; <span style="color: #ff4500;">0</span> <span style="color: #ff7700;font-weight:bold;">and</span> l.<span style="color: black;">find</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;genericparser&quot;</span><span style="color: black;">&#41;</span> &gt; <span style="color: #ff4500;">0</span>:
                   <span style="color: #ff7700;font-weight:bold;">import</span> <span style="color: #dc143c;">re</span>
                   m = <span style="color: #dc143c;">re</span>.<span style="color: black;">split</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\W</span>+&quot;</span>, l<span style="color: black;">&#41;</span>
                   classname = m<span style="color: black;">&#91;</span>m.<span style="color: black;">index</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">&quot;class&quot;</span><span style="color: black;">&#41;</span> + <span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>
                   <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;file &quot;</span> + parserfile + <span style="color: #483d8b;">&quot; contains a callable parser class called &quot;</span> + classname
                   thisfilename = parserfile<span style="color: black;">&#91;</span>parserfile.<span style="color: black;">rfind</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">&quot;/&quot;</span><span style="color: black;">&#41;</span><span style="color: #ff4500;">+1</span>:<span style="color: black;">&#93;</span>
                   ppath = <span style="color: #483d8b;">&quot;parsers.&quot;</span> + thisfilename<span style="color: black;">&#91;</span>:thisfilename.<span style="color: black;">rfind</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">&quot;.&quot;</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span>+<span style="color: #483d8b;">&quot;.&quot;</span> + classname
                   <span style="color: #808080; font-style: italic;">#print &quot;the full import path will become &quot; + ppath</span>
                   classnames.<span style="color: black;">append</span><span style="color: black;">&#40;</span>ppath<span style="color: black;">&#41;</span>
       <span style="color: #ff7700;font-weight:bold;">return</span> classnames
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Return the supported file formats. The supported file formats are determined by checking all parsers which import
    the GenericParser. We can request the file formats they support and return this list.
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> getSupportedFileFormats<span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
        fileformats = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        parserlist = <span style="color: #008000;">self</span>.<span style="color: black;">getAvailableParsers</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">for</span> <span style="color: #dc143c;">parser</span> <span style="color: #ff7700;font-weight:bold;">in</span> parserlist:
            p = <span style="color: #008000;">self</span>._get_func<span style="color: black;">&#40;</span><span style="color: #dc143c;">parser</span><span style="color: black;">&#41;</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">for</span> ff <span style="color: #ff7700;font-weight:bold;">in</span> p.<span style="color: black;">getSupportedFileFormats</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:
                fileformats.<span style="color: black;">append</span><span style="color: black;">&#40;</span>ff<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> fileformats
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Try to import a module. Then we can use this to get its class
    Source: http://code.activestate.com/recipes/223972/
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> _get_mod<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, modulePath<span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">try</span>:
            aMod = <span style="color: #dc143c;">sys</span>.<span style="color: black;">modules</span><span style="color: black;">&#91;</span>modulePath<span style="color: black;">&#93;</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">isinstance</span><span style="color: black;">&#40;</span>aMod, <span style="color: #dc143c;">types</span>.<span style="color: black;">ModuleType</span><span style="color: black;">&#41;</span>:
                <span style="color: #ff7700;font-weight:bold;">raise</span> <span style="color: #008000;">KeyError</span>
        <span style="color: #ff7700;font-weight:bold;">except</span> <span style="color: #008000;">KeyError</span>:
            <span style="color: #808080; font-style: italic;"># The last [''] is very important!</span>
            aMod = <span style="color: #008000;">__import__</span><span style="color: black;">&#40;</span>modulePath, <span style="color: #008000;">globals</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>, <span style="color: #008000;">locals</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>, <span style="color: black;">&#91;</span><span style="color: #483d8b;">''</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
            <span style="color: #dc143c;">sys</span>.<span style="color: black;">modules</span><span style="color: black;">&#91;</span>modulePath<span style="color: black;">&#93;</span> = aMod
        <span style="color: #ff7700;font-weight:bold;">return</span> aMod
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Return the class from 'parsers.file.class'
    Source: http://code.activestate.com/recipes/223972/
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> _get_func<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>,fullFuncName<span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;Retrieve a function object from a full dotted-package name.&quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
        <span style="color: #808080; font-style: italic;"># Parse out the path, module, and function</span>
        lastDot = fullFuncName.<span style="color: black;">rfind</span><span style="color: black;">&#40;</span>u<span style="color: #483d8b;">&quot;.&quot;</span><span style="color: black;">&#41;</span>
        funcName = fullFuncName<span style="color: black;">&#91;</span>lastDot + <span style="color: #ff4500;">1</span>:<span style="color: black;">&#93;</span>
        modPath = fullFuncName<span style="color: black;">&#91;</span>:lastDot<span style="color: black;">&#93;</span>
&nbsp;
        aMod = <span style="color: #008000;">self</span>._get_mod<span style="color: black;">&#40;</span>modPath<span style="color: black;">&#41;</span>
        aFunc = <span style="color: #008000;">getattr</span><span style="color: black;">&#40;</span>aMod, funcName<span style="color: black;">&#41;</span>
&nbsp;
        <span style="color: #808080; font-style: italic;"># Assert that the function is a *callable* attribute.</span>
        <span style="color: #ff7700;font-weight:bold;">assert</span> <span style="color: #008000;">callable</span><span style="color: black;">&#40;</span>aFunc<span style="color: black;">&#41;</span>, u<span style="color: #483d8b;">&quot;%s is not callable.&quot;</span> % fullFuncName
&nbsp;
        <span style="color: #808080; font-style: italic;"># Return a reference to the function itself,</span>
        <span style="color: #808080; font-style: italic;"># not the results of the function.</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> aFunc
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Returns the class where a method is defined
&nbsp;
    def find_defining_class(self, obj, meth_name):
        for ty in type(obj).mro():
            if meth_name in ty.__dict__:
                return ty
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Check if all databases needed to import a file are present
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> checkPrerequisites<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, file_prerequisites<span style="color: black;">&#41;</span>:
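        errors = <span style="color: black;">&#91;</span><span style="color: black;">&#93;</span>
        <span style="color: #808080; font-style: italic;"># TODO: append a message for every missing prerequisite</span>
        <span style="color: #ff7700;font-weight:bold;">return</span> errors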
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Parse a list of files. This means that we not only have to check the prerequisites, but also an order in which to
    parse the files as a file can be a prerequisite for another file
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> parseList<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, filelist, filelist_prerequisites<span style="color: black;">&#41;</span>:
        order = <span style="color: #008000;">self</span>.<span style="color: black;">findParseOrder</span><span style="color: black;">&#40;</span>filelist<span style="color: black;">&#41;</span>
&nbsp;
    <span style="color: #ff7700;font-weight:bold;">def</span> parse<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">file</span>, file_prerequisites, <span style="color: #dc143c;">parser</span>=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:
        <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot; First check if all prerequisites are present &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
        errors = <span style="color: #008000;">self</span>.<span style="color: black;">checkPrerequisites</span><span style="color: black;">&#40;</span>file_prerequisites<span style="color: black;">&#41;</span>
        <span style="color: #ff7700;font-weight:bold;">if</span> errors:
            data = <span style="color: #483d8b;">&quot;<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span>.<span style="color: black;">join</span><span style="color: black;">&#40;</span>errors<span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">raise</span> prerequisitesError, data
        <span style="color: #ff7700;font-weight:bold;">else</span>:
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #dc143c;">parser</span>:
                <span style="color: #dc143c;">parser</span> = <span style="color: #008000;">self</span>.<span style="color: black;">findParser</span><span style="color: black;">&#40;</span><span style="color: #008000;">file</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                <span style="color: #ff7700;font-weight:bold;">pass</span>
            <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #dc143c;">parser</span>:
                <span style="color: #008000;">self</span>.<span style="color: black;">doParsing</span><span style="color: black;">&#40;</span><span style="color: #008000;">file</span>, <span style="color: #dc143c;">parser</span><span style="color: black;">&#41;</span>
            <span style="color: #ff7700;font-weight:bold;">else</span>:
                <span style="color: #ff7700;font-weight:bold;">raise</span> parserError
&nbsp;
    <span style="color: #483d8b;">&quot;&quot;</span><span style="color: #483d8b;">&quot;
    Based on several things a parser is tried to be found.
    1. The file extension: certain file extensions belong to specific formats
    2. The first line:
    &quot;</span><span style="color: #483d8b;">&quot;&quot;</span>
    <span style="color: #ff7700;font-weight:bold;">def</span> findParser<span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #008000;">file</span><span style="color: black;">&#41;</span>:
        <span style="color: #ff7700;font-weight:bold;">pass</span></pre>
<p>Any thoughts, comments and discussions are appreciated. For more information: Chris Leary has posted an improvement <a href="http://blog.cdleary.com/2009/06/registry-pattern-trumps-import-magic/" target="_blank">here</a>.</p>
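<p>The URL of that post mentions a registry pattern. For readers who have not seen the idea, here is a minimal sketch of what a registry could look like in general: each parser registers itself for the extensions it handles, so no folder scanning is needed. This is my own illustration in Python 3 and is not taken from Chris Leary's post; ParserRegistry, its methods and the extensions used are assumptions made for the example.</p>

```python
class ParserRegistry:
    """Maps file extensions to the parser classes that registered for them."""

    def __init__(self):
        self._parsers = {}

    def register(self, *extensions):
        """Class decorator: record the decorated class for each extension."""
        def decorator(cls):
            for extension in extensions:
                self._parsers[extension.lower()] = cls
            return cls
        return decorator

    def supported_formats(self):
        """Sorted list of all registered extensions."""
        return sorted(self._parsers)

    def parser_for(self, filename):
        """Instantiate the parser registered for the filename's extension."""
        extension = filename.rsplit(".", 1)[-1].lower()
        try:
            return self._parsers[extension]()
        except KeyError:
            raise LookupError("no parser registered for %r" % filename)


registry = ParserRegistry()


@registry.register("xml", "mif")
class PSIMIParser:
    """Hypothetical parser used only to show the registration step."""

    def parse(self, filename):
        return "parsed %s" % filename
```

<p>A call such as registry.parser_for("data.xml") then hands back a PSIMIParser instance. The trade-off is that a parser module must be imported somewhere before its registration takes effect.</p>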
]]></content:encoded>
			<wfw:commentRss>http://www.vankouteren.eu/blog/2009/05/python-parserfactory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

