<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Pie Palace &#187; PHP</title>
	<atom:link href="http://www.piepalace.ca/blog/tag/php/feed" rel="self" type="application/rss+xml" />
	<link>http://www.piepalace.ca/blog</link>
	<description></description>
	<lastBuildDate>Mon, 26 Jul 2010 03:39:49 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>My blog likes your library</title>
		<link>http://www.piepalace.ca/blog/2009/12/my-blog-likes-your-library.html</link>
		<comments>http://www.piepalace.ca/blog/2009/12/my-blog-likes-your-library.html#comments</comments>
		<pubDate>Tue, 29 Dec 2009 06:47:12 +0000</pubDate>
		<dc:creator>Erigami Scholey-Fuller</dc:creator>
				<category><![CDATA[Ottawa]]></category>
		<category><![CDATA[Self Absorbtion]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[BiblioCommons]]></category>
		<category><![CDATA[BiblioPress]]></category>
		<category><![CDATA[Ottawa Public Library]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://www.piepalace.ca/blog/?p=1234</guid>
		<description><![CDATA[	
	BiblioPress publishes reviews from a Bibliocommons-based library catalogue to a WordPress-based blog. In other words: all the time I wasted reviewing stuff on Ottawa&#8217;s library website is now made useful because my blog will automatically republish my reviews. 
	The plugin is something verging on beta software. It works, but it&#8217;s only had limited testing.

]]></description>
			<content:encoded><![CDATA[	<p><center><img src="http://www.piepalace.ca/blog/wp-content/uploads/2009/12/are_go.png" alt="BiblioPress are go!" title="BiblioPress are go!" width="465" height="348" class="aligncenter size-full wp-image-1235" align="middle"/></center></p>
	<p><a href="http://wordpress.org/extend/plugins/bibliopress/">BiblioPress</a> publishes reviews from a <a href="http://www.bibliocommons.com/">Bibliocommons</a>-based library catalogue to a WordPress-based blog. In other words: all the time I <strike>wasted</strike> reviewing stuff on <a href="http://www.biblioottawalibrary.ca/">Ottawa&#8217;s library</a> website is now made <i>useful</i> because my blog will automatically republish my reviews. </p>
	<p>The plugin is something verging on beta software. It works, but it&#8217;s only had limited testing.
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.piepalace.ca/blog/2009/12/my-blog-likes-your-library.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Blogawa facelift coming</title>
		<link>http://www.piepalace.ca/blog/2009/02/blogawa-facelift-coming.html</link>
		<comments>http://www.piepalace.ca/blog/2009/02/blogawa-facelift-coming.html#comments</comments>
		<pubDate>Wed, 04 Feb 2009 02:42:57 +0000</pubDate>
		<dc:creator>Erigami Scholey-Fuller</dc:creator>
				<category><![CDATA[Blogawa.ca]]></category>
		<category><![CDATA[Ottawa]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[blogawa]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://www.piepalace.ca/blog/?p=981</guid>
		<description><![CDATA[	As alluded to last week, Blogawa is undergoing a reskinning. The new site will be pretty much the same in terms of functionality, except that it should look a bit nicer. The only major change will be the use of Gravatars to provide avatars for authors. 
	The URL for the RSS feed will change. I [...]]]></description>
			<content:encoded><![CDATA[	<p>As alluded to last week, Blogawa is undergoing a reskinning. The new site will be pretty much the same in terms of functionality, except that it should look a bit nicer. The only major change will be the use of <a href="http://en.gravatar.com/">Gravatar</a>s to provide avatars for authors. </p>
	<p><b>The URL for the RSS feed will change.</b> I should be able to set up a forward to send your RSS reader to the appropriate place, but I may not. So if you find that your feed reader breaks, come back to <a href="http://blogawa.ca">blogawa</a> and resubscribe.</p>
	<p>Aaaany day now&#8230;
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.piepalace.ca/blog/2009/02/blogawa-facelift-coming.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Things that make me sad (aggregator edition)</title>
		<link>http://www.piepalace.ca/blog/2008/07/things-that-make-me-sad-aggregator-edition.html</link>
		<comments>http://www.piepalace.ca/blog/2008/07/things-that-make-me-sad-aggregator-edition.html#comments</comments>
		<pubDate>Fri, 18 Jul 2008 17:14:33 +0000</pubDate>
		<dc:creator>Erigami Scholey-Fuller</dc:creator>
				<category><![CDATA[Bad]]></category>
		<category><![CDATA[Blogawa.ca]]></category>
		<category><![CDATA[Ottawa]]></category>
		<category><![CDATA[Self Absorbtion]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[aggregation]]></category>
		<category><![CDATA[microformat]]></category>
		<category><![CDATA[opensource]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[software]]></category>

		<guid isPermaLink="false">http://www.piepalace.ca/blog/2008/07/things-that-make-me-sad-aggregator-edition.html</guid>
		<description><![CDATA[	I&#8217;d like to switch blogawa.ca to use more standard aggregation software (a) so that I don&#8217;t have to maintain the codebase, and (b) so that I can add microformat parsing to the aggregator so that other planet sites will be able to detect microformatted postings. 
	There seem to be only two popular planet implementations: Planet [...]]]></description>
			<content:encoded><![CDATA[	<p>I&#8217;d like to switch <a href="http://blogawa.ca">blogawa.ca</a> to use more standard aggregation software (a) so that I don&#8217;t have to maintain the codebase, and (b) so that I can add microformat parsing to the aggregator so that other planet sites will be able to detect microformatted postings. </p>
	<p>There seem to be only two popular planet implementations: <a href="http://www.planetplanet.org/">Planet Planet</a>, which is written in Python, runs to 9,503 <abbr title="lines of code">loc</abbr>, and generates its output with a templating engine; and <a href="http://svn.bitflux.ch/repos/public/planet-php/trunk/">planet-php</a>, which is written in PHP, runs to 608 loc (plus 1,202 lines of XSL, ugh), and generates its output with XSL. </p>
	<p>Given <a href="http://massassi.com/php/articles/template_engines/">my aversion to templating engines</a> and my dislike of XSL, I seem to be stuck. I either bite a bullet, or I keep up the opensource tradition of forking, splitting, and generally reinventing the wheel. =(
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.piepalace.ca/blog/2008/07/things-that-make-me-sad-aggregator-edition.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>HTML Tag of the Day</title>
		<link>http://www.piepalace.ca/blog/2008/06/html-tag-of-the-day.html</link>
		<comments>http://www.piepalace.ca/blog/2008/06/html-tag-of-the-day.html#comments</comments>
		<pubDate>Wed, 04 Jun 2008 17:57:26 +0000</pubDate>
		<dc:creator>Erigami Scholey-Fuller</dc:creator>
				<category><![CDATA[Good]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://www.piepalace.ca/blog/2008/06/html-tag-of-the-day.html</guid>
		<description><![CDATA[Did you know that there&#8217;s a &#60;q&#62; HTML tag? Neither did I. It doesn&#8217;t act as a link (natch), but it&#8217;s nice to know that it&#8217;s there. 

Carry on with your activities. I just thought my loyal readers would like to know.]]></description>
			<content:encoded><![CDATA[Did you know that there&#8217;s a <a href="http://www.w3.org/TR/html4/struct/text.html#edef-Q">&lt;q&gt; HTML tag</a>? Neither did I. It doesn&#8217;t act as a link (natch), but it&#8217;s nice to know that it&#8217;s there. 

Carry on with your activities. I just thought my loyal readers would like to know.]]></content:encoded>
			<wfw:commentRss>http://www.piepalace.ca/blog/2008/06/html-tag-of-the-day.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Steal this idea: Portable scraping</title>
		<link>http://www.piepalace.ca/blog/2008/05/steal-this-idea-portable-scraping.html</link>
		<comments>http://www.piepalace.ca/blog/2008/05/steal-this-idea-portable-scraping.html#comments</comments>
		<pubDate>Mon, 19 May 2008 22:47:16 +0000</pubDate>
		<dc:creator>Erigami Scholey-Fuller</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[Steal This Idea]]></category>
		<category><![CDATA[trusted computing]]></category>
		<category><![CDATA[virtual machine]]></category>

		<guid isPermaLink="false">http://www.piepalace.ca/blog/2008/05/steal-this-idea-portable-scraping.html</guid>
		<description><![CDATA[	The web is full of data. Many, many websites are thin scripts pulling data out of a database, formatting it in HTML, and presenting it in a way that is (hopefully) human readable. In an ideal world these websites would provide an API that would give programmers a way of sucking that information into their [...]]]></description>
			<content:encoded><![CDATA[	<p>The web is full of data. Many, many websites are thin scripts pulling data out of a database, formatting it in HTML, and presenting it in a way that is (hopefully) human readable. In an ideal world these websites would provide an API that would give programmers a way of sucking that information into their own creations. Sadly, banks, libraries, online stores, and other data providers aren&#8217;t that altruistic. They are only going to provide APIs when their customers demand it. </p>
	<p>Now imagine that we could impose an API onto websites. We could say &#8220;every bank will provide this API to give access to your financial history.&#8221; Obviously banks aren&#8217;t going to posse up and agree on a common interface, so we programmers will have to do it ourselves. Enter the idea of a &#8220;portable scraper.&#8221;</p>
	<p>A portable scraper is a script that will consume one or more web pages, parse them, normalize the data, and then present the transformed data in a well-defined schema. That sounds like a scraper, right? Where does the whole portable thing come in?</p>
	<p>Portability would allow programmers to write a scraper once and share it with other coders. The scraper would be</p>
	<ol>
	<li>Well defined, meaning that the inputs (i.e., credentials and parameters) are easy to read, and the output is in a format easily consumable by other programs</li>
	<li>Language neutral, meaning that it would run in any host interpreter (e.g., Python, Perl, PHP, Java, Ruby, or JavaScript)</li>
	<li>Auditable, meaning that programmers can read a scraper quickly and get a good idea of what it&#8217;s doing</li>
	<li>Secure, meaning that programmers have a reasonable assurance that a scraper isn&#8217;t leaking sensitive data to unauthorized third parties</li>
	</ol>
	<p>I&#8217;m essentially suggesting a domain specific language that runs in a virtual machine. The language defines how to get information out of some website, while the virtual machine limits who the scraper can talk to. </p>
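	<p>To make the &#8220;limits who the scraper can talk to&#8221; part concrete, here is a minimal Python sketch of the check a host VM might run before every request. Everything in it is an assumption for illustration: the helper name <code>is_allowed</code>, the choice of shell-style wildcards for the access patterns, and the decision to ignore query strings when matching. It is not taken from any existing implementation.</p>

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def is_allowed(url, patterns):
    """Return True if url matches one of the declared access patterns.

    Hypothetical helper: patterns are shell-style globs such as
    'https://accounts.natbank.pp/*'.
    """
    parts = urlsplit(url)
    # Compare only scheme, host and path, so a query string or fragment
    # cannot change the outcome of the check.
    normalized = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return any(fnmatchcase(normalized, pattern) for pattern in patterns)


# Example declarations, using the made-up bank URLs from this post.
DECLARED = [
    "https://natbank.pp/login.php",
    "https://accounts.natbank.pp/*",
]
```

	<p>With these declarations, a request to <code>https://accounts.natbank.pp/history?page=2</code> would pass, while anything on another host, or the login page over plain <code>http</code>, would be refused.</p>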
	<p>Let&#8217;s look at an example. In some alternate universe, I have an account with the PiePalace National Bank (PPNB). Every week, I want to receive an SMS message telling me if I&#8217;m spending beyond my budget. Since the PPNB doesn&#8217;t provide an API for their customers (shame!), I have to use a portable scraper to pull my account history from the PPNB website. Happily, some other programmer has already faced this problem and has published a portable scraper to do the job. It looks like:</p>
	<pre class="code">
<span class='comment'>// Tell our caller which website we need to be able to access.</span>
<span class='comment'>// The scraper won't be able to access pages outside of this hierarchy</span>
require access 'https://natbank.pp/login.php'; <span class='comment'>// Login page</span>
require access 'https://accounts.natbank.pp/*'; <span class='comment'>// Portion of the website providing acct history</span>
	
<span class='comment'>// Tell the caller that we need certain parameters</span>
require input 'bankCardNumber'; <span class='comment'>// The user's debit card number</span>
require input 'password'; <span class='comment'>// The user's password</span>
require input 'accountNumber'; <span class='comment'>// The number of the account, as shown on PPNB web pages</span>
	
<span class='comment'>// Provide an interface for the return value. Quasi-BNF.</span>
export output {
  HISTORY = row*; <span class='comment'>// We must provide a history element that has zero or more rows</span>
  row = String[4]; <span class='comment'>// Indicate that each row has exactly four string elements</span>
};
	
<span class='comment'>// Start at the login page to get our cookies</span>
$browser = new Browser();
$browser get 'https://natbank.pp/login.php';
$form = $browser chooseForm 'LoginForm';
$form set 'username' $INPUT{'bankCardNumber'};
$form set 'password' $INPUT{'password'};
$browser submit $form;
	
<span class='comment'>// Follow a link to our account history</span>
$browser follow ($browser chooseLink $INPUT{'accountNumber'});
	
<span class='comment'>// Consume the account history</span>
$toReturn = new Array();
	
while (true) {
  $table = $browser chooseTable '#acctHistory';
	
  $table runOnEachRow [ $row |
    <span class='comment'>// Parse each row in the table containing our account history</span>
    $date = $row get 0;
    $amount = $row get 1;
    $who = $row get 2;
    $balanceAfter = $row get 3;
	
    $toReturn push new Array($date, $amount, $who, $balanceAfter);
  ];
	
  <span class='comment'>// Look for a link to the next page (if it exists)</span>
  $nextPageLink = $browser chooseLink 'Next Page';
  if ($nextPageLink == nil) {
    break;
  }
	
  <span class='comment'>// Read the entire history</span>
  $browser follow $nextPageLink;
}
	
return $toReturn;
</pre>
	<p>(The above is kinda Smalltalk: each line starts with an object reference, followed by the method to call. Each line is terminated by a &#8216;;&#8217;. <code>if</code>, <code>while</code>, <code>break</code> behave as you&#8217;d expect. Closures are defined between <code>[]</code>, with their parameters named before the <code>|</code>.)</p>
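	<p>For anyone who doesn&#8217;t read Smalltalk, the paging loop and its closure could be read roughly like the Python below. This is only my translation of the control flow: <code>collect_history</code> is a made-up name, and a plain list of pages stands in for the browser following &#8216;Next Page&#8217; links.</p>

```python
def collect_history(pages):
    """Gather rows from a sequence of pages, each page a list of 4-cell rows."""
    to_return = []

    def on_row(row):
        # Mirrors the closure body: pick out the four cells by position.
        date, amount, who, balance_after = row[0], row[1], row[2], row[3]
        to_return.append([date, amount, who, balance_after])

    # Iterating over the list stands in for following 'Next Page'
    # until no such link exists.
    for table in pages:
        for row in table:
            on_row(row)
    return to_return
```
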
	<p>Notice that the script is written in a domain-specific language to handle the HTMLisms of the data being parsed. It contains a preamble that states which websites the script will need to visit (which are enforced by the VM), the input parameters, and the output format. A programmer using the scraper just has to properly call the script and use its return values. </p>
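	<p>As a sketch of what &#8220;properly call the script&#8221; might involve, the Python below checks a caller&#8217;s inputs against the preamble and checks a returned history against the exported shape (<code>HISTORY = row*</code>, <code>row = String[4]</code>). The function names <code>check_inputs</code> and <code>check_history</code> are hypothetical, and I&#8217;m assuming the host hands results back as lists of strings.</p>

```python
# Input names copied from the 'require input' lines of the example scraper.
REQUIRED_INPUTS = ("bankCardNumber", "password", "accountNumber")


def check_inputs(inputs):
    """Raise ValueError if any input declared by the scraper is missing."""
    missing = [name for name in REQUIRED_INPUTS if name not in inputs]
    if missing:
        raise ValueError("missing inputs: " + ", ".join(missing))


def check_history(history):
    """Check the exported shape: zero or more rows of exactly four strings."""
    if not isinstance(history, list):
        return False
    return all(
        isinstance(row, list)
        and len(row) == 4
        and all(isinstance(cell, str) for cell in row)
        for row in history
    )
```

	<p>A host could run <code>check_inputs</code> before starting the scraper and <code>check_history</code> on whatever comes back, so the calling program never sees data that breaks the declared contract.</p>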
	<h1>Use Cases</h1>
	<p>Here are a few examples of scenarios that portable scrapers could excel at:</p>
	<ol>
	<li>Securely finding the current balance on a bank account
  </li>
	<li>Querying the price of an item in an online store
  </li>
	<li>Querying a website for a list of upcoming events
  </li>
	<li>Querying <a href="http://octranspo.com/">OC Transpo</a> for a bus schedule
  </li>
	<li>Querying the <a href="http://www.octranspo.com/tps/jnot/startEN.oci">OC Transpo Travel Planner</a> for a travel plan between two points
</li>
</ol>
	<h1>Portable Scrapers vs. Plagger</h1>
	<p>When I floated this idea to <a href="http://dmo.ca">dave0</a>, he rightly asked &#8220;how is this different from <a href="http://plagger.org/">Plagger</a>?&#8221; As far as I can tell, Plagger is intended to be a processor for RSS feeds: it sequentially runs a series of plugins on a blackboard that contains at least one RSS feed. Portable scrapers would differ in that:</p>
	<ol>
	<li>they would handle arbitrary data types. Whereas Plagger (only?) dumps an RSS feed, a portable scraper would hand arbitrary data back to the caller.
  </li>
	<li>Plagger is the application. As far as I can tell, it doesn&#8217;t accept parameters (outside those written into its config files), and it isn&#8217;t designed to pass data back to another process. In other words, Plagger is intended to do the full computation, whereas a portable scraper is only intended to get data for further processing.
  </li>
	<li>Plagger modules are fully trusted. There is no programmatic mechanism to stop a Plagger plugin from leaking data (either through files or across the network).
</li>
</ol>
	<p>Comments?
</p>
]]></content:encoded>
			<wfw:commentRss>http://www.piepalace.ca/blog/2008/05/steal-this-idea-portable-scraping.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Miniposts 0.6.6 (Angry Armadillo)</title>
		<link>http://www.piepalace.ca/blog/2008/03/miniposts-066-angry-armadillo.html</link>
		<comments>http://www.piepalace.ca/blog/2008/03/miniposts-066-angry-armadillo.html#comments</comments>
		<pubDate>Sun, 09 Mar 2008 02:34:25 +0000</pubDate>
		<dc:creator>Erigami Scholey-Fuller</dc:creator>
				<category><![CDATA[MiniPosts2]]></category>
		<category><![CDATA[Self Absorbtion]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[RSS]]></category>
		<category><![CDATA[wordpress]]></category>

		<guid isPermaLink="false">http://www.piepalace.ca/blog/2008/03/miniposts-066-angry-armadillo.html</guid>
		<description><![CDATA[I&#8217;ve put together a new version of Miniposts2. It&#8217;s now WordPress 2.5 compatible, and supports filtering miniposts from feeds. 

Along the way I found the migration doc useful. And I came to discover that WordPress doesn&#8217;t really maintain any kind of backward compatibility between minor revisions. ]]></description>
			<content:encoded><![CDATA[I&#8217;ve put together a <a href="http://www.piepalace.ca/blog/wp-content/uploads/2008/03/miniposts2-066.zip">new version of Miniposts2</a>. It&#8217;s now WordPress 2.5 compatible, and supports filtering miniposts from feeds. 

Along the way I found <a href="http://codex.wordpress.org/Migrating_Plugins_and_Themes">the migration doc</a> useful. And I came to discover that WordPress doesn&#8217;t really maintain any kind of backward compatibility between minor revisions. ]]></content:encoded>
			<wfw:commentRss>http://www.piepalace.ca/blog/2008/03/miniposts-066-angry-armadillo.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lexers and Parsers in PHP</title>
		<link>http://www.piepalace.ca/blog/2007/12/lexers-and-parsers-in-php.html</link>
		<comments>http://www.piepalace.ca/blog/2007/12/lexers-and-parsers-in-php.html#comments</comments>
		<pubDate>Thu, 27 Dec 2007 22:56:28 +0000</pubDate>
		<dc:creator>Erigami Scholey-Fuller</dc:creator>
				<category><![CDATA[Self Absorbtion]]></category>
		<category><![CDATA[Tech]]></category>
		<category><![CDATA[bison]]></category>
		<category><![CDATA[documentation]]></category>
		<category><![CDATA[Extractor]]></category>
		<category><![CDATA[flex]]></category>
		<category><![CDATA[lex]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[yacc]]></category>

		<guid isPermaLink="false">http://www.piepalace.ca/blog/2007/12/lexers-and-parsers-in-php.html</guid>
		<description><![CDATA[I&#8217;ve spent the last hour looking into PHP_LexerGenerator and PHP_ParserGenerator. There seem to be docs for the lexer at the author&#8217;s PEAR instance (including an example), but I haven&#8217;t been able to turn up an example for the parser yet. It&#8217;s been ages since I looked at even yacc or bison, so I&#8217;m not quite [...]]]></description>
			<content:encoded><![CDATA[I&#8217;ve spent the last hour looking into <a href="http://pear.php.net/package/PHP_LexerGenerator">PHP_LexerGenerator</a> and <a href="http://pear.php.net/package/PHP_ParserGenerator">PHP_ParserGenerator</a>. There seem to be docs for the lexer at the <a href="http://pear.chiaraquartet.net/PHP_LexerGenerator/PHP_LexerGenerator/PHP_LexerGenerator.html">author&#8217;s PEAR instance</a> (including an <a href="http://pear.chiaraquartet.net/PHP_LexerGenerator/__examplesource/exsource_lopment_PHP_LexerGenerator_examples_TestLexer.plex_7a4e36cefd8665f7e414d16928266b41.html">example</a>), but I haven&#8217;t been able to turn up an example for the parser yet. It&#8217;s been ages since I looked at even yacc or bison, so I&#8217;m not quite ready to jump directly into trying to code something up.

I haven&#8217;t been able to find any examples of projects using either package. That may have something to do with the dearth of documentation. ]]></content:encoded>
			<wfw:commentRss>http://www.piepalace.ca/blog/2007/12/lexers-and-parsers-in-php.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
