devel//sap

segomos

May 22, 2014

Pure Perl6 HTML to XHTML (HTML::Parser::XML)

Purpose of the Module

So, we want to do some web scraping in Perl 6, and we'd like to do it as cleanly as possible. I generally use XPath or CSS selectors for this: they're easy to write and they get me the data I'm looking for. To use them, we either need a clean HTML document every time, or we need to clean up a dirty HTML document and force it into an XML-like structure.

This method could easily be rewritten in any language, though this building block already exists for most of them.
