We are looking for someone to develop a Python script that scrapes Craigslist and collects the listing title, address (if available), images, housing category, and listing description from each housing listing posted at http://denver.craigslist.org/hhh/. The script should then convert this information into the XML format shown in the attached document.
Some key aspects to note:
- The address will have to be geocoded into latitude and longitude.
- The Craigslist housing category codes will have to be read and converted to our category codes (see the attached document for the conversions).
- The text of the listing description (misc) may have to be broken up by length into sections, i.e., separated by the “{tabend nameoftab}” tags that can be seen in the attached document.
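To make the three points above more concrete, here is a rough Python sketch of how the post-processing could look. It is only an illustration under assumptions: the category values in CATEGORY_MAP, the 500-character limit, the tab names, and the function names are all placeholders made up for this example, and the real values must come from the attached document. The geocoding service is also left open, so the sketch accepts any function that takes an address string and returns a (latitude, longitude) pair.

```python
import textwrap

# Placeholder mapping: the right-hand values are hypothetical and must be
# replaced with the conversions listed in the attached document.
CATEGORY_MAP = {"apa": "OUR_APARTMENT_CODE", "roo": "OUR_ROOM_CODE"}

# Placeholder section length and tab names (assumptions, not from the spec).
MAX_SECTION_CHARS = 500
TAB_NAMES = ["description", "details"]


def convert_category(craigslist_code):
    """Return our category code, or None if the code is not in the mapping."""
    return CATEGORY_MAP.get(craigslist_code)


def geocode_address(address, geocoder):
    """Return (latitude, longitude), or None when the listing has no address.

    `geocoder` is any callable that maps an address string to a
    (latitude, longitude) tuple; which service backs it is left open.
    """
    if not address:
        return None
    return geocoder(address)


def split_description(text, max_chars=MAX_SECTION_CHARS, tab_names=TAB_NAMES):
    """Break text into sections of at most max_chars characters.

    Each section is followed by a {tabend NAME} tag. Splitting happens at
    word boundaries, and runs of whitespace are collapsed by textwrap.
    """
    sections = []
    for i, chunk in enumerate(textwrap.wrap(text, width=max_chars)):
        # If there are more sections than names, reuse the last name
        # with a counter appended (an arbitrary choice for this sketch).
        name = tab_names[i] if i < len(tab_names) else f"{tab_names[-1]}{i + 1}"
        sections.append(f"{chunk}{{tabend {name}}}")
    return "".join(sections)
```

For example, split_description(listing_text) would return the description as one string with a tabend tag closing each section, ready to be placed in the XML output. Whether the tag belongs after each section or only between sections should be checked against the attached document.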