Simple Scraper Script (PHP or CGI Perl)

This project is to create a simple scraper script that can be run on a LAMP server either manually from a web browser or as a scheduled cron job.
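Because the same script has to serve both a browser and cron, it can help to detect how it was started and format its output to match. The helper names `is_cli()` and `status_line()` below are hypothetical. The sketch assumes cron runs the script with the PHP command-line binary, for which `php_sapi_name()` returns "cli". A crontab entry such as `0 6 * * * php /path/to/scraper.php` (path hypothetical) would run it daily at 06:00. If cron instead fetches the script's web address with a tool like wget, the browser branch applies.

```php
<?php
// Hypothetical helper: true when the script runs under the PHP CLI
// (e.g. started from cron with "php scraper.php"), false under a web server.
function is_cli()
{
    return php_sapi_name() === 'cli';
}

// Hypothetical helper: format one status message for the current context,
// as plain text on the command line or as escaped HTML in a browser.
function status_line($message)
{
    if (is_cli()) {
        return $message . PHP_EOL;
    }
    return '<p>' . htmlspecialchars($message) . '</p>' . "\n";
}
```

In practice the script would `echo status_line(...)` after each URL is checked. That gives a readable log whether it is opened in a browser or its output is captured by cron.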

The database should be a flat file that can be easily edited to add and remove the URLs the script visits. The script can be written in either PHP or CGI Perl; the server supports PHP5.
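One simple flat-file layout is a plain text file with one URL per line. The function name `load_urls()` is hypothetical. The choice to ignore blank lines and lines starting with `#` is also an assumption not stated in the brief; it lets an editor disable a URL without deleting it. The sketch avoids PHP 7-only syntax so it stays PHP5-compatible.

```php
<?php
// Hypothetical loader: read target URLs from a flat file, one per line.
// Assumption: blank lines and lines beginning with "#" are ignored.
function load_urls($path)
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException('Cannot read URL list: ' . $path);
    }
    $urls = array();
    foreach ($lines as $line) {
        // trim() also removes a stray "\r" left by files edited on Windows.
        $line = trim($line);
        // Skip whitespace-only lines and comment lines.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $urls[] = $line;
    }
    return $urls;
}
```

Keeping the format this plain means the list can be maintained in any text editor without needing FTP.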

When executed, the script will visit each URL listed in the flat-file database. For each URL, it will search the page content for the word “temporarily” (without the quotes). If the word is NOT found on a URL’s page, the script will include that URL in an automated e-mail sent at the end of the run. If the keyword is found on every page, the e-mail will simply say “No changes.”
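The check-and-report logic can be sketched as follows. The function names are hypothetical, and several behaviours are assumptions rather than requirements from the brief. Pages are fetched with `file_get_contents()`, which needs `allow_url_fopen` enabled for HTTP URLs. HTML tags (and their attributes) are stripped before searching. The match is case-insensitive, so "Temporarily" also counts. A page that cannot be fetched is reported as well, so that failures are not silently treated as "No changes."

```php
<?php
define('KEYWORD', 'temporarily');

// True if the page text contains the keyword (case-insensitive).
// Tags are stripped first so attribute values are not searched.
function page_has_keyword($html, $keyword)
{
    return stripos(strip_tags($html), $keyword) !== false;
}

// Visit each URL and collect those whose page lacks the keyword.
// Assumption: an unreachable page is also collected.
function check_urls(array $urls, $keyword)
{
    $missing = array();
    foreach ($urls as $url) {
        $html = @file_get_contents($url); // false on failure
        if ($html === false || !page_has_keyword($html, $keyword)) {
            $missing[] = $url;
        }
    }
    return $missing;
}

// Build the e-mail body sent at the end of the run.
function build_report(array $missing)
{
    if (count($missing) === 0) {
        return 'No changes.';
    }
    return "The keyword was NOT found on these URLs:\n" . implode("\n", $missing);
}
```

In the full script, the string returned by `build_report()` would be passed as the message argument of PHP's built-in `mail()` function; the recipient address and subject line are left to the project owner.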

This is an example URL:

http://www.funexpress.com/ui/search/processRequest.do?Ntt=IN-25%2F5234&requestURI=searchMain&Ntk=all&Ntx=mode%2Bmatchallpartial&N=0&x=10&y=22

All URLs used by this script will be from the same site, and all pages have the same structure; however, the URL format may vary.
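Since the URL format varies, the script should not depend on any particular path or query layout. One property that does hold for every entry is the host, so a sanity check on it can catch typos in the flat file. The constant `EXPECTED_HOST` and the function `is_expected_site()` are hypothetical. The host value is taken from the example URL in this brief and assumes every entry includes the `www.` prefix.

```php
<?php
// Hypothetical: host of the single site all URLs are expected to use.
define('EXPECTED_HOST', 'www.funexpress.com');

// True if the URL's host matches EXPECTED_HOST (case-insensitive).
// parse_url() returns null when there is no host and false when the
// URL is badly malformed; both are rejected.
function is_expected_site($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    return strcasecmp($host, EXPECTED_HOST) === 0;
}
```

Entries that fail this check could be listed separately in the report rather than fetched.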

FTP access is not available for this project.
