Screenscraping

Dear Programmers

This is my new project: a daily cronjob that does the following:
– Log in to the website (URL in config_table) and save the cookies
– Download the linked list of email addresses and store them in email_table (to be reviewed by the administrator)
– If there are new email addresses
  – Send a mail to the administrator so he can review them
– Scrape 1 or more pages (URLs in config_table)
– Filter the links and store them in the database (URLs in scraped_table)
– Compare new links to old links
– If there are new links
  – Fetch/scrape the page using the layout defined in print.css (already in the website)
  – Send the page by mail to all reviewed email addresses (email_table)
  – Mark the rows in scraped_table as Sent=1
– Stop
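
The link-handling steps above (filter, store, compare, mark as sent) could be sketched roughly as follows. This is only an illustration under assumptions: it uses Python with SQLite, the column names `url` and `sent` in scraped_table and all function names are made up for the example, and the login, cookie handling and actual mailing are left out.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, must_contain=""):
    """Return the links on a page, optionally filtered by a substring."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [u for u in parser.links if must_contain in u]


def init_db(conn):
    # Assumed layout of scraped_table: one row per link, sent flag 0/1.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped_table ("
        "url TEXT PRIMARY KEY, sent INTEGER NOT NULL DEFAULT 0)"
    )


def store_links(conn, links):
    """Insert links; links already in the table are left untouched."""
    conn.executemany(
        "INSERT OR IGNORE INTO scraped_table (url) VALUES (?)",
        [(u,) for u in links],
    )
    conn.commit()


def unsent_links(conn):
    """Links that are new, i.e. stored but not yet mailed."""
    rows = conn.execute(
        "SELECT url FROM scraped_table WHERE sent = 0 ORDER BY url"
    )
    return [r[0] for r in rows]


def mark_sent(conn, links):
    """Set sent = 1 for the given links after they have been mailed."""
    conn.executemany(
        "UPDATE scraped_table SET sent = 1 WHERE url = ?",
        [(u,) for u in links],
    )
    conn.commit()
```

On each run the cronjob would call `store_links` with everything it scraped, mail whatever `unsent_links` returns, and then pass those same links to `mark_sent`, so a link is never mailed twice.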

Use config_table to set the URLs of the pages to be scraped, the link to the email address list, etcetera.
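
One possible shape for config_table is a simple key/value table. The sketch below is an assumption for illustration only: the columns `name` and `value`, the key names (`login_url`, `scrape_url`) and the function names are hypothetical, and a key may occur more than once so that several pages can be scraped.

```python
import sqlite3


def init_config(conn):
    # Hypothetical layout: free-form key/value pairs; keys may repeat.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config_table ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, value TEXT NOT NULL)"
    )


def set_config(conn, name, value):
    """Add one value under a key."""
    conn.execute(
        "INSERT INTO config_table (name, value) VALUES (?, ?)", (name, value)
    )
    conn.commit()


def get_config(conn, name):
    """Return all values stored under a key, in insertion order."""
    rows = conn.execute(
        "SELECT value FROM config_table WHERE name = ? ORDER BY id", (name,)
    )
    return [r[0] for r in rows]
```

With this layout, adding another page to scrape is just one more `scrape_url` row, with no code change needed.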

That is it. I think that if you have done screen-scraping projects before, this must be a piece of cake? Correct me if I am wrong.

Hope to see your bids soon. Regards, Eef
