Retrieving Data From Websites
We are looking for a script that can run via cron at a user-specified frequency (easily adjustable by us) to visit the websites listed below, retrieve certain data (for example, RSS feed news), and store it in our MySQL database. The programmer must be able to provide support by phone during regular business hours (9am to 5pm EST, GMT-5).
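To illustrate what we mean by an easily adjustable frequency: one possible approach is to have cron start the script at a short fixed interval and let the script decide which sources are due, based on a small editable configuration. The following is only a rough sketch of that idea, written in Python for illustration (the delivered script is expected to be PHP). The source names, the intervals, and the `due_sources` helper are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical configuration: minutes between runs for each source.
# The list order doubles as the processing order.
SCHEDULE = [
    ("google_news_us", 15),
    ("bbc", 30),
    ("reuters", 60),
]

def due_sources(schedule, last_run, now):
    """Return the sources whose interval has elapsed, in configured order."""
    due = []
    for name, minutes in schedule:
        previous = last_run.get(name)  # None if the source has never run
        if previous is None or now - previous >= timedelta(minutes=minutes):
            due.append(name)
    return due

# Example values, chosen only for the demonstration.
now = datetime(2024, 1, 1, 12, 0)
last_run = {
    "google_news_us": datetime(2024, 1, 1, 11, 50),  # 10 minutes ago: not due
    "bbc": datetime(2024, 1, 1, 11, 30),             # 30 minutes ago: due
}
print(due_sources(SCHEDULE, last_run, now))  # ['bbc', 'reuters']
```

With this kind of design, changing a frequency means editing one number in the configuration rather than touching the crontab.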
The script must:
1) Update the database with news dated in the last 7 days (from each source) at a user-specified frequency,
2) Update different categories and sources following a user-specified order (including different frequencies for each category or source),
3) Identify and avoid retrieving news already in the database,
4) Process and categorize the incoming data with the following fields*:
Version, title, originalsite, category, Language, copyright, lastbuilddate, feedtype, city, rssnewstitle, pubDate, Rssnewslink, subtopic, titleandcategory, States, Type, rssnewsdescription, country, geolocation, topic
5) Retrieve data from FeedBurner RSS feeds,
6) Process podcast and video RSS feeds.
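As a rough illustration of requirements 1 and 3 (the 7-day window and skipping news already in the database), the sketch below treats the item link as the unique key and lets the database reject duplicates. It is written in Python with an in-memory SQLite database purely so that it is self-contained. The real script would be PHP against MySQL, where `INSERT IGNORE` with a unique index could play the same role. The table name `news`, the choice of `Rssnewslink` as the key, and the `store_new_items` helper are assumptions, not part of the specification.

```python
import sqlite3
from datetime import datetime, timedelta

def store_new_items(conn, items, now, max_age_days=7):
    """Insert items newer than the cutoff; links already stored are skipped."""
    cutoff = now - timedelta(days=max_age_days)
    before = conn.total_changes
    for item in items:
        if item["pubDate"] < cutoff:
            continue  # older than the 7-day window
        # The primary key on Rssnewslink makes the database ignore duplicates.
        conn.execute(
            "INSERT OR IGNORE INTO news (Rssnewslink, rssnewstitle, pubDate) "
            "VALUES (?, ?, ?)",
            (item["Rssnewslink"], item["rssnewstitle"], item["pubDate"].isoformat()),
        )
    conn.commit()
    return conn.total_changes - before  # number of rows actually inserted

# Stand-in database and example items (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news (Rssnewslink TEXT PRIMARY KEY, rssnewstitle TEXT, pubDate TEXT)"
)
now = datetime(2024, 1, 8, 12, 0)
items = [
    {"Rssnewslink": "http://example.com/a", "rssnewstitle": "Recent story",
     "pubDate": datetime(2024, 1, 7)},
    {"Rssnewslink": "http://example.com/old", "rssnewstitle": "Old story",
     "pubDate": datetime(2023, 12, 25)},
]
print(store_new_items(conn, items, now))  # first run stores the recent story
print(store_new_items(conn, items, now))  # second run finds nothing new
```

Other keys (for example, a hash of title and source) may be more appropriate if the same story appears under several links; we are open to suggestions.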
Websites required:
Google News for multiple countries/languages (with all categories):
http://www.google.com/news
http://news.google.com/news?ned=cn
http://news.google.com/news?ned=us
The script will retrieve the latest news from each category (World, Business, Sci-Tech, etc.). It must also handle multiple languages, including English, Simplified Chinese, and Traditional Chinese. In addition, we want the script to accept a list of keywords and automatically collect the news associated with each keyword using the Google News search function.
*Note: Each news entry is a unique record in the database.
CNN & CNN Money
http://www.cnn.com
http://money.cnn.com/
BBC
http://news.bbc.co.uk
Bloomberg
http://www.bloomberg.com/
VOA
http://www1.voanews.com/english/news/
Reuters
http://www.reuters.com/
CNBC
http://www.cnbc.com/id/28295763
Fox Business
http://www.foxbusiness.com/
ABC Business
http://www.abc.net.au/news
Yahoo! Finance
http://finance.yahoo.com
Skills needed include PHP and MySQL.