I need a tool that scrapes ads from the ad frame URLs I specify and reports, via a web page and an exported Excel spreadsheet, how many times each ad is seen on each ad frame.
The scraper must be configurable to scrape each ad frame a specific number of times in a single run. I expect to run this script via a cron job.
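A minimal sketch of what a single run could look like, assuming Python is available on the host (no language is specified here, so this is only an illustration). The names `fetch`, `run_once`, and the `frames` mapping are hypothetical. A thread pool is one possible way to keep a run of 100+ fetches per frame reasonably fast; a cron job would simply invoke a script built around `run_once`.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one ad frame and return its body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_once(frames, fetcher=fetch, workers=8):
    """Fetch every frame URL the configured number of times.

    `frames` maps a URL to its per-run repeat count (hypothetical shape).
    Returns a dict mapping each URL to the list of bodies collected.
    Any exception raised by `fetcher` propagates and aborts the run.
    """
    # One job per fetch: a URL configured for 100 scrapes appears 100 times.
    jobs = [url for url, times in frames.items() for _ in range(times)]
    results = {url: [] for url in frames}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `jobs`.
        for url, body in zip(jobs, pool.map(fetcher, jobs)):
            results[url].append(body)
    return results
```

Passing a different `fetcher` makes it possible to try the loop without touching the network.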
Itemized requirements:
1.) Allow the user to input specific ad frames (URLs) to scrape
2.) Allow the user to specify the number of times an ad frame (URL) is scraped in a single run
a.) For example, in a single run through all the URLs, each ad block can be configured to be refreshed and scraped 100, 200, 300, or however many times I’d like.
3.) Parse unique ads and count the cumulative number of times they’ve appeared over each of the following periods:
a.) A specific day
b.) A specific set of days
c.) A week
d.) A month
e.) Since the scraper started counting
4.) Report the ad, its domain, and the associated landing page URL for each ad.
Note that getting the landing page for a particular ad may require clicking on the ad. Make this configurable so that we have the option of either clicking on the ad to reach the landing page, or not clicking and only gathering the ad information.
5.) Report the number of times a particular ad shows up in a particular position (examples below)
a.) Out of 500 queries on October 15th on site X, Penny Stock Ad showed up position #1 250 times, position #2 150 times and position #3 100 times
b.) Out of 2000 queries from October 15th to October 17th on site Y, Biz Op Ad showed up at position #1 1000 times and position #2 1000 times
6.) Allow the user to add a description of the ad frame link that they will be querying. (example below)
a.) I want to enter the link: http://ads.tw.adsonar.com/adserving/getAds.jsp?previousPlacementIds=&placementId=1505691&pid=1990767&ps=-1&zw=627&zh=195&url=http%3A//www.dailyfinance.com/&v=5&dct=Business%20News%2C%20Stock%20Quotes%2C%20Investment%20Advice%20-%20DailyFinance&ref=http%3A//www.aol.com/
The above link should be paired with a text description I enter, for example: (AOL Finance Ad at bottom of the page) = Link Above
7.) The ad scraper needs the ability to run at various intervals: every 15 minutes, 30 minutes, hour, 2 hours, 3 hours, etc.
8.) Be efficient and quick at gathering scraped data, as I will need it to scrape each ad frame a minimum of 100 times per run. (I am open to suggestions for making the script fast.)
9.) All time reporting needs to be in EST.
10.) Need the ability to pull data for any time period I like without loss of data integrity.
a.) Show all information between yesterday and today
11.) Be able to show all ads over a time period for a specific ad block.
a.) Show all ads and associated info that showed up on the ESPN Ad block over the past week.
b.) Show all ads and associated info that showed up on the AOL Finance Ad block over the past 2 days
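The counting and reporting requirements above (items 3, 5, 9, 10 and 11) could be sketched roughly as follows. This is a minimal illustration, not a prescribed design: it assumes Python with the standard-library `sqlite3` module, and the table name `sightings`, its columns, and the helper names are all hypothetical. "EST" is taken literally as a fixed UTC-5 offset; if US Eastern time with daylight saving is intended, a zone such as `America/New_York` would be needed instead.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: "EST" means a fixed UTC-5 offset (no daylight saving).
EST = timezone(timedelta(hours=-5), "EST")


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) sightings table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sightings ("
        " frame TEXT, ad TEXT, domain TEXT, landing_url TEXT,"
        " position INTEGER, seen_at TEXT)"
    )
    return db


def record(db, frame, ad, domain, position, landing_url=None, when=None):
    """Store one ad sighting. `when` must be timezone-aware if given."""
    when = (when or datetime.now(EST)).astimezone(EST)
    db.execute(
        "INSERT INTO sightings VALUES (?, ?, ?, ?, ?, ?)",
        (frame, ad, domain, landing_url, position, when.isoformat()),
    )


def position_counts(db, frame, first_day, last_day):
    """Count sightings per (ad, position) on one ad frame.

    Days are 'YYYY-MM-DD' strings in EST and the range is inclusive.
    """
    rows = db.execute(
        "SELECT ad, position, COUNT(*) FROM sightings"
        " WHERE frame = ? AND substr(seen_at, 1, 10) BETWEEN ? AND ?"
        " GROUP BY ad, position ORDER BY ad, position",
        (frame, first_day, last_day),
    )
    return rows.fetchall()
```

Because every raw sighting is kept with its timestamp, a day, a set of days, a week, a month, or the full history are all the same query with different dates. Passing the same day twice gives a single-day report.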
I’ve included a file that contains many of the ad links I will be putting in the tool. You can put them in the tool and use them for testing.
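One way the tool could take in a list of ad frames, each with a description and a per-run scrape count (items 1, 2 and 6), is sketched below. The CSV layout of `description,url,times` is purely an assumption for illustration and is not the format of the included file; `load_frames` and `default_times` are hypothetical names.

```python
import csv
import io


def load_frames(fileobj, default_times=100):
    """Read rows of: description, url, times (times is optional).

    Assumed layout for illustration only. Lines starting with '#'
    and blank lines are skipped.
    """
    frames = []
    for row in csv.reader(fileobj):
        if not row or row[0].lstrip().startswith("#"):
            continue
        description, url = row[0].strip(), row[1].strip()
        # Fall back to the default count when the third column is absent or empty.
        has_times = len(row) > 2 and row[2].strip()
        times = int(row[2]) if has_times else default_times
        frames.append({"description": description, "url": url, "times": times})
    return frames


# Example with an in-memory file and placeholder URLs.
sample = io.StringIO(
    "# description,url,times\n"
    "AOL Finance Ad at bottom of the page,http://example.com/frame?a=1,200\n"
    "ESPN Ad block,http://example.com/espn\n"
)
frames = load_frames(sample)
```

The default of 100 mirrors the stated minimum of 100 scrapes per run.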
The ad scraper must run on a Linux shared hosting server (GoDaddy). The programmer must assist with installing the script; a completed install marks delivery of the project.