Ad Scraper For Social Website

I need a script that will log in to various fake Plenty of Fish accounts that I provide and collect the ads shown.
It should then store each ad's image and all other ad information in a MySQL database for further processing.
The script must optionally be able to use proxies so I can see ads served in different countries.

Requirements:

1. Must be a server-side, web-hosted script.
2. Must have an admin login screen.
On this admin screen I must be able to input the following:
a. Listing of Plenty of Fish accounts (usernames / passwords)
b. How many times I want the script to log in per day per account.
c. The number of times the script should refresh the page, scraping each ad, during a single login per account.
d. The ability to create reports.
e. The ability to export reports.
f. The ability to purge all records.
3. Must be able to scrape the image and the text of an ad and place them into different columns in a MySQL database.
4. Must be able to click on an ad, then count and record each hop the URL takes before reaching the final landing page.
**Hovering the mouse over the web report must show the URL of each hop a particular ad redirects through.**
**See the http://ezcl.org/spyadvanced.flv video (look at the example beginning at timestamp 4:37 in the video).**
5. Must be able to support multiple fake plentyoffish.com accounts.
6. Must be able to configure the number of times a page is refreshed each time the “scraper program” is run per account.
7. Must record the cumulative number of times a particular ad is seen per account.
8. Script must be able to use proxies during the scraping process so I can see ads in different countries. (proxies configured at an account level)
9. Script must record the last time a particular unique ad has been seen.
10. User must be able to define the number of times the screen is refreshed per session to harvest ads.
11. User must be able to display a scraped ad data report in table format for a single account, multiple accounts, or all accounts.
12. User must be able to set up a cron job so this product can log in and scrape ads across all configured accounts on a timed basis, e.g. every hour, every 3 hours, every 6 hours, every 12 hours, etc.
13. User must have the ability to export data.
14. User must have the ability to purge all collected data.
15. Web report must contain the following columns:
POF account: This represents the many Plenty of Fish Accounts that can be configured into this software and used to scrape ads.
Headline of Ad: The 1st major headline of the ad.
Description of Ad: The text of the ad.
Ad Image: The image used for the ad.
Url Hops: The various redirects you are sent through before landing on the final page. Column consists of numbers that link out to each distinct redirect.
Last Seen: The last time this particular ad was seen from this particular account.
Number of Times Seen: Number of times this ad was seen from this particular account.
Duration: Displays the cumulative time frame over which this particular exact ad has been seen.

16. Must be able to sort each column of the web report individually in ascending or descending order (e.g. sort the Number of Times Seen column).
17. Script must run on a Linux server.
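
To illustrate requirements 7, 9, 15 and 16, the bookkeeping could look roughly like the sketch below. It is only one possible design, not a required one. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own, and the table name, column names, the choice to store an image URL rather than the image file, and the idea that an ad is "unique" per account by headline, description and image are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime

# Hypothetical table layout; one row per unique ad per account.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ad_sightings (
    account     TEXT NOT NULL,
    headline    TEXT NOT NULL,
    description TEXT NOT NULL,
    image_url   TEXT NOT NULL,
    url_hops    TEXT NOT NULL,      -- redirect chain, one URL per line
    first_seen  TEXT NOT NULL,
    last_seen   TEXT NOT NULL,
    times_seen  INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (account, headline, description, image_url)
)
"""


def record_sighting(conn, account, headline, description, image_url, hops, seen_at):
    """Insert a new ad row, or bump times_seen and last_seen if it already exists."""
    stamp = seen_at.isoformat(sep=" ")
    chain = "\n".join(hops)
    cur = conn.execute(
        "UPDATE ad_sightings SET times_seen = times_seen + 1, "
        "last_seen = ?, url_hops = ? "
        "WHERE account = ? AND headline = ? AND description = ? AND image_url = ?",
        (stamp, chain, account, headline, description, image_url),
    )
    if cur.rowcount == 0:
        # First time this account has seen this ad.
        conn.execute(
            "INSERT INTO ad_sightings VALUES (?, ?, ?, ?, ?, ?, ?, 1)",
            (account, headline, description, image_url, chain, stamp, stamp),
        )
    conn.commit()


# Only whitelisted column names are ever placed into the ORDER BY clause.
SORTABLE = ("account", "headline", "last_seen", "times_seen")


def report(conn, order_by="times_seen", descending=True):
    """Return (account, headline, times_seen, last_seen, duration) rows, sorted."""
    if order_by not in SORTABLE:
        raise ValueError(f"cannot sort by {order_by!r}")
    direction = "DESC" if descending else "ASC"
    rows = conn.execute(
        "SELECT account, headline, times_seen, first_seen, last_seen "
        f"FROM ad_sightings ORDER BY {order_by} {direction}"
    ).fetchall()
    # Duration is the span between the first and the last sighting.
    return [
        (acct, head, seen, last,
         datetime.fromisoformat(last) - datetime.fromisoformat(first))
        for acct, head, seen, first, last in rows
    ]
```

Here Duration is derived from the first and last sighting instead of being stored, so it cannot drift out of step with Last Seen. A MySQL version could do the insert-or-increment in one statement; the two-step form is used only to keep the sketch portable.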

Page is refreshed by invoking the link: http://www.plentyoffish.com/inbox.aspx

Flow logic of the project:

1.) User logs into the admin control area and inputs a list of predefined Plenty of Fish accounts.
2.) User sets the number of times they would like the script to log into Plenty of Fish in a 24-hour period, starting from a predetermined time (every hour, every 2 hours, every 3 hours, every 6 hours, every 8 hours, every 12 hours, etc.).
3.) User sets the number of times they would like the script to refresh the homepage after login (thus showing different ads to be scraped).
4.) User “optionally” sets which proxy server addresses they would like to use when logging into plentyoffish.com for each particular account.
5.) User defines the link that should be used to “refresh” the homepage in a particular session. Default = http://www.plentyoffish.com/inbox.aspx (do not use browser refreshes)
6.) User has a button they can hit to start the process. This invokes the script.
7.) User has a button they can hit to stop the process. This stops the script.
8.) User has a menu or field that will show the status of the script (e.g. Running or Stopped).
9.) User can hit a button to generate a report similar to the report shown in the http://ezcl.org/spy4.flv video.
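
As an illustration of step 2, the "every N hours starting from a predetermined time" setting could be turned into a crontab line along these lines. This is just a sketch under assumptions: the function names and the command path in the usage note are made up, and it assumes the interval divides evenly into 24 hours so the schedule repeats identically each day.

```python
def run_hours(start_hour, every_hours):
    """Return the hours of the day (0-23) at which the scraper should run."""
    if not 0 <= start_hour <= 23:
        raise ValueError("start_hour must be between 0 and 23")
    if every_hours < 1 or 24 % every_hours != 0:
        raise ValueError("every_hours must divide evenly into 24")
    runs_per_day = 24 // every_hours
    return sorted((start_hour + k * every_hours) % 24 for k in range(runs_per_day))


def crontab_line(start_hour, every_hours, command):
    """Build a crontab entry that runs `command` on the hour at each run time."""
    hours = ",".join(str(h) for h in run_hours(start_hour, every_hours))
    return f"0 {hours} * * * {command}"
```

For example, crontab_line(9, 6, "/usr/bin/python3 /opt/adscraper/run.py") describes a run every 6 hours starting at 09:00, using a hypothetical script path.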

Flow logic of script:

1.) Script logs into plentyoffish.com with the username and password supplied in the admin console.
2.) The script looks for the 2 ads on the right-hand side of the page, as illustrated in the spyadvanced.flv video at timestamp 0:14.
3.) The script scrapes the 2 ads, grabbing the image, headline, and text of each ad, and stores them in a MySQL database.
4.) The script then clicks on each link, records the redirects and final destination URL of each ad, and stores that information in a MySQL database for each ad.
5.) The script records an impression for each ad and increments the counter for the number of times each particular ad has been seen by the account that is logged in.
6.) The script records the duration. If this is the 1st time this ad is seen, the script records this as the initial time and will increment the duration for this unique ad when it's seen again.
7.) The script updates the last seen variable for the unique ads.
8.) The script then refreshes the page by invoking the following url: http://www.plentyoffish.com/inbox.aspx or whatever url was configured in the admin panel for page refreshing.
9.) The script repeats steps 2 – 8 until the configured number of refreshes is complete for that particular account session.
10.) The script completes the configured number of refreshes and then logs out the configured user.
11.) The script logs in the next configured user and repeats steps 1 – 10 until all configured users have been used.
12.) The script repeats steps 1 – 11 again after the time frame set in the admin console for rerunning the script.
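
Step 4 (recording each redirect hop) could be sketched roughly as follows. This is an illustration only, under a few assumptions: the function name and the `fetch` callback are hypothetical, `fetch(url)` is expected to make one request without following redirects and return the status code plus the Location header (or None), and only HTTP-level redirects are counted, so hops done through JavaScript or meta refresh tags would need extra handling.

```python
from urllib.parse import urljoin

# HTTP status codes that indicate a redirect to the URL in the Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_hops(start_url, fetch, max_hops=10):
    """Return every URL visited, from the ad link to the final landing page.

    `fetch(url)` must return (status_code, location_or_None) and must not
    follow redirects on its own. At most `max_hops` redirects are followed.
    """
    hops = [start_url]
    while len(hops) <= max_hops:
        status, location = fetch(hops[-1])
        if status not in REDIRECT_CODES or not location:
            break  # reached the final landing page
        next_url = urljoin(hops[-1], location)  # Location may be relative
        if next_url in hops:
            break  # redirect loop, stop here
        hops.append(next_url)
    return hops
```

The number of hops is then len(hops) - 1 and the final landing page is hops[-1]. Passing `fetch` in as a parameter keeps the hop counting separate from whichever HTTP client (and per-account proxy) ends up being used.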

For an idea of exactly what I want, you can watch the following videos.

http://ezcl.org/spy4.flv <– details the software I would like created
http://ezcl.org/spyadvanced.flv <– details aspects of the plentyoffish.com website.
