Very Simple Website Crawler

I need a script, preferably in PHP. I am open to suggestions from experienced coders for ASP/ASP.NET etc.

We need a very simple website crawler created to do the following:

In field one we input a list of 300 domain names (different every time)
In field two we input a list of keyword phrases (different every time)

The crawler then visits each domain name in the list and crawls the entire site located at that domain, searching every page for the keyword phrases. For every page on which it finds a keyword phrase, it provides us with the page’s title tag, META description, exact URL and last modified time.
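To make the per-page check concrete, here is a minimal sketch of how a single downloaded page could be scanned. It is written in Python purely for illustration (PHP remains the preferred language for the actual script) and uses only the standard library. The names `PageScanner` and `scan_page` are hypothetical, and the matching rules (case-insensitive, whitespace-collapsed, script and style text ignored, title text not counted as content) are assumptions rather than requirements.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects the <title> text, META description, and visible text of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.text_parts.append(data)


def scan_page(html, phrases):
    """Return title, description and matched phrases, or None if no phrase is found."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    # Collapse whitespace so a phrase split across line breaks still matches.
    body = " ".join(" ".join(scanner.text_parts).split()).lower()
    matched = [p for p in phrases
               if p.strip() and " ".join(p.split()).lower() in body]
    if not matched:
        return None
    return {
        "title": scanner.title.strip(),
        "description": scanner.description.strip(),
        "matched": matched,
    }
```

A full crawler would call something like `scan_page` on every page it fetches from each domain and keep the non-`None` results together with the page URL.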

Example:

So if we:
Input the domain names xyz.com and samplesite.com
Input the keyword phrase ‘service provider’

The crawler will locate every page on the sites xyz.com and samplesite.com that contains the keyword phrase ‘service provider’ in its content and display each page’s title tag, META description, exact URL and last modified time in a formatted list:

Title Tag: Service Provider Page
META Desc: Page about service providers
URL: www.xyz.com/dogs
Last Modified: 01/28/2010

Title Tag: Service Provider Cats Page
META Desc: Page about cats service providers
URL: www.samplesite.com/cats
Last Modified: 01/08/2009
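The list format above could be produced along the following lines. This is a sketch in Python for illustration only, under two assumptions: the last modified time is taken from the HTTP `Last-Modified` response header (many dynamically generated pages do not send one, so the sketch falls back to "Unknown"), and each result is a dictionary with the hypothetical keys `title`, `description`, `url` and `last_modified`.

```python
from email.utils import parsedate_to_datetime


def format_last_modified(header_value):
    """Convert an HTTP Last-Modified header value to MM/DD/YYYY.

    Returns "Unknown" when the header is missing or cannot be parsed.
    """
    if not header_value:
        return "Unknown"
    try:
        return parsedate_to_datetime(header_value).strftime("%m/%d/%Y")
    except (TypeError, ValueError):
        return "Unknown"


def format_results(records):
    """Render matched pages as blank-line-separated blocks in the layout shown above."""
    blocks = []
    for record in records:
        blocks.append("\n".join([
            f"Title Tag: {record['title']}",
            f"META Desc: {record['description']}",
            f"URL: {record['url']}",
            f"Last Modified: {format_last_modified(record.get('last_modified'))}",
        ]))
    return "\n\n".join(blocks)
```

Passing one record per matching page to `format_results` and printing the returned string gives output in the same four-line layout as the example.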

Please don’t hesitate to PM us with any additional questions. We needed this completed yesterday, so please make sure your timeline is accurate, because you WILL be held to it.

We will be adding A LOT of features to this crawler as time progresses, so you will have A LOT of future work with us if you create this correctly, quickly and at a decent price.

We reserve the sole rights to this script and it cannot be re-purposed or resold to anyone else.

You must send me a PM with any comments, questions and/or your experience and reference this project so I can clearly see which bids are fake and which are real.
