PHP Crawler

Hi all,

I need a PHP crawler that does the following:

– The crawler should have at least 3 important files: crawler.class, config.php, crawl.php (I would prefer one class per site: Alexa, Yahoo, Bing, Google)
– It should crawl websites supplied in a specified format
– It writes all data it finds to the database
– It crawls a site only if its timestamp in the DB is older than 7 days
– The input is a list of websites in a format like this:
http://www.test.com
https://www.test1.com
http://www.test2.com
– HTTPS must be supported
– If a second scan runs and detects a change, it must not delete the existing database record. It creates a new record, and only the newest one is shown.
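The 7-day rule and the "never delete, only add" rule could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `isStale` and `SnapshotStore` are made up, the store is a plain in-memory array standing in for the real database table, and refreshing the timestamp when nothing changed is an assumption the requirements above do not spell out.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: true when a site was never crawled or its last
// crawl is more than $maxAgeDays old (7 days in the requirements above).
function isStale(?int $lastCrawledAt, int $now, int $maxAgeDays = 7): bool
{
    if ($lastCrawledAt === null) {
        return true;
    }
    return ($now - $lastCrawledAt) > $maxAgeDays * 86400;
}

// In-memory stand-in for the database table. A real version would use
// INSERT plus a "newest row per URL" SELECT instead of this array.
final class SnapshotStore
{
    /** @var array<string, array<int, array>> rows per URL, oldest first */
    private array $rows = [];

    // Returns the newest row for a URL, or null if none exists.
    public function latest(string $url): ?array
    {
        $history = $this->rows[$url] ?? [];
        return $history === [] ? null : $history[count($history) - 1];
    }

    // Appends a new row only when the data changed; old rows are never
    // deleted. Returns true if a new row was written.
    public function save(string $url, array $data, int $now): bool
    {
        $latest = $this->latest($url);
        if ($latest !== null && $latest['data'] == $data) {
            // Assumption: an unchanged site only gets its timestamp refreshed.
            $this->rows[$url][count($this->rows[$url]) - 1]['crawled_at'] = $now;
            return false;
        }
        $this->rows[$url][] = ['crawled_at' => $now, 'data' => $data];
        return true;
    }

    // Number of stored rows (versions) for a URL.
    public function historyCount(string $url): int
    {
        return count($this->rows[$url] ?? []);
    }
}
```

In crawl.php, each URL from the input list would first go through `isStale()` with the timestamp of `latest()`, and only stale sites would be fetched and passed to `save()`.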

What does it need to crawl?

– Website title
– Meta Description
– Meta Keywords
– Server banner (example: Server: Apache/2.2.16 (FreeBSD) mod_hcgi/0.8.0 mod_ssl/2.2.16 OpenSSL/1.0.0c DAV/2)
– Alexa rank (Traffic Rank 1 Month, World Traffic Rank, review count, average load time)
– Google (PageRank, indexed pages [site:www.test.com])
– Add Wappalyzer and store its details in our DB as well
– Whois information (nameservers, owner, street, city)
– RIPE information (DNS lookup IP, RIPE database query)
– IP lookup of the site
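For the first few items (title, meta description, meta keywords, server banner), a minimal sketch could look like this. It assumes PHP's DOM extension is available; the function names `extractPageInfo` and `extractServerBanner` are hypothetical, and the external services (Alexa, Google, Wappalyzer, Whois, RIPE) are deliberately left out.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: pulls title, meta description and meta keywords
// out of an HTML string. Missing fields stay null.
function extractPageInfo(string $html): array
{
    $info = ['title' => null, 'description' => null, 'keywords' => null];
    if ($html === '') {
        return $info;
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so parser warnings are suppressed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $info['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Attribute values such as "Description" are matched case-insensitively.
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $info[$name] = trim($meta->getAttribute('content'));
        }
    }
    return $info;
}

// Hypothetical helper: returns the value of the first "Server:" line
// from a list of raw header lines, or null if there is none.
function extractServerBanner(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        if (is_string($line) && stripos($line, 'Server:') === 0) {
            return trim(substr($line, strlen('Server:')));
        }
    }
    return null;
}
```

The inputs could come from `file_get_contents($url)` and `get_headers($url)`; both handle https:// URLs when the openssl extension is enabled. The site's IP could be looked up with `gethostbyname()`.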

The config file:
– a way to activate and deactivate classes (services)
– a way to edit the database configuration
– a way to edit the expiry time of a domain
– My brain is bad today, so more items could come up during the project
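One possible shape for config.php, shown only as a sketch: every key name, the DSN, the credentials and the `enabledServices` helper are placeholders invented for illustration, not a fixed layout.

```php
<?php
declare(strict_types=1);

// Hypothetical config.php contents; all keys and values are placeholders.
$config = [
    // Database configuration (a PDO-style DSN is assumed here).
    'database' => [
        'dsn'      => 'mysql:host=localhost;dbname=crawler;charset=utf8mb4',
        'user'     => 'crawler',
        'password' => 'change-me',
    ],
    // A domain is re-crawled once its record is older than this many days.
    'expire_days' => 7,
    // Switch individual service classes on or off.
    'services' => [
        'alexa'      => true,
        'google'     => true,
        'wappalyzer' => true,
        'whois'      => false,
        'ripe'       => true,
    ],
];

// Returns the names of all services switched on in the config.
function enabledServices(array $config): array
{
    return array_keys(array_filter($config['services'] ?? []));
}
```

crawl.php could then loop over `enabledServices($config)` and instantiate only the matching service classes.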

An example of the PHP output is needed.
