Find Many Footer Copyrights

This is a somewhat unusual data scraping / data research job.

I’m trying to find the range of headers, footers and copyright notices used on blogs and large sites.

I’m looking for variety, and for a tally of which ones are most common.

I’m looking for 3 lists:

1) Header

Take a screenshot of just the header and top navigation, and describe it in text.

E.g. on scriptlance.com:

[Logo]
Post project, buyers, programmers, faq, forum, contact, rss, search[].

2) Footer, e.g.:

From: scriptlance.com

Sitemap | RSS | Privacy Policy | Terms | Report Violations | Affiliates | FAQ | Forum | Contact Support

3) Copyright, e.g. these 3 examples:

(a) Copyright (C) 2010 SomeCorp All rights reserved.

(b) Copyright © 2007-2010 TheCorp.com

(c) Copyright © 2001 – 2010
SomeBrand is a trade-mark of
SomeCorp

Then a TALLY of each format, e.g.:
(a) found 10 times in 200 sites
(b) found 50 times in 200 sites
(c) found 3 times in 200 sites
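A minimal sketch of how such a tally could be produced, assuming one copyright notice has already been copied from each site's footer. The regexes, the `classify` and `tally` names, and the sample notices are all hypothetical. Real notices vary far more than these three formats, so anything that matches none of them is counted as "other" for manual review.

```python
import re
from collections import Counter

# Hypothetical regexes approximating example formats (a), (b) and (c) above.
# Notices that match none of them are tallied as "other".
PATTERNS = [
    ("a", re.compile(r"^Copyright \(C\) \d{4} .+ All rights reserved\.?$")),
    ("b", re.compile(r"^Copyright © \d{4}-\d{4} \S+$")),
    ("c", re.compile(r"^Copyright © \d{4} – \d{4}\b")),
]


def classify(notice: str) -> str:
    """Return the label of the first pattern the notice matches, else 'other'."""
    text = " ".join(notice.split())  # collapse line breaks and repeated spaces
    for label, pattern in PATTERNS:
        if pattern.search(text):
            return label
    return "other"


def tally(notices: list[str]) -> Counter:
    """Count how many sites use each copyright format (one notice per site)."""
    return Counter(classify(n) for n in notices)


if __name__ == "__main__":
    # Made-up sample notices, one per site (not real survey data).
    sample = [
        "Copyright (C) 2010 SomeCorp All rights reserved.",
        "Copyright © 2007-2010 TheCorp.com",
        "Copyright © 2001 – 2010\nSomeBrand is a trade-mark of\nSomeCorp",
        "© 2010 Example Inc.",
    ]
    counts = tally(sample)
    for label, n in sorted(counts.items()):
        print(f"({label}) found {n} times in {len(sample)} sites")
```

Matching on the notice's shape (symbol style, single year vs. year range, "All rights reserved") rather than on exact text lets notices from different companies fall into the same bucket.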

Use the following 200 sites (the two lists may overlap):

For the top 100 most-viewed sites (e.g. apple):
http://www.alexa.com/topsites

For blogs/smaller sites, use this list:
http://technorati.com/blogs/top100
