This is a unique data scraping / data research job.
I’m trying to find the range of
1) headers
2) footers
3) copyright notices
used on blogs and large sites.
I’m looking for variety, with a tally of which are most common.
I need 3 lists:
1) Header
Take a screenshot of just the header and top nav,
and describe it in text,
e.g. on scriptlance:
[Logo]
Post project, buyers, programmers, faq, forum, contact, rss, search[].
2) Footer, e.g. from scriptlance.com:
Sitemap | RSS | Privacy Policy | Terms | Report Violations | Affiliates | FAQ | Forum | Contact Support
3) Copyright notices, for example:
(a) Copyright (C) 2010 SomeCorp All rights reserved.
(b) Copyright © 2007-2010 TheCorp.com
(c) Copyright © 2001 – 2010
SomeBrand is a trade-mark of
SomeCorp
TALLY (example)
(a) found 10 times in 200 sites
(b) found 50 times in 200 sites
(c) found 3 times in 200 sites.
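A rough sketch of how the tally could be produced once the copyright lines have been collected. The grouping scheme here (symbol style, single year vs. year range, "All rights reserved", trademark line) is only an assumption about what counts as "the same" format, and the names classify_notice and tally_notices are hypothetical. Adjust the features to whatever variety actually shows up.

```python
import re
from collections import Counter

def classify_notice(notice):
    """Reduce one copyright notice to a rough format signature (assumed scheme)."""
    text = " ".join(notice.split())  # collapse line breaks and extra spaces
    # Which copyright symbol is used, if any
    if "©" in text:
        symbol = "©"
    elif re.search(r"\(c\)", text, re.IGNORECASE):
        symbol = "(C)"
    else:
        symbol = "none"
    # Year range (2007-2010) vs. single year (2010)
    if re.search(r"\b\d{4}\s*[-–]\s*\d{4}\b", text):
        years = "range"
    elif re.search(r"\b\d{4}\b", text):
        years = "single"
    else:
        years = "none"
    reserved = "all rights reserved" in text.lower()
    trademark = re.search(r"trade-?mark", text, re.IGNORECASE) is not None
    return (symbol, years, reserved, trademark)

def tally_notices(notices):
    """Count how many notices share each format signature."""
    return Counter(classify_notice(n) for n in notices)

examples = [
    "Copyright (C) 2010 SomeCorp All rights reserved.",
    "Copyright © 2007-2010 TheCorp.com",
    "Copyright © 2001 – 2010\nSomeBrand is a trade-mark of\nSomeCorp",
]
for signature, count in tally_notices(examples).most_common():
    print(count, signature)
```

Each signature printed is one row of the tally; the count is how many of the sampled sites used that format.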
Use the following 200 sites (the two lists may overlap).
For the top 100 most-viewed sites (e.g. apple):
http://www.alexa.com/topsites
For blogs/smaller sites, use this list:
http://technorati.com/blogs/top100
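Since the two lists may overlap, here is a minimal sketch of merging them so each site is only visited once. It assumes the site addresses have already been copied out of the two pages by hand or by some other means; the function names and the sample URLs are placeholders, not real list contents.

```python
from urllib.parse import urlparse

def site_key(url):
    """Normalize a URL or bare domain to a lowercase host without 'www.'."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def merge_site_lists(top_sites, top_blogs):
    """Combine both lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for url in list(top_sites) + list(top_blogs):
        key = site_key(url)
        if key and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged

# Placeholder entries for illustration only
top_sites = ["http://www.apple.com/", "scriptlance.com"]
top_blogs = ["https://apple.com/blog", "http://example.org"]
print(merge_site_lists(top_sites, top_blogs))
```

If overlap leaves fewer than 200 unique sites, the tally should be reported against the actual number visited.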