Database Extraction Project

Need a freelancer with Data Extraction experience.

The task requires creating the source code for a program that can extract information from the very large crawled snapshot of the Web hosted on Amazon’s S3 service: http://www.commoncrawl.org/data/accessing-the-data/

The program would take a domain as input, run on an EC2 instance, and produce a tab-separated text file with three columns: SourceURL\tAnchorText\tTargetURL, where TargetURL points to the input domain.
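
To make the expected output concrete, here is a minimal sketch of the per-page step only: given one page's URL and HTML, collect its links and keep those whose target falls on the input domain. It uses only the Python standard library. Reading pages out of the Common Crawl files on S3 is not shown, the names (AnchorCollector, host_matches, extract_rows, write_tsv) are hypothetical, and treating subdomains as part of the input domain is an assumption rather than a stated requirement.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so tabs or newlines cannot break the TSV row.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def host_matches(url, domain):
    """True if the URL's host is the domain or one of its subdomains (assumption)."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def extract_rows(source_url, html, domain):
    """Return (SourceURL, AnchorText, TargetURL) tuples for links into the domain."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    rows = []
    for href, text in parser.links:
        try:
            target = urljoin(source_url, href)  # resolve relative links
            keep = host_matches(target, domain)
        except ValueError:
            continue  # skip malformed URLs
        if keep:
            rows.append((source_url, text, target))
    return rows


def write_tsv(rows, out):
    """Write one tab-separated line per row to a text file object."""
    for row in rows:
        out.write("\t".join(row) + "\n")
```

A caller would run extract_rows on each crawled page and pass the combined rows to write_tsv with an open output file. Whether links from the domain to itself should be included is left open by the description above; this sketch keeps them.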

…
