Data Crawling And Ordering Task

The full explanation of this task will be provided to the bid winner, but the highlights below should be enough for you to understand it.

1. You must visit a crawlable HTML website and index fewer than 10,000 of its pages by reading each page's source code.
2. The source code contains information that you have to order. The information is laid out in the following format:
<a href="SiteName/Tadga.htm">Lala</a></li><li><a href="SiteName/Tallika.htm">Tallika
You must place Lala next to Tadga and Tallika next to Tallika, using the rule that the word following a URL belongs to that URL.
3. Your script must handle data in various non-Latin character sets, and it is advisable to use UTF-8 to keep the data intact.
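
The pairing rule in step 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the required implementation: it uses Python's standard html.parser, assumes the href values use straight quotes in the real page source, and assumes the name to match is the file name in the URL without its extension. The names LinkPairParser and extract_pairs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote
import posixpath


class LinkPairParser(HTMLParser):
    """Collects (url_name, link_text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pairs = []
        self._href = None   # href of the link currently being read
        self._text = []     # text fragments seen inside that link

    def _flush(self):
        # Store the current link, if any, then reset the state.
        if self._href is not None:
            # Assumption: the name to match is the file name without extension.
            path = unquote(urlsplit(self._href).path)  # decodes UTF-8 escapes
            name = posixpath.splitext(posixpath.basename(path))[0]
            self.pairs.append((name, "".join(self._text).strip()))
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._flush()  # tolerate a previous link that was never closed
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def close(self):
        super().close()
        self._flush()  # keep a trailing link with no closing tag


def extract_pairs(html):
    """Return a list of (url_name, link_text) tuples found in html."""
    parser = LinkPairParser()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

When fetching pages, decode the response bytes as UTF-8 (or the encoding the site declares) before passing the text to extract_pairs, so that non-Latin link text survives unchanged.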

In addition, you must merge this list with a second list, an Excel file that shares a matching keyword with the list you are creating.
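
A minimal sketch of the keyword match, using only the Python standard library. It assumes the Excel list has already been loaded as a list of dictionaries (for example, exported to UTF-8 CSV and read with csv.DictReader), that the shared column is called "keyword", and that matching should ignore case and surrounding spaces. The column name, the "display_name" field, and the function name join_on_keyword are all hypothetical; the real layout would come from the full task description.

```python
def join_on_keyword(pairs, sheet_rows, key_column="keyword"):
    """Attach the crawled link text to each spreadsheet row by keyword.

    pairs      -- list of (url_name, link_text) tuples from the crawl
    sheet_rows -- list of dicts, one per row of the second list
    key_column -- name of the column holding the matching keyword (assumed)
    """
    # Index the crawled names; the first occurrence of a keyword wins.
    text_by_key = {}
    for url_name, link_text in pairs:
        text_by_key.setdefault(url_name.strip().casefold(), link_text)

    merged = []
    for row in sheet_rows:
        key = str(row.get(key_column, "")).strip().casefold()
        # Rows with no match keep an empty display name instead of being dropped.
        merged.append({**row, "display_name": text_by_key.get(key, "")})
    return merged
```

Writing the merged rows to an XLS file requires a third-party spreadsheet library and is not shown here.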

The final output should be an XLS file with the list fully ordered and the display complete.

Please only post if you have:
– Proper experience
– A positive track record
– The ability to complete this task within 24 hours.
