I expect this project to be quite easy for the person with the right skill set and programming knowledge.
I had a PHP web-scraping script written that runs from the command prompt (PHP CLI). It logs in to a secure server, scrapes A LOT of pages, and stores the scraped information in a local database.
Now, the script works fantastically well, but a single scrape can take up to 30 hours to complete and I need it to work MUCH, **MUCH** faster!!
I would like this script (I will obviously provide it to the winning bidder, but don’t ask for it ahead of time) to be turned into a multithreaded app in a fast multithreaded language such as C# or Java.
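To give a rough idea of the structure I have in mind, here is a minimal Java sketch. It is not the actual script: `ParallelScraper`, `scrapeAll`, and the `fetchPage` function are hypothetical names, and the real login, cookie handling, and database writes are deliberately left out. It only shows a fixed pool of worker threads pulling pages in parallel instead of one at a time.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: fetch many pages on a fixed number of worker threads.
final class ParallelScraper {

    // fetchPage stands in for the real "download and parse one page" step
    // (assumption: it takes a URL and returns non-null page content).
    static Map<String, String> scrapeAll(List<String> urls,
                                         int threads,
                                         Function<String, String> fetchPage) {
        // Thread-safe map, because several workers write results concurrently.
        Map<String, String> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (String url : urls) {
            // Each URL becomes one task; the pool runs at most `threads` at once.
            pool.execute(() -> results.put(url, fetchPage.apply(url)));
        }

        // Stop accepting new tasks, then wait for the queued ones to finish.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return results;
    }
}
```

Something like `ParallelScraper.scrapeAll(urls, 8, url -> download(url))` would then run up to eight fetches at a time, where `download` is whatever replaces the current PHP page-fetching code. The thread count is a guess and would need tuning against the site.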
PLEASE note ahead of time that this script uses very complicated cookie handling for a difficult and finicky website, and two freelancers have already left me high and dry without managing to finish the work, so please be CERTAIN you can handle such complications before bidding.
If you can do this, please let me know through PMB how you envision your solution working and how long you will take to get it done. Also, since I have had some REALLY bad experiences with some of my other freelancers, please write the message “I really read this” in your response to me so I know you really read this entire posting. If I don’t see those words, your bid will automatically be ignored, since it means you did not read my project posting well enough.
Please ask whatever questions you may need clarified and ONLY BID IF YOU CAN DO THIS – I have **ZERO TOLERANCE** for bad / incompetent freelancers – I need this job done. If you can do it, I am GREAT to work with, but if you don’t feel **100% SURE YOU CAN DO THIS**, then PLEASE don’t even bother bidding.
Extra Information:
The site this script scrapes is an HTTPS site that requires a username and password, but I can create as many usernames as you need. I also rent proxy IPs monthly, which can be used for this program if needed.
Basically, my one main concern is not getting banned from this site for excessive usage.
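One possible way to keep usage in check once several threads are running is a shared throttle that spaces requests out. The Java sketch below is only an illustration under my own assumptions: `RequestThrottle` and `acquire` are hypothetical names, the minimum interval is a made-up parameter, and I do not know what request rate the site actually tolerates.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: enforce a minimum gap between requests across threads.
final class RequestThrottle {
    private final long minIntervalNanos;
    private long nextSlotNanos;

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
        this.nextSlotNanos = System.nanoTime();
    }

    // Blocks the calling thread until its reserved time slot arrives.
    void acquire() {
        long waitNanos;
        synchronized (this) {
            long now = System.nanoTime();
            // Take the next free slot, or "now" if the throttle has been idle.
            long slot = (nextSlotNanos - now > 0) ? nextSlotNanos : now;
            nextSlotNanos = slot + minIntervalNanos;
            waitNanos = slot - now;
        }
        if (waitNanos > 0) {
            try {
                // Sleep outside the lock so other threads can reserve slots.
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                // Preserve the interrupt so the caller can decide to stop.
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Each worker would call `acquire()` right before sending a request. Using one throttle per username or per proxy IP, rather than a single global one, is another option I am open to.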
Thanks!
