I need this to work on Windows XP & Vista.
I need a simple bot made. It needs to scrape Google search results using a list of keywords from an external .txt file. All it needs to scrape are the URLs of the returned results (not the titles or descriptions).
It needs to have support for proxies & it needs to be multi-threaded (I’m willing to NOT have it be multi-threaded if that means you can make bids that remain within my budget).
IT MUST SUPPORT UNICODE! What I mean is, if I have the following phrase in the .txt file: 輔導服務 it must search for that, and not turn it into ???? (which is what happens if it doesn’t support Unicode).
So, here is how the bot should work:
From the user interface, I select a .txt file (that has been saved in Unicode format). In that text file will be a list of line-separated phrases to search Google for, like this:
phrase 1
phrase 2
phrase 3
etc,
etc,
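To show bidders exactly what I mean by Unicode support, here is a rough Python sketch of reading such a keyword file without mangling the phrases (any language that behaves the same is fine; "keywords.txt" stands in for whatever file is picked in the interface, and I'm assuming UTF-8, with a note for UTF-16):

    # Read the keyword file as Unicode so CJK phrases survive intact.
    # "utf-8-sig" also strips a leading BOM if the editor added one;
    # use encoding="utf-16" instead for Notepad's "Unicode" save option.
    def load_phrases(path):
        with open(path, "r", encoding="utf-8-sig") as f:
            return [line.strip() for line in f if line.strip()]

    phrases = load_phrases("keywords.txt")  # e.g. ["輔導服務", ...]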
There should be a place in the user interface for me to select another .txt file that contains a list of proxies to use. The list of proxies in the .txt file will be in this format:
ip address:port
ip address:port
ip address:port
etc
etc
etc
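To be unambiguous about that format, here is a rough Python sketch of parsing the proxy list (again, just an illustration, not a required implementation):

    # Parse "ip address:port" lines into (host, port) pairs,
    # skipping blank lines and lines without a colon.
    def load_proxies(path):
        proxies = []
        with open(path, "r") as f:
            for line in f:
                line = line.strip()
                if not line or ":" not in line:
                    continue
                host, port = line.rsplit(":", 1)
                proxies.append((host, int(port)))
        return proxies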
Lastly, there should be a place in the user interface for me to select the number of simultaneous connections (threads) to use at the same time while scraping Google.
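By "simultaneous connections" I mean something along the lines of this Python sketch, where thread_count is the number picked in the interface and scrape_phrase is a hypothetical per-phrase worker (defined however the bidder likes):

    # Run up to thread_count phrase scrapes at the same time.
    from concurrent.futures import ThreadPoolExecutor

    def run_all(phrases, thread_count):
        with ThreadPoolExecutor(max_workers=thread_count) as pool:
            pool.map(scrape_phrase, phrases)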
The bot needs to scrape from the “advanced search” on Google, as that can return a maximum of 100 results per page, compared to the regular 10 results per page you get when searching directly at “google.com”.
The bot should scrape the maximum number of search results for each phrase entered. So if there are 3,000,000 results for a search phrase, it should get the first 1,000 results (as Google won’t let you see more than 1,000 results for any given keyword).
If only 600 results are returned, it should scrape all 600.
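The paging logic I have in mind looks roughly like this Python sketch; fetch_page is a hypothetical function that returns the list of result URLs from one results page, and num/start are the usual Google query parameters for page size and offset:

    # Walk the result pages 100 at a time, stopping at the 1,000-result
    # cap or as soon as a short page shows there are no more results.
    def scrape_phrase(phrase):
        urls = []
        for start in range(0, 1000, 100):
            page = fetch_page(phrase, num=100, start=start)
            urls.extend(page)
            if len(page) < 100:  # e.g. only 600 results exist in total
                break
        return urls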
It should change proxies for EVERY query on EVERY simultaneous connection. It just needs to randomly select a proxy from the list of provided proxies in the .txt file that was selected from the user interface.
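For example, fetch_page from the sketch above could pick a fresh proxy on every call, along these lines (here it also takes the proxy list as an argument; the requests library is just one possible HTTP stack, and extract_urls is a hypothetical parser that pulls the result URLs out of the returned HTML):

    import random
    import requests

    def fetch_page(phrase, num, start, proxies):
        # A new randomly chosen proxy for every single query.
        host, port = random.choice(proxies)
        proxy_url = "http://%s:%d" % (host, port)
        resp = requests.get(
            "https://www.google.com/search",
            params={"q": phrase, "num": num, "start": start},
            proxies={"http": proxy_url, "https": proxy_url},
            timeout=30,
        )
        resp.raise_for_status()
        return extract_urls(resp.text)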
It should also save the results to a .txt file called “results.txt”, in the same folder the .exe is running from, AS THEY ARE SCRAPED. So if the bot crashes or the computer crashes, whatever it has scraped thus far will still be saved to the .txt file.
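In other words, each URL should be appended and flushed the moment it is scraped, roughly like this (the lock is just to keep the threads from interleaving lines):

    import threading

    write_lock = threading.Lock()

    def save_result(url):
        # Append and flush immediately so a crash loses nothing
        # that was already scraped.
        with write_lock:
            with open("results.txt", "a", encoding="utf-8") as f:
                f.write(url + "\n")
                f.flush()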
I’d like to keep this project at as low a cost as possible.
Thanks for your bids.