Hi,
A long time ago, I developed a PHP site crawler that runs as a cron job on my VPS. It still works fine today, with a few exceptions.
Basically, I use several cron job scripts to crawl a site. Now I would like to make some enhancements to it.
The existing project is structured as follows.
My crawler has two parts.
1. Frontend: displays a summary of each completed crawl. It consists of simple PHP pages with embedded HTML that render the output. I would like to change this to a simple PHP MVC pattern so that the business logic and UI logic are separated and properly organised.
2. Backend: the cron job scripts that do the actual crawling. Currently the cron jobs run once a day or once an hour, depending on the configuration. Several scripts run, each doing a specific job. For example, one script crawls the home page, another crawls the subpages and extracts all internal and external links, and another checks the external links to find out whether they are online or unreachable. All results are stored in a database.
What needs to be done?
Frontend: should be refactored into a proper MVC pattern to separate business logic from UI logic.
Backend:
1. The current scripts produce errors, which the cron job is configured to send me by email. Wherever possible, I would like you to fix these errors.
2. Implement session management in the cron job scripts so that multiple instances of a script can run at the same time without corrupting the database or crashing. Currently, each script is configured to run at long intervals, for example once a day or once an hour, because shorter intervals cause overlapping runs. For example, if Script1 is scheduled every 10 minutes and a Script1 process is still running when the next one starts, the second process crawls the same sites as the first. I hope that session management, with each session's progress recorded in the database, can fix this problem.
3. Add configurable parameters. These will be set in the frontend and used by the backend. Examples include how many links to process in one run, a list of excluded links, etc.
4. Implement session management in the frontend as well. It currently supports only one login; it should allow multiple logins with roles such as Super-Admin, Admin, etc.
After you successfully complete this project, I will have further enhancements, which will also be assigned to you.
I have consistently received 10/10 reviews, which shows that I am a trustworthy buyer. I have no issue with escrowing the amount in advance. Payment will be made in parts upon milestone completion, and I will set up milestones for the project.
