Php Or Asp Parsing & Cron-job

We would like someone to build us a PHP data harvester (basically a parser) that runs from a server cron job. If you think you can get the whole project done in ASP.NET instead, feel free to, but make sure you mention which one you will use in your bid.

There is a real-estate site containing record information that we need to store and make searchable. Because the complete data for each record is spread across a few different URLs, you will need to pull the data out of those URLs and store it so that a search returns all the data belonging to a particular entry.

This project needs to be finished FAST.
I’ve been a developer myself and I know how long things usually take to get done right, but we’re hoping for a quick turnaround. If you are going to bid, please be sure to describe your understanding of the project in the PMB, as this is very important.

Note:

I need to mention that the entire harvester was previously written for us in C# (as a desktop app), but due to changes in the site’s structure that app no longer works. Many of the URLs for the parser are the same, and we would urge anyone considering this project to take a quick look at the C# code to see how the process used to work.

url A has the form

http://www.oscn.net/applications/oscn/casesearch.asp
(example with the parameters already filled in)
http://www.oscn.net/applications/oscn/casesearch.asp?query=true&srch=0&web=true&db=Oklahoma&number=&iLAST=&iFIRST=&iMIDDLE=&iID=&iDOBL=&iDOBH=&SearchType=0&iDCPT=&iapcasetype=All&idccasetype=2&iDATEL=01/01/10&iDATEH=01/04/10&iCLOSEDL=&iCLOSEDH=&iDCType=0&iYear=&iNumber=&icitation=
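
To make the shape of the url A query concrete, here is a rough sketch of how it could be assembled. It is written in Python only to keep the example short (the deliverable is still PHP), the function name `build_search_url` is made up, and leaving out the empty parameters from the example is an assumption that would need to be checked against the live site.

```python
from datetime import date
from urllib.parse import urlencode

SEARCH_URL = "http://www.oscn.net/applications/oscn/casesearch.asp"


def build_search_url(date_low: date, date_high: date, db: str = "Oklahoma") -> str:
    """Build a url A query for cases filed between date_low and date_high."""
    params = {
        "query": "true",
        "srch": "0",
        "web": "true",
        "db": db,
        "SearchType": "0",
        "iapcasetype": "All",
        "idccasetype": "2",
        # The example URL uses two-digit years: MM/DD/YY.
        "iDATEL": date_low.strftime("%m/%d/%y"),
        "iDATEH": date_high.strftime("%m/%d/%y"),
        "iDCType": "0",
    }
    # safe="/" leaves the slashes in the dates unescaped, as in the example URL.
    return SEARCH_URL + "?" + urlencode(params, safe="/")


print(build_search_url(date(2010, 1, 1), date(2010, 1, 4)))
```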

url B is:
http://www.oscn.net/applications/oscn/GetCaseInformation.asp?submitted=true&viewtype=caseGeneral&casemasterID=2557003&db=Oklahoma
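
As an illustration of how url B relates to a single record, the sketch below builds the detail URL from a case ID and reads the ID back out of a link. It is in Python for brevity (the project itself is PHP), the helper names are hypothetical, and it assumes the links in the url A result table carry a `casemasterID` parameter, which we have not confirmed here.

```python
from urllib.parse import parse_qs, urlencode, urlparse

DETAIL_URL = "http://www.oscn.net/applications/oscn/GetCaseInformation.asp"


def build_detail_url(case_master_id: str, db: str = "Oklahoma") -> str:
    """Build the url B address for one case."""
    params = {
        "submitted": "true",
        "viewtype": "caseGeneral",
        "casemasterID": case_master_id,
        "db": db,
    }
    return DETAIL_URL + "?" + urlencode(params)


def case_master_id(link: str):
    """Return the casemasterID found in a link's query string, or None."""
    values = parse_qs(urlparse(link).query).get("casemasterID")
    return values[0] if values else None


url = build_detail_url("2557003")
print(url)
print(case_master_id(url))
```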

The data is currently accessed as follows (basic example):

1. A search query containing parameters is sent to a URL (url A); the page returns a table with a list of results.
2. The data from these results needs to be parsed and then stored in a database.
3. Further information about each record is then pulled from a second URL (url B) and also stored in the database, either linked to or within the records that correspond to the results from step 1.
4. If the information for a record is not available from url B, or additional data is required, additional URLs (c, d, e, f) are searched and parsed (see below or the attachment).
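
The storage side of steps 2 and 3 might look roughly like the sketch below. It uses Python and SQLite purely for illustration (the real project is PHP with whatever database you propose), and the table and column names are placeholders we made up, not an existing schema.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) cases table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cases (
               case_number TEXT PRIMARY KEY,
               filing_date TEXT,
               defendant   TEXT
           )"""
    )
    return conn


def store_search_row(conn, case_number: str, filing_date: str) -> None:
    """Step 2: store one row of the url A result table.

    INSERT OR IGNORE keeps the existing row if the case was already harvested.
    """
    conn.execute(
        "INSERT OR IGNORE INTO cases (case_number, filing_date) VALUES (?, ?)",
        (case_number, filing_date),
    )


def store_details(conn, case_number: str, defendant: str) -> None:
    """Step 3: attach url B details to the row stored in step 2."""
    conn.execute(
        "UPDATE cases SET defendant = ? WHERE case_number = ?",
        (defendant, case_number),
    )


conn = open_db()
store_search_row(conn, "CJ-2010-1", "2010-01-04")
store_details(conn, "CJ-2010-1", "John Q Smith")
print(conn.execute("SELECT * FROM cases").fetchall())
```

Keying the table on the case number is one simple way to keep repeated cron runs from storing the same case twice.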

(Note that both dates in the range used for the first query must always be earlier than the current day.)
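
For example, the date range could be computed so that it always ends yesterday. This is a small sketch in Python (the deliverable is still PHP); the function name and the default of three days back are assumptions chosen to match the 01/01/10 to 01/04/10 example.

```python
from datetime import date, timedelta


def search_window(today: date, days_back: int = 3):
    """Return (low, high): high is yesterday, low is days_back days before it."""
    high = today - timedelta(days=1)
    low = high - timedelta(days=days_back)
    return low, high


low, high = search_window(date.today())
print(low.strftime("%m/%d/%y"), high.strftime("%m/%d/%y"))
```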

What we want you to create:

>> PHP harvester that works with cron to automate this process and store
the data in a database each day (or at whatever frequency we set)

>> Single Admin Page (Password Protected)
The admin is able to set the following parameters:
1. How frequently the server-cron will run to collect this data
2. How many records are displayed per page in the front-end
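
One way the frequency setting could translate into a cron entry is sketched below, in Python for brevity (the project itself is PHP). The three frequency names, the 2 a.m. run time, and the script path are all made-up placeholders; only the five-field cron schedule syntax is standard.

```python
# Hypothetical admin choices mapped to standard five-field cron schedules.
SCHEDULES = {
    "hourly": "0 * * * *",   # minute 0 of every hour
    "daily": "0 2 * * *",    # 02:00 every day
    "weekly": "0 2 * * 0",   # 02:00 every Sunday
}


def crontab_line(frequency: str, command: str) -> str:
    """Return the crontab entry for the chosen frequency."""
    if frequency not in SCHEDULES:
        raise ValueError(f"unknown frequency: {frequency!r}")
    return f"{SCHEDULES[frequency]} {command}"


# The script path below is a placeholder, not a real location.
print(crontab_line("daily", "php /path/to/harvester.php"))
```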

>> Front-end (Search)
You will be provided with a pre-built open source membership script. All you need to do
is ensure that logged-in users are able to search the records below.

They will be able to select which County they want to search in
They will be able to select the date range they want to see
They will be able to query any of the fields available
They will also be able to see the stored images in the results

Each search result entry will summarize some details but
clicking on it will display all of them.
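
The search behind the front-end could be as simple as the sketch below: filter by county and date range, then page through the results using the per-page value from the admin page. It is written in Python with SQLite only for illustration (the project is PHP), and the `records` table and its columns are hypothetical. Dates are assumed to be stored as ISO strings (YYYY-MM-DD) so that BETWEEN compares them correctly.

```python
import sqlite3


def create_records_table(conn: sqlite3.Connection) -> None:
    """Create a minimal, made-up table to search against."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "case_number TEXT, county TEXT, filing_date TEXT)"
    )


def search(conn, county, date_from, date_to, page=1, per_page=10):
    """Return one page of (case_number, filing_date) rows for a county and date range."""
    offset = (page - 1) * per_page
    cursor = conn.execute(
        "SELECT case_number, filing_date FROM records "
        "WHERE county = ? AND filing_date BETWEEN ? AND ? "
        "ORDER BY filing_date, case_number LIMIT ? OFFSET ?",
        (county, date_from, date_to, per_page, offset),
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
create_records_table(conn)
conn.execute("INSERT INTO records VALUES ('CJ-2010-1', 'Oklahoma', '2010-01-04')")
print(search(conn, "Oklahoma", "2010-01-01", "2010-01-31"))
```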

This is the manual process we used to follow (basically you will be automating it):

1. The harvester collects data from the provided URLs and strips out duplicate records or records which are not needed

2. Next we go to the Oklahoma County assessor’s page:
http://www.oklahomacounty.org/assessor/Searches/DefaultSearch.asp
Enter the person’s name from the county site. Remember: last name first!

*********************************************
If the names are common and do not match (including middle initials), you can compare the names of the mortgage banks listed in the court documents (http://www.oscn.net/) with those returned by a search using the person’s name as Grantor on the property at: http://www.oklahomacounty.org/coclerk/deeds/RofD_Search_03.asp . Remember: last name first!
Check the bank names on the entries with Document Type: Lis Pending; these are the foreclosures.

If that does not work, you can check the mortgage Grantee using the same http://www.oklahomacounty.org/coclerk/deeds/RofD_Search_03.asp page, but entering the name in the Grantee field instead.
*********************************************
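
The lookup order described above (assessor search first, then Grantor, then Grantee) could be expressed as a simple fallback chain. The sketch below is in Python for brevity (the project is PHP); the lookup functions are stand-ins supplied by the caller, and the name reordering assumes the input is "First Middle Last" with no suffix such as Jr.

```python
def last_name_first(full_name: str) -> str:
    """'John Q Smith' -> 'Smith John Q' (assumes the last word is the surname)."""
    parts = full_name.split()
    if len(parts) < 2:
        return full_name.strip()
    return " ".join([parts[-1]] + parts[:-1])


def find_property(full_name: str, lookups):
    """Try each (label, lookup) pair in order; return the first non-empty hit.

    Each lookup is a caller-supplied function taking the reordered name and
    returning a result, or None when nothing matched.
    """
    query = last_name_first(full_name)
    for label, lookup in lookups:
        result = lookup(query)
        if result:
            return label, result
    return None, None


# Stand-in lookups for illustration only; real ones would fetch and parse pages.
lookups = [
    ("assessor", lambda name: None),
    ("grantor", lambda name: {"searched_as": name}),
    ("grantee", lambda name: None),
]
print(find_property("John Q Smith", lookups))
```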

Click on the account number. This will open the main screen that we pull information from.

(Basically you will be having PHP do all these steps for us. It’s more or less basic GET requests and parsing.)

The next section describes the manual data entry that used to be required. All of this can be done
on the server side, as long as the data is stored as per the instructions below:

A) Save the picture from this page to a local server directory and set the database field on this record to
the location of the image, so that a search can return the image too.
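
As a rough idea of step A, the sketch below derives a file name from the case number, writes the image, and returns the path that would go into the database field. It is in Python only for illustration (the project is PHP); the function names, the `images` directory, and the `.jpg` extension are all assumptions.

```python
from pathlib import Path


def image_path(case_number: str, image_dir: Path) -> Path:
    """Return where the image for a case would be stored."""
    # Replace anything that is not a letter, digit, dash or underscore.
    safe_name = "".join(c if c.isalnum() or c in "-_" else "_" for c in case_number)
    return image_dir / f"{safe_name}.jpg"


def save_image(image_bytes: bytes, case_number: str, image_dir: Path) -> str:
    """Write the image to disk and return the path to store in the record."""
    image_dir.mkdir(parents=True, exist_ok=True)
    target = image_path(case_number, image_dir)
    target.write_bytes(image_bytes)
    return str(target)


print(image_path("CJ-2010/1", Path("images")))
```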

Populate the main database with the following information:

B) Category – This should be set to Normal.

C) Display Date – This is the filing date from the Excel table.

D) Expiration Date – This is 90 days forward from the display date.

E) Case Number – This is the case number.

F) Filing Date – The same as the display date.

G) Plaintiff – Use the name from the County Assessor’s office (including middle names). Make sure that the capitalization is correct and not all capitals. Do not put the banks or other interested parties.

H) Defendant

I) Property Owner Name – This is the same as the Plaintiff.

J) Mailing Address – Copy this from the County Assessor’s office site (on the top left side). Make sure that the capitalization is correct and not all capitals, i.e. St and not ST for Street. Split the ZIP code so that the first 5 digits and the last 4 are separated, i.e. xxxxx xxxx

K) Property Address – Copy this from the County Assessor’s office site (on the top right side). Make sure that the capitalization is correct and not all capitals, i.e. St and not ST for Street. Split the ZIP code so that the first 5 digits and the last 4 are separated, i.e. xxxxx xxxx. Make sure to select the proper county. If you don’t know the county, you can check it on www.Mapquest.com

L) Property Owner’s Phone Number – Find this on www.whitepages.com. Use the ‘reverse lookup’ with the street name and number from the mailing address. Make sure that the number is formatted correctly, as (xxx) xxx-xxxx, when it is stored in the profile. It is OK to use the number if the names do not match exactly, as long as the last names are the same.

M) County Assessed Value – Copy this from the County Assessor’s office site. It is under ‘Property Value Information’ on the top line of the most recent year.

N) Purchase Price – Copy this from the County Assessor’s office site. It is under ‘Sales Documents/Deed History’ and will be the dollar amount of the most recent transaction.

O) Purchase Date – Copy this from the County Assessor’s office site. It is under ‘Sales Documents/Deed History’ and will be the date of the most recent transaction.

P) Year Built – Copy this from the County Assessor’s office site. It is under ‘Year Built’ on the bottom of the page.

Q) Bedrooms – Copy this from the County Assessor’s office site. Click on the button marked bldg #1 on the bottom left hand side of the screen. This will open a new page where the details of the house will be found.

R) Bathrooms – Copy this from the same page as the # of bedrooms

S) Square Footage – Copy this from the same page as the # of bedrooms
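
Several of the fields above come down to small formatting rules, which might be sketched as follows. The code is Python for brevity (the project is PHP) and the function names are our own placeholders. The simple capitalization rule will get names such as McDonald or O'Brien wrong, so treat it as a starting point only.

```python
import re
from datetime import date, timedelta


def title_case(text: str) -> str:
    """'123 N MAIN ST' -> '123 N Main St' (capitalize each word, lowercase the rest)."""
    return " ".join(word.capitalize() for word in text.split())


def split_zip(zip_code: str) -> str:
    """Return 'xxxxx xxxx' for a 9-digit ZIP, or 'xxxxx' for a 5-digit one."""
    digits = re.sub(r"\D", "", zip_code)
    if len(digits) == 9:
        return f"{digits[:5]} {digits[5:]}"
    if len(digits) == 5:
        return digits
    raise ValueError(f"unexpected ZIP code: {zip_code!r}")


def format_phone(raw: str) -> str:
    """Return the number as '(xxx) xxx-xxxx'."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    if len(digits) != 10:
        raise ValueError(f"unexpected phone number: {raw!r}")
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def expiration_date(display_date: date) -> date:
    """Expiration Date is 90 days forward from the Display Date."""
    return display_date + timedelta(days=90)


print(title_case("123 N MAIN ST"))
print(split_zip("73102-1234"))
print(format_phone("405.555.0123"))
print(expiration_date(date(2010, 1, 4)))
```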

The information here must be completely accurate (as in your parser needs to be functioning correctly).
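
One way to guard accuracy is to check each record against the formatting rules before it is stored, and flag anything that fails for review. The sketch below is in Python for illustration only (the project is PHP), and the field names in the record are hypothetical.

```python
import re

PHONE_RE = re.compile(r"\(\d{3}\) \d{3}-\d{4}")
ZIP_RE = re.compile(r"\d{5}( \d{4})?")


def validation_errors(record: dict) -> list:
    """Return a list of problems found in a harvested record (empty if none)."""
    errors = []
    for field in ("case_number", "filing_date", "plaintiff"):
        if not record.get(field):
            errors.append(f"missing {field}")
    phone = record.get("phone")
    if phone and not PHONE_RE.fullmatch(phone):
        errors.append("bad phone format")
    zip_code = record.get("zip")
    if zip_code and not ZIP_RE.fullmatch(zip_code):
        errors.append("bad zip format")
    # Names must not be stored in all capitals.
    if (record.get("plaintiff") or "").isupper():
        errors.append("plaintiff is all capitals")
    return errors


record = {
    "case_number": "CJ-2010-1",
    "filing_date": "2010-01-04",
    "plaintiff": "JOHN SMITH",
    "phone": "405-555-0123",
}
print(validation_errors(record))
```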

All information available can be found in the attached zip file.

Thank you, and feel free to ask any questions you may have.
