Need Scraping Code

Summary: This should be a PHP script using cURL (or similar) to scrape data and store it in my database. The script will be run via a cron job and, as the requirements below show, the scrapes must run in a particular order.

----------
Scrape 1:
I need to have the following scraped into my database:
http://www.ufc.com/fighter/Weight_Class?filterQuery=
That shows all fighters in the UFC. There is pagination, so the scraper must paginate and retrieve all fighters and their information.

Information Needed:
NAME: First Name, Last Name, NickName (located under name)
RECORD: Wins, Losses, Draws
HEIGHT: both US units and metric
WEIGHT: both US units and metric

DataTable: UFC_Fighters
Fields:
UFC_Fighter_ID – Unique ID Generated by MySQL
UFC_Fighter_FirstName – First Name of fighter (Example: Yoshihiro)
UFC_Fighter_LastName – Last Name of fighter (Example: Akiyama)
UFC_Fighter_NickName – Nickname of fighter (Example: Junior)
UFC_Fighter_Picture – leave this field alone
UFC_Fighter_Height – Height of fighter (Example: 5’10” (177 cm))
UFC_Fighter_Weight – Weight of fighter (Example: 185 LBS (84 kg))
UFC_Fighter_Wins – Recorded Wins (Example: 13)
UFC_Fighter_Losses – Recorded Losses (Example: 3)
UFC_Fighter_Draws – Recorded Draws (Example: 0)

All example values are taken from the URL listed above for Yoshihiro Akiyama, except for the nickname (he doesn’t have one). For that field, Jose Aldo is used as the example.
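To illustrate the kind of parsing I have in mind for the height and weight values, here is a rough sketch. It is written in Python only for brevity (the delivered script should still be PHP), and the patterns are based solely on the two example strings above; the function names are made up, and the live page markup may well differ.

```python
import re

# Patterns derived only from the example strings in this brief,
# e.g. 5’10” (177 cm) and 185 LBS (84 kg). Treat them as assumptions.
HEIGHT_RE = re.compile(r"(\d+)\s*['’]\s*(\d+)\s*[\"”]\s*\(\s*(\d+)\s*cm\s*\)")
WEIGHT_RE = re.compile(r"(\d+)\s*lbs?\s*\(\s*(\d+)\s*kg\s*\)", re.IGNORECASE)


def parse_height(text):
    """Return feet, inches and centimetres, or None if the text does not match."""
    match = HEIGHT_RE.search(text)
    if match is None:
        return None
    feet, inches, cm = (int(group) for group in match.groups())
    return {"feet": feet, "inches": inches, "cm": cm}


def parse_weight(text):
    """Return pounds and kilograms, or None if the text does not match."""
    match = WEIGHT_RE.search(text)
    if match is None:
        return None
    lbs, kg = (int(group) for group in match.groups())
    return {"lbs": lbs, "kg": kg}
```

Returning None for unmatched text lets the scraper skip or log a fighter whose height or weight is missing rather than storing a broken value.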

Database Summary:
If a fighter does NOT exist in the database, a new record is created. If a fighter DOES exist in the database, then the record must be updated with current information.
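The insert-or-update rule above could look roughly like the sketch below. It is Python with an in-memory SQLite database standing in for MySQL, purely so the logic can be run on its own; the delivered script should be PHP against MySQL. Matching an existing fighter on first name plus last name is my assumption here, and the dictionary keys (`first`, `last`, and so on) are hypothetical.

```python
import sqlite3


def create_fighters_table(conn):
    # SQLite stand-in for the MySQL UFC_Fighters table described above.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS UFC_Fighters (
               UFC_Fighter_ID INTEGER PRIMARY KEY AUTOINCREMENT,
               UFC_Fighter_FirstName TEXT,
               UFC_Fighter_LastName TEXT,
               UFC_Fighter_NickName TEXT,
               UFC_Fighter_Picture TEXT,
               UFC_Fighter_Height TEXT,
               UFC_Fighter_Weight TEXT,
               UFC_Fighter_Wins INTEGER,
               UFC_Fighter_Losses INTEGER,
               UFC_Fighter_Draws INTEGER)"""
    )


def upsert_fighter(conn, f):
    """Create the fighter if missing, otherwise update it. Returns the fighter ID."""
    # Assumption: a fighter is identified by first name + last name.
    row = conn.execute(
        "SELECT UFC_Fighter_ID FROM UFC_Fighters "
        "WHERE UFC_Fighter_FirstName = ? AND UFC_Fighter_LastName = ?",
        (f["first"], f["last"]),
    ).fetchone()
    values = (f["nick"], f["height"], f["weight"], f["wins"], f["losses"], f["draws"])
    if row is None:
        cur = conn.execute(
            "INSERT INTO UFC_Fighters (UFC_Fighter_FirstName, UFC_Fighter_LastName, "
            "UFC_Fighter_NickName, UFC_Fighter_Height, UFC_Fighter_Weight, "
            "UFC_Fighter_Wins, UFC_Fighter_Losses, UFC_Fighter_Draws) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (f["first"], f["last"]) + values,
        )
        return cur.lastrowid
    # UFC_Fighter_Picture is deliberately left untouched on update.
    conn.execute(
        "UPDATE UFC_Fighters SET UFC_Fighter_NickName = ?, UFC_Fighter_Height = ?, "
        "UFC_Fighter_Weight = ?, UFC_Fighter_Wins = ?, UFC_Fighter_Losses = ?, "
        "UFC_Fighter_Draws = ? WHERE UFC_Fighter_ID = ?",
        values + (row[0],),
    )
    return row[0]
```

In MySQL the same effect can be had with `INSERT ... ON DUPLICATE KEY UPDATE`, provided there is a unique key on whatever columns identify a fighter.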

Scrape 2:
http://www.ufc.com/schedule/all/event
This page has a list of the events midway down the page. My database requires scraping of the following: UFC Event ID, UFC Event Description, UFC Event Date, UFC Event Location.

Database: UFC_Events
Fields:
UFC_Event_ID – This is the unique id for the event inserted and requires no action.
UFC_Event_Name – The shortened name of the event (Example 1: UFC 123 | Example 2: TUF12). The event slug can be taken from each event’s link (i.e., the URL the middle section links to; for example, the second event links to http://www.ufc.com/event/TUF12-Finale/fight).
UFC_Event_Slug – Lowercase version of UFC_Event_Name with spaces replaced by ‘-’. (Example 1: ufc-123 | Example 2: tuf-12)
UFC_Event_Description – The full event name (Example 1: UFC 123: Rampage vs Machida | Example 2: The Ultimate Fighter Finale: Team GSP vs Team Koscheck).
UFC_Event_Location – Pulled from the details page of the event itself, such as http://www.ufc.com/event/UFC123/fight for the UFC 123 event. (Example: Auburn Hills, MI Palace of Auburn Hills)
UFC_Event_Date – This is the date of the event (example: Saturday November 20, 7/10PM PT/ET)
UFC_Event_Realtime – The same event date and time as UFC_Event_Date, stored as a datetime value.
UFC_Event_Summary – This is not to be used.
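As a rough illustration of the UFC_Event_Slug rule, here is a small sketch (Python for brevity; the delivered script should be PHP). Note one assumption: the example tuf-12 comes from TUF12, which has no space, so the sketch also inserts a hyphen where a letter is directly followed by a digit. The function name `event_slug` is made up.

```python
import re


def event_slug(event_name):
    """Build a slug from the short event name, e.g. 'UFC 123' -> 'ufc-123'."""
    slug = event_name.strip().lower()
    # Replace each run of whitespace with a single hyphen.
    slug = re.sub(r"\s+", "-", slug)
    # Assumption: also split letters from digits so 'tuf12' becomes 'tuf-12'.
    slug = re.sub(r"(?<=[a-z])(?=\d)", "-", slug)
    return slug
```

If the letter/digit split is not wanted, dropping the second substitution leaves the plain lowercase-and-hyphenate behaviour.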

Scrape 3:
Pull the fight card and insert it into the database. This pulls all fighter matchups and ties them to the events already created in the database. An example would be UFC 123, whose details page is http://www.ufc.com/event/UFC123/fight. The fight card appears near the top of the rendered page, but in the page source it is closer to line 1012; that is the section to scrape the fighters from. Each fighter has to be scraped and then matched with their ID from the UFC_Fighters datatable.

Database: UFC_Matchups
UFC_FightOrder – The order in which the fights are presented. In the page source this appears as ‘Fight 1’, ‘Fight 2’, etc., but it must be stored as an integer. (Example: 1 = Fight 1)
UFC_Event_ID – This is the event ID from the UFC Events table and represents the ID of the event for the matchup.
UFC_SportsBook_ID – leave blank
UFC_Fighter1_ID – The ID of the fighter from the UFC_Fighters datatable. For this you would need to query by the fighter’s name and return the ID (UFC_Fighter_ID) from the UFC_Fighters table.
Fighter1_Summary – Leave blank
Fighter2_Summary – Leave blank
Fight_Analysis – Leave blank
Matchup_Prediction – Leave blank
UFC_Fighter2_ID – same as UFC_Fighter1_ID
UFC_Fighter1_Odds – Leave blank
UFC_Fighter2_Odds – Leave blank
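Here is a rough sketch of how one matchup row could be assembled (Python for brevity; the delivered script should be PHP). It assumes the scraped label looks like ‘Fight 1’ as described above, and that fighter IDs have already been loaded into a lookup keyed by lowercase full name; the function names and that lookup structure are made up for illustration.

```python
import re


def parse_fight_order(label):
    """Turn a label such as 'Fight 1' into the integer 1, or None if absent."""
    match = re.search(r"Fight\s+(\d+)", label, re.IGNORECASE)
    return int(match.group(1)) if match else None


def build_matchup(event_id, label, name1, name2, fighter_ids):
    """Build one UFC_Matchups row; fields the brief says to leave blank are omitted.

    fighter_ids is a hypothetical dict mapping a lowercase full name to the
    UFC_Fighter_ID from the UFC_Fighters table.
    """
    return {
        "UFC_FightOrder": parse_fight_order(label),
        "UFC_Event_ID": event_id,
        # An unmatched name yields None so it can be logged and reviewed.
        "UFC_Fighter1_ID": fighter_ids.get(name1.strip().lower()),
        "UFC_Fighter2_ID": fighter_ids.get(name2.strip().lower()),
    }
```

A None fighter ID signals that the name on the fight card did not match any record from Scrape 1, which is why the scrapes need to run in order.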

Scrape 4:
This scrape pulls the odds from the BetUS website into our datatable so that the odds stay up to date. URL: http://www.betus.com/wageringengine/xmlfeed/lines.aspx?Sport=U.F.C. The odds are listed for ALL events, which means all events need to be updated with odds.

Database: UFC_ODDS
Fields:
UFC_Odd_ID – Unique ID, created automatically
UFC_Event_ID – Event ID looked up based upon the XML feed and cross-referenced to the UFC_Events table and UFC_Event_ID
Sportsbook_ID – Always set to 1 (for this sportsbook)
UFC_Fighter1_ID – Fighter ID looked up based upon the XML feed and cross referenced to UFC_Fighters and UFC_Fighter_ID
UFC_Fighter2_ID – Same as UFC_Fighter1_ID
UFC_Fighter1_Odds – Lookup of odds from XML. (Example: +155, -190, etc)
UFC_Fighter2_Odds – Same as UFC_Fighter1_Odds
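For the odds values themselves, a minimal validation sketch is below (Python for brevity; the delivered script should be PHP). It only covers the signed format shown in the examples (+155, -190); the structure of the XML feed is not assumed here, and the function name is made up.

```python
import re

# Accepts only a sign followed by digits, as in the examples +155 and -190.
ODDS_RE = re.compile(r"^[+-]\d+$")


def parse_odds(text):
    """Return the odds as a signed integer, or None if the format is unexpected."""
    text = text.strip()
    if not ODDS_RE.match(text):
        return None
    return int(text)
```

Anything that does not match (for example an empty value) comes back as None so the row can be skipped instead of storing bad odds.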

NOTE-IMPORTANT: Please put >UFCScrape< into your bid description so that I know it is not an auto bid.
