Seperate Database In To Two Files Using Identified Criteria

This will be a quick and simple job for anyone that works with large data sets. I have a csv file with around 45 to 46 million records. Most of the fields in the file are not needed and will be deleted. I have attached a small sample of the file.

The key item we will be looking at is the date of birth (Birth date) (DOB) column. There are records in the date of birth field that are clearly not correct and must be output to a separate file. We know these are not correct DOB’s just by looking at the data. The last two characters (day of birth)are 00 or the month of birth is listed as 00. Any record with a DOB that has two zeros for month, or day of birth needs to be output to a new file. It is possible to have two zeros in a row and have and a record be accurate as long as the zeros cross over between month and day or month and year. If you review the sample you will see rows in purple. These are the rows that would be output to another file.

Project summary:

1. Save only fields that are highlighted in yellow and delete all other fields from the file.

2. Split the file into two separate files. One will have only records that have two zeros in the month, day or year field and the other will have records that appear to have valid dates of birth.

Actually very very simple if you work with large files. Completed file should be saved as a txt file.

Please ask any questions.

Thank you for your consideration.

Leave a Reply

Your email address will not be published. Required fields are marked *