I need to extract information from roughly 20,000 PDFs, all in approximately the same format. The PDFs have restrictions that need to be removed.
I need a script that can read all of these PDFs and store the extracted information in an Access database. Because of the volume, speed is essential, as is a well-structured database. The script must also keep track of which files it has already read.
Attached are a sample PDF and an XLS file showing the information I want extracted.
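
As a rough illustration of the "keep track of which files it has already read" requirement, a minimal sketch might look like the following. It is not a full solution: it uses Python's built-in `sqlite3` module as a stand-in for Access purely so the example is self-contained, and the table name `processed_files` and the helper functions `open_tracker`, `pending_files`, and `mark_processed` are hypothetical names chosen for the example. PDF parsing and restriction removal are left out.

```python
import sqlite3
from pathlib import Path


def open_tracker(db_path=":memory:"):
    """Open (or create) the tracking database.

    Assumption: SQLite stands in for Access here; the table name is hypothetical.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_files ("
        "path TEXT PRIMARY KEY, "
        "processed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def pending_files(conn, folder):
    """Return the PDFs in `folder` that have not been recorded as processed."""
    done = {row[0] for row in conn.execute("SELECT path FROM processed_files")}
    # Note: "*.pdf" is case-sensitive on some platforms and is not recursive.
    return [p for p in sorted(Path(folder).glob("*.pdf")) if str(p) not in done]


def mark_processed(conn, pdf_path):
    """Record a file as processed; recording the same file twice is harmless."""
    conn.execute(
        "INSERT OR IGNORE INTO processed_files (path) VALUES (?)",
        (str(pdf_path),),
    )
    conn.commit()
```

A run would then loop over `pending_files(conn, folder)`, extract the data from each file, and call `mark_processed` only after that file's data has been stored, so an interrupted run can resume without re-reading files.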
