Web Scraping (HTML & Tables) Using Ruby/Nokogiri/CSV

I need two HTML parsing scripts that extract text (including tables) from the HTML pages named in a list file (URLs and local filesystem paths) and write the results out as CSV. One output file will contain the URL/filename, title tags, and standard and custom metadata; the other will contain the URL/filename and various nodes. I prefer Ruby with Nokogiri and the csv library. I'll furnish an initial node tree (the HTML isn't consistently formatted) and a list of the metadata (again, not consistently populated). Thanks!
