|
I have a bunch of PDFs from way back when, mostly news/magazine/... articles that I copied into a Word document and printed to PDF. Problem is, I never made an overview, so now I don't know what most of these are. What I'd like to do is make an Excel spreadsheet that serves as an overview. Basic structure of the PDFs is: code:
|
# ? Mar 10, 2017 21:10 |
|
Depending on how consistent the PDFs you generated way back when are, you could use a simple Python library like PDFMiner to pull out the text, then export your title, summary, and text into a CSV and import that into Excel. But if your PDF structure isn't uniform, you might need to do some hand editing on the final product. If you aren't familiar with Python, or scripting in general, I'm not sure how you would proceed.
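A rough sketch of that PDF-to-CSV idea, in case it helps. It assumes the pdfminer.six package and its `extract_text` helper, and it guesses at the layout: first paragraph is the title, second is the summary, the rest is the text. The `split_article` and `build_overview` names are made up for illustration, and the paragraph split will need adjusting to whatever your PDFs actually look like.

```python
import csv
from pathlib import Path


def split_article(raw_text):
    """Split extracted text into (title, summary, body).

    Assumption (not from the original PDFs): the first non-blank paragraph
    is the title, the second is the summary, everything after is the body.
    """
    paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
    title = paragraphs[0] if paragraphs else ""
    summary = paragraphs[1] if len(paragraphs) > 1 else ""
    body = "\n\n".join(paragraphs[2:])
    return title, summary, body


def build_overview(pdf_dir, csv_path):
    """Write one CSV row per PDF found in pdf_dir."""
    # Imported here so split_article can be tried without pdfminer installed.
    from pdfminer.high_level import extract_text  # pip install pdfminer.six

    # utf-8-sig adds a byte order mark, which helps Excel detect UTF-8.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "title", "summary", "text"])
        for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
            title, summary, body = split_article(extract_text(str(pdf_path)))
            writer.writerow([pdf_path.name, title, summary, body])
```

Calling something like `build_overview("articles", "overview.csv")` (both paths are placeholders) should give you a file Excel can open directly. Rows where the guess about title and summary was wrong are the ones you'd fix by hand.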
|
# ? Mar 11, 2017 06:43 |
|
I can't offer much advice, but I had good luck with VBA in Excel interfacing directly with PDFs for form filling. Surely the opposite can be done, and some VBA string manipulation could spit your data directly into the spreadsheet however you see fit. I don't have much of a coding background, but I was able to hack together other people's code from Googling to do what I needed.
|
# ? Mar 11, 2017 21:12 |
|
I've used PDFMiner before, and all I could ever get it to do was dump the PDF structure, not the contents of the fields of a fillable form, which is what I was hoping to get.
|
# ? Mar 14, 2017 17:11 |
|
Echoing the use of PDFMiner. It worked perfectly when I had to parse through 20k+ documents and organize them by certain keywords.
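For the keyword part, something along these lines would probably do once the text is extracted. This is only a sketch, not what I actually ran: `group_by_keywords` is a made-up name, the input is assumed to be a dict of file name to extracted text, and the match is a plain case-insensitive substring check.

```python
from collections import defaultdict


def group_by_keywords(documents, keywords):
    """Map each keyword to the sorted names of the documents that mention it.

    documents: dict of file name -> already-extracted text (assumed input).
    Keywords that match nothing are left out of the result.
    """
    groups = defaultdict(list)
    for name, text in documents.items():
        lowered = text.lower()
        for keyword in keywords:
            # Plain substring match, so "election" also hits "elections".
            if keyword.lower() in lowered:
                groups[keyword].append(name)
    return {keyword: sorted(names) for keyword, names in groups.items()}
```

A document can land under several keywords, which is usually what you want for an overview sheet.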
|
# ? Mar 16, 2017 10:24 |