Recently a client produced a directory at a meeting and said, “I have this as a PDF. Can you get the email addresses out of it for me?”
While this seems straightforward, I was amazed at the misinformation you get if you Google “extract email addresses from PDF”. Most links were to sites wanting to sell you a software program. No doubt these programs work well, but there is no need to buy anything if you just want to extract email addresses from a PDF.
Here is how I got my client what they wanted:
- Open directory.pdf in Adobe Acrobat®
- Under Tools, select Export, then select More Formats from the list, and then Text (Plain) from the expanded list.
- Acrobat will then create a file called directory.txt and, unless you tell it otherwise, save it in the same directory as the source file.
- Using Notepad++ or your favourite text editor, open the text file you just created and press Ctrl+A then Ctrl+C to select and copy all the text.
- Go to emailhippo, an online email address extractor, and paste the text you just copied into the dialogue box.
- Tell it to Extract and, voila, it shows you which email addresses it has extracted and how many.
- Decide which format you want and download your new list. Done.
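If you would rather not paste a client's directory into a website, the extraction step can also be done locally. What follows is only a minimal sketch in Python, not what I did for the client. It assumes the text file exported by Acrobat is UTF-8, and the pattern is deliberately simple, so it will catch ordinary addresses but not every form the email standards allow. The names `EMAIL_RE`, `extract_emails` and `extract_emails_from_file` are made up for this illustration.

```python
import re
import sys

# Simple, deliberately loose pattern (an assumption of this sketch):
# a local part, an @, then a domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return every email-like string found in text, in order of appearance."""
    return EMAIL_RE.findall(text)


def extract_emails_from_file(path):
    """Read a plain-text export (assumed UTF-8) and extract addresses from it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return extract_emails(handle.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python extract.py directory.txt
    found = extract_emails_from_file(sys.argv[1])
    print("\n".join(found))
    print(f"{len(found)} addresses found", file=sys.stderr)
```

Run against directory.txt, this prints one address per line, which you can redirect to a file and open in your text editor as before.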
There is no software to buy, and the process is easy and repeatable as often as you need. In Notepad++ you can also de-duplicate your new list or do all sorts of other fancy stuff if you so desire.
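If the list ends up in a script rather than in Notepad++, de-duplication is only a few lines. This is a rough sketch in Python with a hypothetical helper name, `dedupe_emails`. It assumes you want to treat addresses that differ only in upper or lower case as the same address, which is usually true in practice but is my assumption here.

```python
def dedupe_emails(emails):
    """Remove duplicates, keeping the first spelling seen and the original order."""
    seen = set()
    unique = []
    for email in emails:
        cleaned = email.strip()
        key = cleaned.lower()  # assumption: compare addresses case-insensitively
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique
```

For example, `dedupe_emails(["a@x.com", "A@X.com", "b@y.org"])` keeps `a@x.com` and `b@y.org`.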
Hope that this is useful.