Extracting email addresses from a PDF

Recently a client produced a directory at a meeting and said “I have this as a PDF, can you get the email addresses out of it for me?”

While this seems straightforward, I was amazed at the misinformation you get if you Google “extract email addresses from PDF”. Most links were to sites wanting to sell you a software program. While no doubt these programs work well, there is no need to buy anything if you just want to extract email addresses from a PDF.

How I achieved what my client wanted was very straightforward:

  1. Open directory.pdf in Adobe Acrobat®
  2. Under Tools, select Export, then select More Formats from the list, and then Text (Plain) from the expanded list.
  3. Acrobat will then create a file, directory.txt, and, unless you tell it otherwise, save it in the same directory as the source file.
  4. Using Notepad++ or your favourite text editor, open the text file you just created and press Ctrl+A then Ctrl+C to select and copy all the text.
  5. Go to emailhippo, an online email address extractor, and paste the text you just copied into the dialogue box.
  6. Tell it to Extract and, voilà, it shows you which email addresses it has extracted and how many.
  7. Decide which format you want and download your new list. Done.
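
If you would rather not paste a client's directory into a website, steps 4 to 6 can also be done locally. What follows is only a rough sketch in Python, not what I used for the client. It assumes the exported file is called directory.txt, as in step 3. The function name extract_emails and the pattern are my own, and the pattern is deliberately simple, so it will miss some unusual but valid addresses.

```python
import re

# A deliberately simple pattern (an assumption, not a full RFC 5322 matcher):
# some characters, an @ sign, a domain, a dot, and a top-level domain.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return every email-like string found in text, in order of appearance."""
    return EMAIL_PATTERN.findall(text)


if __name__ == "__main__":
    # directory.txt is the plain-text file Acrobat exported in step 3.
    with open("directory.txt", encoding="utf-8", errors="replace") as handle:
        for address in extract_emails(handle.read()):
            print(address)
```

Run it from the folder that holds directory.txt and it prints one address per line, which you can redirect into a new file. Duplicates are left in at this stage.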

No software to buy, and easy to repeat as often as you need to. Using Notepad++ you can de-duplicate your new list or do all sorts of other fancy stuff if you so desire.
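
If you prefer a script to Notepad++ for the de-duplication, here is a minimal Python sketch. The function name dedupe_emails is hypothetical, and treating addresses that differ only in upper or lower case as the same address is my assumption. Strictly speaking, the part before the @ can be case-sensitive, though in practice it rarely is.

```python
def dedupe_emails(addresses):
    """Drop repeated addresses, ignoring case, keeping first-seen order."""
    seen = set()
    unique = []
    for address in addresses:
        cleaned = address.strip()
        key = cleaned.lower()  # assumption: compare addresses case-insensitively
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


if __name__ == "__main__":
    sample = ["a@x.com", "A@X.com", "b@y.org", "a@x.com "]
    for address in dedupe_emails(sample):
        print(address)
```

The first spelling of each address is the one that is kept, and blank lines are skipped.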

Hope that this is useful.