Extract the content from a PDF file

When was the last time you regretted not having the source file of a PDF you had to edit? Probably not long ago. We have all lost the source files of our PDFs, only to realize later that the amount of editing required is more than even Acrobat can handle. And then there are times when we have simply inherited or received PDF files from others, with no clue about their source files. While we learn how not to lose our source files (!), this article describes ways of regenerating them from the PDF files. A few months ago I contributed an article on how to extract the content from a PDF file to Indus, the bimonthly newsletter of the STC India chapter. The article can be read here.

I am presenting the last section again here because the formatting in the published article did not come out well, and the steps as they appear there may be confusing.

Note that the content extraction capabilities of Acrobat X Standard, Acrobat X Professional, and Acrobat X Suite are much improved. The Acrobat X series of software performs these conversions quickly and accurately.

How to extract content from secure PDF files?

Caution: Use this only for your own PDF files, for which you hold the copyright and have misplaced the password. Check your rights to any PDF files received from other sources.

If you have misplaced both the source file and the password to your secured PDF file, use the following workaround to get your content back.

If printing is allowed on your PDF file:

  • Print a hard copy.
  • Scan it at the highest possible resolution, in grayscale.
  • Extract content from the image PDF as described above.

If printing is not allowed on your PDF file either:

  • Unix/Linux users can generate a PostScript (PS) file from the PDF file using the pdf2ps command.
  • Convert the PS file back to PDF format using the ps2pdf command. The regenerated image PDF file does not retain the original font information and is larger in size, but it is no longer secured.
  • Extract content from this image PDF as described above.
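
If you would rather script the pdf2ps and ps2pdf steps above than type them by hand, here is a minimal sketch in Python. It assumes both commands (they ship with Ghostscript) are on your PATH. The function names and the "-roundtrip" output suffix are my own choices for illustration, not part of either tool.

```python
import shutil
import subprocess
from pathlib import Path


def build_roundtrip_commands(pdf_path):
    """Return the pdf2ps and ps2pdf command lines for a PDF -> PS -> PDF round trip."""
    src = Path(pdf_path)
    ps_file = src.with_suffix(".ps")
    # Hypothetical naming choice: write next to the input, never over it.
    out_pdf = src.with_name(src.stem + "-roundtrip.pdf")
    return [
        ["pdf2ps", str(src), str(ps_file)],
        ["ps2pdf", str(ps_file), str(out_pdf)],
    ]


def roundtrip(pdf_path):
    """Run both conversions in order and return the path of the regenerated PDF."""
    commands = build_roundtrip_commands(pdf_path)
    # Check for both tools up front so we do not stop halfway through.
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    for cmd in commands:
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(cmd, check=True)
    return Path(commands[-1][-1])
```

Calling roundtrip("mydoc.pdf") would leave mydoc.ps and mydoc-roundtrip.pdf beside the original. Whether the conversion succeeds on a particular secured file depends on your Ghostscript version and on how the file was protected, so treat this as a starting point.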

Your suggestions, better tricks, requests for more workflows, and comments are most welcome.

Author: AshishG

See http://about.me/guptaashish
