
From an open source perspective, PDF has good and bad points. It’s good in that the specification is published and anyone can implement it freely, but bad because the format remains under the control of Adobe.
What’s little realized among Linux users is the sheer quantity of PDF tools available at the command line. Need to convert a PDF to HTML, plain text or PostScript? It’s easy when you know how.
Perhaps the simplest and most useful tool is pdftotext, which is used like this:
pdftotext filename.pdf
This will output a .txt file with the same filename as the original, in the current directory.
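If you have a whole folder of PDFs, the same command can be wrapped in a loop. What follows is only a rough sketch: the function name pdfs_to_text is made up for this example, and it assumes pdftotext is on your PATH and that it accepts the output filename as a second argument.

```shell
# Hypothetical helper: convert every .pdf in a directory (default: the
# current one) to a .txt file with the same base name.
pdfs_to_text() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$pdf" ] || continue
        # Same call as above, with the output name spelled out explicitly.
        pdftotext "$pdf" "${pdf%.pdf}.txt" || echo "failed: $pdf" >&2
    done
}
```

Paste the function into your shell, then run pdfs_to_text ~/Documents (or just pdfs_to_text for the current directory).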
Don’t expect miracles. Images are ignored and complex formatting can fox the converter. The resulting text files nearly always require clean-up of some kind.
However, using pdftotext is definitely easier than cutting and pasting straight from the original document, particularly considering that Acrobat Reader has a nasty habit of inserting paragraph breaks at the end of each line when you cut and paste from documents.
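One clean-up job that comes up a lot is rejoining lines that were broken at the end of every line of the page. Here is a minimal sketch using awk. The name join_paragraphs is invented for this example, and it assumes paragraphs in the text file are separated by blank lines, which won’t hold for every document.

```shell
# Hypothetical helper: merge hard-wrapped lines into one line per paragraph.
# Blank lines (including lines holding only a form feed) end a paragraph.
join_paragraphs() {
    awk '
        /^[ \t\f\r]*$/ {
            if (buf != "") { print buf; print ""; buf = "" }
            next
        }
        {
            sub(/^[ \t\f\r]+/, "")      # trim leading whitespace
            sub(/[ \t\f\r]+$/, "")      # trim trailing whitespace
            buf = (buf == "" ? $0 : buf " " $0)
        }
        END { if (buf != "") print buf }
    ' "$@"
}
```

Use it as join_paragraphs filename.txt > cleaned.txt. Headings, lists and tables will still need fixing by hand.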
pdftohtml is slightly more advanced. It will convert PDF files to HTML and attempt to carry across elementary formatting as well as images. Using the -c command switch, which activates “complex document” mode, is a good idea:
pdftohtml -c filename.pdf
Again, it’s not perfect, and don’t think that you can simply create a batch job to convert PDF files before throwing them online. But if you need to make a PDF file more accessible quickly, it’s a good choice.
There are other PDF conversion tools you can experiment with, and many of them come installed by default on mainstream Linux distributions.
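Because pdftohtml usually produces several files, it can be tidier to send the output to its own folder. The sketch below is an illustration rather than a recipe: pdf_to_html_dir is a made-up name, and it assumes your version of pdftohtml accepts an output name as its second argument.

```shell
# Hypothetical helper: convert one PDF in "complex document" mode and
# keep the generated files together in a separate directory.
pdf_to_html_dir() {
    src=$1
    # Default output directory: the PDF name with _html appended.
    outdir=${2:-"${src%.pdf}_html"}
    mkdir -p "$outdir" || return 1
    # Assumption: the second argument is used as the base name for output files.
    pdftohtml -c "$src" "$outdir/$(basename "${src%.pdf}")"
}
```

For example, pdf_to_html_dir filename.pdf should leave the results in a folder called filename_html. Check the output by eye before publishing it.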
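To see which tools you already have, you can ask the shell. This is a small sketch with a made-up function name, list_pdf_tools. The tool names in the usage line below (pdftops, pdfinfo and pdfimages) are ones that commonly ship alongside pdftotext, but whether they are installed depends on your distribution.

```shell
# Hypothetical helper: report which of the named commands are on the PATH.
list_pdf_tools() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s: installed\n' "$tool"
        else
            printf '%s: not found\n' "$tool"
        fi
    done
}
```

Run it as list_pdf_tools pdftotext pdftohtml pdftops pdfinfo pdfimages, then read the man page of anything that turns up.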

Jul 6, 03:46 pm
Interesting post. Two other PDF-related applications I use are the PDF Download extension, which lets you choose between viewing and downloading a PDF from within Firefox, and PDFCreator, which lets you easily save Word documents in PDF format.