Software Disability Access List Archive

Announcing a command-line PDF to text converter

Posted by: Jamal Mazrui
Date Mailed: Thursday, April 22nd 1999 09:06 PM

I've posted the archive at
http://www.empowermentzone.com/pdf2txt.zip

It is about 3 megabytes in size.  The included readme.txt file is below. 
Feel free to use, share, or improve this work, which is based on open source
components.

Regards,
Jamal



----------
PDF2TXT is a command-line system for converting one or more files from
portable document format (.pdf) to plain text (.txt).  It only works when
run from a DOS box under Windows.  Suggested installation is as follows.

Create a directory called c:\pdf2txt and unarchive all files from
pdf2txt.zip into it.

Add this directory to the DOS search path, e.g., within the c:\autoexec.bat
file.

Run the batch file pdf2txt.bat to convert files from .pdf to .txt format
(leaving the PDF input files intact).   If pdf2txt.bat is run without any
parameters (or with a /h or /? parameter), it presents a help screen that
explains command syntax.

Suppose the file manual.pdf is to be converted from portable document format to
plain text.  Enter the following command at the DOS prompt:

pdf2txt manual

Suppose all .pdf files in the current directory are to be converted to
text.  Enter the command:

pdf2txt *
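
The way a file specification is turned into input and output names can be
sketched in Python. This is only an illustration of the naming rule described
in this readme (a .pdf extension is appended to whatever is typed, and the
output keeps the same base name with a .txt extension); it is not the logic of
pdf2txt.bat itself. The function name plan_conversions is hypothetical.

```python
import fnmatch


def plan_conversions(spec, filenames):
    """Return (input, output) name pairs for a pdf2txt-style file spec.

    Hypothetical helper: it mirrors the naming rule in the readme, not the
    actual batch file.
    """
    # The readme says ".pdf" is appended to the spec typed on the command line.
    pattern = (spec + ".pdf").lower()
    pairs = []
    for name in sorted(filenames):
        # DOS file names are case-insensitive, so compare in lower case.
        # Note: fnmatch also treats [...] as a wildcard, which DOS does not.
        if fnmatch.fnmatchcase(name.lower(), pattern):
            base = name[: -len(".pdf")]
            pairs.append((name, base + ".txt"))
    return pairs
```

For example, with the files manual.pdf and notes.txt present, the spec
"manual" and the spec "*" would both select only manual.pdf and pair it with
manual.txt.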


Please note that the conversion process is relatively slow, that some
.pdf files produce no readable text, and that those that do usually
require additional manual formatting to be presentable.

PDF2TXT is a combination of free utilities and batch files, offered without
technical support.  Feedback is welcome, but a response should not be
expected.  I hope it makes the task of reading PDF files a little easier.

Jamal Mazrui
Email: empower@smart.net


ADDITIONAL NOTES

If the conversion to text is unsatisfactory with pdf2txt.bat, another
conversion option is to email the PDF as a binary attachment to the
address pdf2txt@sun.trace.wisc.edu or pdf2html@sun.trace.wisc.edu,
according to whether a reply is desired in text or HTML format. 
Conversion is also possible using a web-based form at the address
http://access.adobe.com 
where one enters the Internet address of a PDF to be rendered in the
active browser.


pdf2txt.bat uses the batcon.exe utility for support with batch file
operations.  For the actual conversion, it calls pstotext.exe (PostScript
to Text), which in turn calls gs.exe (Ghostscript), as well as
dos4gw.exe for a DPMI (DOS Protected Mode Interface) environment.


The line length in converted .txt files may be longer than 80
characters, necessitating a document viewing program that wraps lines
appropriately.
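
If no wrapping viewer is at hand, the long lines can also be re-wrapped after
conversion. The following is a minimal Python sketch, not part of PDF2TXT; the
function name wrap_text and the default width of 80 are assumptions chosen to
match the note above.

```python
import textwrap


def wrap_text(text, width=80):
    """Wrap overlong lines to `width` columns; leave other lines untouched."""
    out = []
    for line in text.splitlines():
        if len(line) <= width:
            # Short and blank lines are kept exactly as they are.
            out.append(line)
        else:
            # textwrap.wrap breaks at whitespace; a line holding only
            # whitespace yields no pieces, so fall back to an empty line.
            out.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(out)
```

Reading a converted .txt file, passing its contents through wrap_text, and
writing the result to a new file would leave the original conversion intact.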


pdf2txt.bat can be run from the directory in which it is installed if
one does not wish to include this directory in the DOS path.  There are
many files in that directory, however, so it is more difficult to review
which files were converted.


The only directories required in the DOS path when running pdf2txt.bat
from another directory are its installation directory and the directory
in which Windows is installed (generally c:\windows).


If pdf2txt.bat is interrupted before it completes processing, one or more of
the PDF files may be left with temporary .tmp extensions rather than the 
original .pdf extensions.
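
Such leftover files can be renamed back by hand. As an illustration, here is a
minimal Python sketch that does the same thing. It assumes that an interrupted
file keeps its base name and only has its extension changed to .tmp, and it
renames a file only if it begins with the "%PDF-" signature that PDF files
start with, since other programs also create .tmp files. The function name
restore_interrupted is hypothetical and not part of PDF2TXT.

```python
import os


def restore_interrupted(directory):
    """Rename *.tmp files that look like PDFs back to *.pdf.

    Returns the list of restored .pdf file names.
    """
    restored = []
    for name in sorted(os.listdir(directory)):
        base, ext = os.path.splitext(name)
        if ext.lower() != ".tmp":
            continue
        src = os.path.join(directory, name)
        if not os.path.isfile(src):
            continue
        # Check the file signature so unrelated .tmp files are left alone.
        with open(src, "rb") as handle:
            looks_like_pdf = handle.read(5) == b"%PDF-"
        dst = os.path.join(directory, base + ".pdf")
        # Never overwrite an existing .pdf with the same base name.
        if looks_like_pdf and not os.path.exists(dst):
            os.rename(src, dst)
            restored.append(base + ".pdf")
    return restored
```

Calling restore_interrupted on the directory where the conversion was run
would then report which files were given their .pdf extensions back.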


Below is the help screen displayed by running pdf2txt.bat
without parameters:

Converts a file in portable document format with a .pdf extension
to a text file with the same base name and a .txt extension. Supply
the base name of the input file as a parameter, e.g., 
pdf2txt basename 
produces the output file basename.txt from the input file
basename.pdf

Multiple files may be processed with a single command by using the
* and ? wild card characters. For example, the command 
pdf2txt *
will convert all .PDF files in the current directory. Remember that
a .pdf is automatically added to the file specification provided on
the command line, so that *.pdf is actually being processed in this
case.


Copyright 1999 by Access Success
----------

