DJVU TOOLS
----------


DESCRIPTION:

	These two scripts automate adding an OCR text layer (TXTz)
	to DJVU files.


WHY:

	I've seen various recommendations for the OCR process. These
	tools offer the following advantages:

	- Words are merged into lines. In my tests a DJVU file grows
	  by about 40-42% in 'word' mode but only by 10-20% in 'line'
	  mode (which is what these tools use), so the final result
	  is smaller;
	- Minimal manual steps;
	- Tests for every phase, verbose reporting;
	- Intermediate, reusable OCR text files in UTF-8 encoding, so
	  you can edit them later and recreate the text layer from
	  the corrected text;
	- Fixes for faulty FRF files.
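
	The 'word' versus 'line' difference can be illustrated with
	a minimal Python sketch. It assumes the djvused hidden-text
	syntax "(type xmin ymin xmax ymax ...)". The function names
	and the sample coordinates are made up for illustration, and
	this is not necessarily how the scripts do it internally.

```python
from typing import List, Tuple

# One OCR word: xmin, ymin, xmax, ymax, text (coordinates are illustrative).
Word = Tuple[int, int, int, int, str]


def quote(text: str) -> str:
    """Quote a string for a text zone (simplified: only \\ and " are escaped)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def merge_words(words: List[Word]) -> Word:
    """Collapse the word boxes of one text line into a single line box."""
    if not words:
        raise ValueError("a line needs at least one word")
    xmin = min(w[0] for w in words)
    ymin = min(w[1] for w in words)
    xmax = max(w[2] for w in words)
    ymax = max(w[3] for w in words)
    text = " ".join(w[4] for w in words)
    return (xmin, ymin, xmax, ymax, text)


def line_zone(words: List[Word]) -> str:
    """'line' mode: one zone per line, holding the joined text."""
    x0, y0, x1, y1, text = merge_words(words)
    return f"(line {x0} {y0} {x1} {y1} {quote(text)})"


def word_zones(words: List[Word]) -> str:
    """'word' mode: the same line, but every word keeps its own box."""
    x0, y0, x1, y1, _ = merge_words(words)
    inner = " ".join(
        f"(word {a} {b} {c} {d} {quote(t)})" for a, b, c, d, t in words
    )
    return f"(line {x0} {y0} {x1} {y1} {inner})"
```

	For two sample words, line_zone() returns a single short
	zone, while word_zones() repeats four coordinates for every
	word. The sketch only shows the structural difference; the
	growth actually measured also depends on how well the text
	layer compresses.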


REQUIREMENTS:

	Win32 OS (tested with Win98SE under Win4Lin);
	Perl 5.8, ActivePerl is known to work;
	FineReader 5, 6 or 7. *YOU DON'T NEED* a fully functional
	copy; an expired copy (with 'Save' disabled) works just fine.


INSTALL:

	Unpack the whole archive to some directory, 'C:\djvu_tools'
	for instance. Add 'C:\djvu_tools' to the %PATH% environment
	variable.


USAGE:

	Step 1. Getting a set of TIFF files. If you already have
	        scans, skip this phase.

		a) Copy your DJVU file (one file!) to some temp
		   directory, let's say 'C:\TMP';
		b) In the shell (cmd.exe) change directory to
		  'C:\TMP';
		c) Type 'djvu2tiff'.
		
		As a result you'll get a set of TIFF files converted
		from the DJVU document.
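
		For reference, a page-by-page conversion can be
		sketched in Python with the DjVuLibre tools 'ddjvu'
		and 'djvused'. This is only an assumption about how
		such a step can be done, not a description of what
		'djvu2tiff' does internally. The output file naming
		is hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def page_count_cmd(djvu: str) -> List[str]:
    """djvused command 'n' prints the number of pages in the document."""
    return ["djvused", djvu, "-e", "n"]


def tiff_cmd(djvu: str, page: int) -> List[str]:
    """Build a ddjvu call that renders one page to its own TIFF file."""
    # Hypothetical naming scheme: <document>_<4-digit page>.tif
    out = f"{Path(djvu).stem}_{page:04d}.tif"
    return ["ddjvu", "-format=tiff", f"-page={page}", djvu, out]


def convert(djvu: str) -> None:
    """Render every page of the document; needs DjVuLibre on the PATH."""
    result = subprocess.run(
        page_count_cmd(djvu), capture_output=True, text=True, check=True
    )
    pages = int(result.stdout.strip())
    for page in range(1, pages + 1):
        subprocess.run(tiff_cmd(djvu, page), check=True)
```

		Calling convert("book.djvu") in the temp directory
		would write one TIFF per page next to the document.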

	Step 2. OCR with FineReader.
	     
		a) Select 'New Batch', 'Open Images', add all your
		   TIFF files (select the last file first, then
		   <Shift>-click the first one), click 'Open' in the
		   dialog;
		b) Select language(s), in the preferences select
		   'No hyphens';
		c) Press 'Read All';
		d) Close the batch.

		NOTE! Don't edit the OCR result!!!

		Now you have a set of FRF files mixed with TIFF ones
		in the batch directory.

	Step 3. Generating OCR layer.
	     
		a) Copy your DJVU file into the FineReader batch
		   directory (only one file!);
		b) In the shell change directory to the batch
		   directory;
		c) Type 'djvu_add_ocr'.

		The script automatically performs all required
		actions. The final result is the same DJVU file with
		an additional text layer called 'TXTz'.
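
		The last part of this step, attaching per-page text
		to the document, can be sketched in Python with
		DjVuLibre's 'djvused' and its 'select' and 'set-txt'
		commands. This is a sketch under assumptions, not the
		code of 'djvu_add_ocr'. The per-page file names are
		hypothetical and must not contain spaces, because the
		sketch does not quote them.

```python
import subprocess
from typing import Dict, List


def build_script(page_files: Dict[int, str]) -> str:
    """Build a djvused script that sets the hidden text of each page.

    page_files maps a 1-based page number to a file holding that page's
    text in djvused format, e.g. (page 0 0 2550 3300 (line ...)).
    """
    commands = []
    for page in sorted(page_files):
        commands.append(f"select {page}")
        commands.append(f"set-txt {page_files[page]}")
    return "; ".join(commands)


def apply_cmd(djvu: str, script: str) -> List[str]:
    """djvused call that runs the script and saves the file in place (-s)."""
    return ["djvused", djvu, "-e", script, "-s"]


def add_text_layer(djvu: str, page_files: Dict[int, str]) -> None:
    """Run djvused; needs DjVuLibre on the PATH and modifies the file."""
    subprocess.run(apply_cmd(djvu, build_script(page_files)), check=True)
```

		Afterwards the djvused command 'print-txt' can be
		used to check that the text layer is in place.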


LICENSE:

	GPL - GNU General Public License.


COPYRIGHT:

	Scripts written by Alexey Semenoff [http://swaj.net]
	Copyright (c) 2006.

	FRFGrab Copyright (c) Gencho <gencho AT yourwap DOT com> 

	DjVuLibre [http://djvulibre.djvuzone.org]
	Copyright (c) 2002  Leon Bottou and Yann Le Cun.
	Copyright (c) 2001  AT&T.