DJVU TOOLS
----------

DESCRIPTION:
These two scripts automate adding an OCR text layer (TXTz) to DJVU files.

WHY:
I've seen various recommendations for the OCR process. Here are the advantages of using these tools:
- Words are merged into lines. I have found that in 'word' mode a DJVU file grows by about 40-42%, but in 'line' mode (exactly what these tools do) by only 10-20%, so the final file is smaller;
- Minimal manual steps;
- Tests for every phase, with verbose reporting;
- Intermediate, reusable OCR text files in UTF-8 encoding, so you can edit them later and recreate the text layer from the corrected text;
- Fixes for faulty FRF files.

REQUIREMENTS:
- Win32 OS (tested with Win98SE under Win4Lin);
- Perl 5.8 (ActivePerl is known to work);
- FineReader 5, 6 or 7. *YOU DON'T NEED* a fully functional copy: an expired copy (with 'Save' disabled) works just fine.

INSTALL:
Unpack the whole archive to some directory, 'C:\djvu_tools' for instance, and add 'C:\djvu_tools' to the %PATH% environment variable (for example with 'SET PATH=%PATH%;C:\djvu_tools', which on Win9x goes into AUTOEXEC.BAT).

USAGE:

Step 1. Getting a set of TIFF files.
If you already have scans, skip this step.
a) Copy your DJVU file (one file only!) to a temporary directory, let's say 'C:\TMP';
b) In the shell (cmd.exe), change to 'C:\TMP';
c) Type 'djvu2tiff'.
As a result you'll get a set of TIFF files converted from the DJVU document.

Step 2. OCR with FineReader.
a) Select 'New Batch', then 'Open Images', and add all your TIFF files (click the last file, then Shift+click the first one); click 'Open' in the dialog;
b) Select the recognition language(s); in the preferences, select 'No hyphens';
c) Press 'Read All';
d) Close the batch.
NOTE! Don't edit the OCR results!!!
Now the batch directory contains a set of FRF files alongside the TIFF files.

Step 3. Generating the OCR layer.
a) Copy your DJVU file into the FineReader batch directory (only one file!);
b) In the shell, change to the batch directory;
c) Type 'djvu_add_ocr'.
The script performs all the required actions automatically. The result is the same DJVU file with an additional text layer, stored in a 'TXTz' chunk.

LICENSE:
GPL - GNU General Public License.
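
What Step 1 does can also be reproduced directly with DjVuLibre, whose 'ddjvu' renders a page to TIFF. The Python sketch below is an illustration only, not the code of 'djvu2tiff', which may work differently. The helper names and the output naming scheme ('<name>_0001.tif', ...) are hypothetical; it assumes 'djvused' and 'ddjvu' are on the PATH.

```python
import subprocess
from pathlib import Path

# Hypothetical helpers (not part of djvu_tools) that build DjVuLibre command lines.

def page_count_command(djvu: str) -> list[str]:
    # djvused's "n" command prints the number of pages in the document.
    return ["djvused", djvu, "-e", "n"]

def tiff_commands(djvu: str, pages: int) -> list[list[str]]:
    # One ddjvu call per page. Zero-padded numbers keep the files in page
    # order wherever file names are sorted alphabetically.
    stem = Path(djvu).stem
    return [
        ["ddjvu", "-format=tiff", f"-page={p}", djvu, f"{stem}_{p:04d}.tif"]
        for p in range(1, pages + 1)
    ]

def djvu_to_tiffs(djvu: str) -> None:
    # Runs the commands; raises CalledProcessError if a tool fails.
    out = subprocess.run(page_count_command(djvu), check=True,
                         capture_output=True, text=True).stdout
    for cmd in tiff_commands(djvu, int(out.strip())):
        subprocess.run(cmd, check=True)
```

Usage: call djvu_to_tiffs("book.djvu") from the directory that holds the file. Only the command-building functions are pure; djvu_to_tiffs actually runs the tools.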
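
The size advantage claimed under WHY comes from storing one text zone per line instead of one per word. Below is a minimal Python sketch of that idea. It is not the algorithm used by 'djvu_add_ocr', and reading FRF files is not shown. It assumes you already have word boxes in DjVu coordinates (origin at the bottom-left corner) and a single text column. The 'Box' type, the function names and the 50% vertical-overlap threshold are hypothetical. Words whose vertical extents overlap enough are joined into one line, and each page is written in the djvused hidden-text format read by 'set-txt'. The sketch also assumes that your djvused accepts line zones placed directly inside the page zone.

```python
from dataclasses import dataclass

@dataclass
class Box:
    # Hypothetical word/line record in DjVu coordinates (origin bottom-left).
    x0: int
    y0: int
    x1: int
    y1: int
    text: str

def merge_words_into_lines(words: list[Box], min_overlap: float = 0.5) -> list[Box]:
    # Visit words from the top of the page down (larger y is higher up).
    groups: list[list[Box]] = []
    for w in sorted(words, key=lambda b: (-(b.y0 + b.y1), b.x0)):
        if groups:
            cur = groups[-1]
            top = max(b.y1 for b in cur)
            bottom = min(b.y0 for b in cur)
            overlap = min(top, w.y1) - max(bottom, w.y0)
            height = min(top - bottom, w.y1 - w.y0)
            # Same line if the vertical overlap covers enough of the shorter box.
            if height > 0 and overlap / height >= min_overlap:
                cur.append(w)
                continue
        groups.append([w])
    lines = []
    for g in groups:
        g.sort(key=lambda b: b.x0)  # left-to-right reading order
        lines.append(Box(min(b.x0 for b in g), min(b.y0 for b in g),
                         max(b.x1 for b in g), max(b.y1 for b in g),
                         " ".join(b.text for b in g)))
    return lines

def escape(text: str) -> str:
    # djvused strings are double-quoted; escape backslashes and quotes.
    return text.replace("\\", "\\\\").replace('"', '\\"')

def page_sexpr(width: int, height: int, lines: list[Box]) -> str:
    # One (line ...) zone per text line, directly inside the page zone
    # (assumption: your djvused version accepts this nesting).
    body = "\n".join(
        f' (line {ln.x0} {ln.y0} {ln.x1} {ln.y1} "{escape(ln.text)}")'
        for ln in lines)
    return f"(page 0 0 {width} {height}\n{body})"

def set_txt_command(djvu: str, page: int, txt_file: str) -> list[str]:
    # djvused: select the page, load its hidden text from txt_file, save (-s).
    return ["djvused", djvu, "-e", f"select {page}; set-txt {txt_file}", "-s"]
```

A possible workflow: write page_sexpr(...) for each page to a UTF-8 text file, then run the matching set_txt_command. Because these per-page files are plain text, you can correct them and rerun the command, which mirrors the editable intermediate files described under WHY.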

COPYRIGHT:
Scripts written by Alexey Semenoff [http://swaj.net], Copyright (c) 2006.
FRFGrab Copyright (c) Gencho <gencho AT yourwap DOT com>.
DjVuLibre [http://djvulibre.djvuzone.org] Copyright (c) 2002 Leon Bottou and Yann Le Cun. Copyright (c) 2001 AT&T.