Sdictionary. How to create your own dictionaries
Starting with version 1.1, Sdictionary contains all the components necessary to
create your own dictionaries. Here I explain, step by step, how to
compile a plain text file into a binary .dct dictionary file. I assume that Linux
users can manage on their own, which is why I focus on compilation
under Windows.
What will you need first?
You need a Perl interpreter installed. Sdictionary works fine with
ActivePerl versions 5.8 and higher. Go to
ActiveState, then click 'Free Download'. Below the web form, click the 'Next'
button to proceed (you can leave the form empty). On the download page choose
'Windows', then the 'MSI' package. Download the .msi package to your computer,
then double-click it to install. After the installation completes, don't
forget to restart your computer.
Download the latest ptksdict package
from here; you need the
'zip' archive. Once the download succeeds, unzip the whole
archive into the 'C:\Sdict' directory. If everything is OK, you'll see
'bin', 'lib' and other directories in C:\Sdict. In the rest of this
article I refer to 'C:\Sdict' as your installation path.
Prepare source dictionary file
I assume you have some structured information that you want to convert to
a .dct file. Find and open the file
'C:\Sdict\share\dicts\sample1.txt'. You can use it as a template to
create your own dictionary.
The file format is very simple. First of all, lines starting with '#' are
comments and are silently ignored. All non-ASCII characters must be
in UTF-8 encoding.
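The two rules above can be checked before compiling. The following Python sketch is only a pre-check, not part of Sdictionary: the function name `read_source_lines` is hypothetical, and it assumes that a comment is any line whose very first character is '#'.

```python
def read_source_lines(raw: bytes) -> list[str]:
    """Decode a dictionary source as UTF-8 and drop comment lines."""
    # Strict decoding raises UnicodeDecodeError on any byte sequence
    # that is not valid UTF-8.
    text = raw.decode("utf-8")
    kept = []
    for line in text.splitlines():
        # Assumption: a comment line has '#' as its first character.
        if line.startswith("#"):
            continue
        kept.append(line)
    return kept
```

Passing the bytes of your source file to this function either returns the non-comment lines or fails loudly on the first byte that is not valid UTF-8.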
The source file has two main sections: the header and the articles.
The header defines parameters such as the dictionary languages, copyright
and so on. All header lines sit between the
<header> and </header> tags, and in our
example they are:
#
# Some text editors add fuzzy signature to the file, so
# leave with some comments here!
#
<header>
title = Sample 1 test dictionary - dictionary name;
copyright = GNU Public License - copyright information;
version = 0.1 - version;
w_lang = en - language for words;
a_lang = fi - language for articles. For further information
about language codes refer to the 'C:\Sdict\share\doc\iso639.htm' file;
# charset = ... - use if your source file is not in UTF-8 encoding.
</header>
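To make the header layout concrete, here is a minimal Python sketch that collects the header fields from a list of source lines. It is not how Sdictionary itself reads the file: the function name `parse_header` is hypothetical, and it assumes that each real header line is a plain `key = value` pair, with the trailing descriptions in the listing above being explanations rather than file content.

```python
def parse_header(lines):
    """Collect key = value pairs found between <header> and </header>."""
    fields = {}
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped == "<header>":
            inside = True
        elif stripped == "</header>":
            break
        elif inside and stripped and not stripped.startswith("#") and "=" in stripped:
            # Split on the first '=' only, so values may contain '='.
            key, value = stripped.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields
```

Lines outside the header block, comment lines and lines without '=' are skipped, so a commented-out `charset` entry does not end up in the result.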
The articles section comes right after the header. It contains all the
words in your dictionary. Each word/article pair occupies exactly one
line in the following format:
word___article
...
namely the word itself, then three underscore characters,
then the article.
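If your data already lives in a program, the article lines can be generated rather than typed. The Python sketch below is an illustration only: `format_entry` and `parse_entry` are hypothetical helpers, and splitting on the first run of three underscores is an assumption about how the separator is interpreted.

```python
SEPARATOR = "___"  # three underscore characters, as described above


def format_entry(word, article):
    """Build one source line for a word/article pair."""
    if SEPARATOR in word:
        raise ValueError("the word must not contain the separator")
    if any(ch in word + article for ch in "\r\n"):
        raise ValueError("each pair must fit on exactly one line")
    return f"{word}{SEPARATOR}{article}"


def parse_entry(line):
    """Split one source line back into (word, article)."""
    # Assumption: the first occurrence of the separator ends the word.
    word, sep, article = line.partition(SEPARATOR)
    if not sep or not word:
        raise ValueError("not a word___article line")
    return word, article
```

For example, `format_entry("cat", "kissa")` builds the line `cat___kissa`, and `parse_entry` turns that line back into the original pair.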
Compilation
Now that you have prepared the source dictionary file, it's time to compile
it. I assume you simply modified the template, so your dictionary source file is
still 'sample1.txt' in the 'C:\Sdict\share\dicts' directory. To compile your file,
launch the 'compile-samples.bat' script from the same directory. It
produces the compiled dictionary file 'sample1.dct'.
Congratulations! You did it! To learn more, please refer to the documentation pages.