Sdictionary. How to create your own dictionaries


Starting with version 1.1, Sdictionary contains all the components necessary to create your own dictionaries. Here I explain, step by step, how to compile a plain text file into a binary .dct dictionary file. I suppose that Linux users are clever enough, which is why I'll focus on Windows compilation issues.

What you will need first

You need a Perl interpreter installed. Sdictionary works fine with ActivePerl versions 5.8 and higher. Go to ActiveState, then click 'Free Download'. Below the web form, click the 'Next' button to proceed (you can leave the form empty). On the download page choose 'Windows', then the 'MSI' package. Download the .msi package to your computer, then double-click it to install. After the installation completes, don't forget to restart your computer.

Download the latest ptksdict package from here; you have to download the 'zip' archive. After the download succeeds, unzip the whole archive into the 'C:\Sdict' directory. If everything is OK, you'll see 'bin', 'lib' and other directories in C:\Sdict. Later in this article I will refer to 'C:\Sdict' as your installation path.

Prepare source dictionary file

I assume you have some structured information you want to convert to a .dct file. Find and open the file 'C:\Sdict\share\dicts\sample1.txt'. You can use it as a template to create your own dictionary.

The file format is very simple. First of all, lines starting with '#' are comments and are silently ignored. All non-ASCII characters must be in UTF-8 encoding.
Then there are two main sections in the source file: the header and the articles.

The header defines parameters such as the dictionary languages, copyright and so on. All header lines are placed between the <header> and </header> tags; in our example they are:

#
# Some text editors add fuzzy signature to the file, so
# leave with some comments here!
#
<header>
title = Sample 1 test dictionary -  dictionary name;
copyright = GNU Public License - copyright information;
version = 0.1 - version;
w_lang = en - language for words;
a_lang = fi - language for articles. For further information
about language codes refer 'C:\Sdict\share\doc\iso639.htm' file;
# charset = ... - use if your source file is not in UTF-8 encoding.
</header>
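
To check a header before compiling, you could read it back with a few lines of Python. This is only a sketch and not part of Sdictionary: the function name parse_header is made up, and it assumes every header field is written as 'key = value' on a single line. The sample text inside the code is a shortened copy of the header above without the explanatory notes.

```python
def parse_header(source_text):
    """Return the header fields of a dictionary source as a dict.

    Assumption: every header field is written as `key = value` on one line.
    Lines without '=' (for example wrapped text) are skipped.
    """
    fields = {}
    in_header = False
    for raw_line in source_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comments are ignored
        if line == "<header>":
            in_header = True
        elif line == "</header>":
            break  # the articles section starts after this tag
        elif in_header and "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


SAMPLE = """\
#
# leading comment lines
#
<header>
title = Sample 1 test dictionary
copyright = GNU Public License
version = 0.1
w_lang = en
a_lang = fi
# charset = ...
</header>
"""

header = parse_header(SAMPLE)
print(header["title"], header["w_lang"], header["a_lang"])
```

If a field you expect (for example w_lang) is missing from the returned dict, the header line is probably misspelled or placed outside the tags.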

The articles section comes right after the header. It contains all the words in your dictionary. Each word/article pair occupies exactly one line in the following format:

word___article
...
that is, the word itself, then three underscore characters, then the article.
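
If your data comes from a program rather than from a text editor, the article lines can be produced and checked with a small Python sketch like the one below. The function names and the word pair are made up for illustration. Splitting at the first run of three underscores is my assumption about how such a line should be read, not something taken from the Sdictionary compiler.

```python
SEPARATOR = "___"  # three underscore characters between word and article


def format_entry(word, article):
    """Return one source line for a word/article pair."""
    if SEPARATOR in word:
        raise ValueError("word must not contain the separator")
    if "\n" in word or "\n" in article:
        raise ValueError("an entry must fit on exactly one line")
    return f"{word}{SEPARATOR}{article}"


def parse_entry(line):
    """Split one source line at the first separator into (word, article)."""
    word, found, article = line.partition(SEPARATOR)
    if not found:
        raise ValueError(f"missing separator in line: {line!r}")
    return word, article


line = format_entry("cat", "kissa")
print(line)               # cat___kissa
print(parse_entry(line))  # ('cat', 'kissa')
```

When you write such lines to the source file, open it with UTF-8 encoding, for example open(path, "a", encoding="utf-8"), so that non-ASCII characters are stored as described above.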

Compilation

Once you have prepared the source dictionary file, it's time to compile it. I hope you just modified the template and your dictionary source file is still 'sample1.txt' in the 'C:\Sdict\share\dicts' directory. To compile your file, just launch the 'compile-samples.bat' script from the same directory. It produces the compiled dictionary file 'sample1.dct'.

Congratulations! You did it! To learn more, please refer to the documentation pages.



Copyright (c) 1999-2024 Alexey Semenoff