vocabs
Concrete Vocab classes.
- class deeppavlov.vocabs.wiki_sqlite.WikiSQLiteVocab(load_path: str, join_docs: bool = True, shuffle: bool = False, **kwargs)
Get content from SQLite database by document ids.
- Parameters
load_path – a path to local DB file
join_docs – whether to join extracted docs with a single space (' ') or not
shuffle – whether to shuffle data or not
- join_docs
whether to join extracted docs with a single space (' ') or not
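The lookup described above can be illustrated with a minimal standalone sketch built on the standard-library sqlite3 module. It is not the DeepPavlov implementation: the `documents` table, its `id` and `text` columns, and the `fetch_docs` helper are assumptions made for illustration only. The `join_docs` and `shuffle` arguments mirror the documented parameters.

```python
import random
import sqlite3
from typing import Any, List, Union


def fetch_docs(conn: sqlite3.Connection,
               doc_ids: List[List[Any]],
               join_docs: bool = True,
               shuffle: bool = False) -> List[Union[str, List[str]]]:
    """Hypothetical helper: fetch document texts for each batch of ids."""
    results: List[Union[str, List[str]]] = []
    for ids in doc_ids:
        ids = list(ids)
        if shuffle:
            random.shuffle(ids)  # randomize document order within the batch
        texts = []
        for doc_id in ids:
            # Assumed schema: documents(id, text)
            row = conn.execute(
                "SELECT text FROM documents WHERE id = ?", (doc_id,)
            ).fetchone()
            if row is not None:  # silently skip ids that are not in the DB
                texts.append(row[0])
        # Either one space-joined string per batch or a list of texts
        results.append(" ".join(texts) if join_docs else texts)
    return results


# Build a tiny in-memory database with the assumed schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("a", "First doc."), ("b", "Second doc.")],
)

joined = fetch_docs(conn, [["a", "b"]])
separate = fetch_docs(conn, [["a", "b"]], join_docs=False)
```

With `join_docs=True` each batch of ids yields one string; with `join_docs=False` it yields a list with one entry per document found.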
- class deeppavlov.vocabs.typos.RussianWordsVocab(data_dir: [pathlib.Path, str] = '', *args, **kwargs)
Implementation of StaticDictionary that builds data from https://github.com/danakt/russian-words/
- Parameters
data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to pipeline’s data directory
- dict_name
logical name of the dictionary
- alphabet
set of all the characters used in this dictionary
- words_set
set of all the words
- words_trie
trie structure of all the words
- class deeppavlov.vocabs.typos.StaticDictionary(data_dir: [pathlib.Path, str] = '', *args, dictionary_name: str = 'dictionary', **kwargs)
Trie vocabulary used in spelling correction algorithms
- Parameters
data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to pipeline’s data directory
dictionary_name – logical name of the dictionary
raw_dictionary_path – path to the source file with the list of words
- dict_name
logical name of the dictionary
- alphabet
set of all the characters used in this dictionary
- words_set
set of all the words
- words_trie
trie structure of all the words
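To make the relationship between the three data attributes concrete, here is a minimal sketch of how an alphabet, a word set, and a trie could be derived from a plain list of words. It does not reproduce the library's internal trie format: the nested-dict representation, the `END` marker, and the `build_dictionary` and `has_prefix` helpers are hypothetical names chosen for this example.

```python
from typing import Any, Dict, Iterable, Set, Tuple

END = "$"  # hypothetical end-of-word marker; assumed absent from the alphabet


def build_dictionary(words: Iterable[str]) -> Tuple[Set[str], Set[str], Dict[str, Any]]:
    """Build (alphabet, words_set, words_trie) from raw words."""
    # Drop surrounding whitespace and empty entries
    words_set = {w.strip() for w in words if w.strip()}
    # Every character that occurs in at least one word
    alphabet = {ch for w in words_set for ch in w}
    # Nested-dict trie: each key is a character, each value a child node
    trie: Dict[str, Any] = {}
    for word in words_set:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True  # mark that a complete word ends at this node
    return alphabet, words_set, trie


def has_prefix(trie: Dict[str, Any], prefix: str) -> bool:
    """Return True if at least one stored word starts with `prefix`."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return False
        node = node[ch]
    return True


alphabet, words_set, words_trie = build_dictionary(["cat", "car", "dog"])
```

A spelling corrector can use prefix checks of this kind to discard candidate corrections early, which is the usual reason for keeping a trie next to the flat word set.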
- class deeppavlov.vocabs.typos.Wiki100KDictionary(data_dir: [pathlib.Path, str] = '', *args, **kwargs)
Implementation of StaticDictionary that builds data from Wiktionary
- Parameters
data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to pipeline’s data directory
- dict_name
logical name of the dictionary
- alphabet
set of all the characters used in this dictionary
- words_set
set of all the words
- words_trie
trie structure of all the words