vocabs

Concrete Vocab classes.

class deeppavlov.vocabs.wiki_sqlite.WikiSQLiteVocab(load_path: str, join_docs: bool = True, shuffle: bool = False, **kwargs)

Get content from an SQLite database by document ids.

Parameters
  • load_path – path to the local DB file

  • join_docs – whether to join the extracted docs with a single space (' ')

  • shuffle – whether to shuffle data or not

join_docs

whether to join the extracted docs with a single space (' ')

__call__(doc_ids: Optional[List[List[Any]]] = None, *args, **kwargs) → List[Union[str, List[str]]]

Get the contents of the documents, either joined with a space or returned as separate strings.

Parameters

doc_ids – a batch of lists of ids to get contents for

Returns

a list of contents (one joined string per list of ids) or a list of lists of contents (one string per id)
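The two return shapes can be illustrated with a minimal sketch that does not use DeepPavlov itself. The `fetch_contents` helper, the in-memory database, and the `documents(id, text)` table layout are all assumptions made for this example; the real class may store and query its data differently.

```python
import sqlite3
from typing import List, Union


def fetch_contents(conn: sqlite3.Connection,
                   doc_ids: List[List[str]],
                   join_docs: bool = True) -> List[Union[str, List[str]]]:
    """Hypothetical helper: look up texts for every list of ids in the batch."""
    batch = []
    for ids in doc_ids:
        texts = []
        for doc_id in ids:
            # Assumed schema: a `documents` table with `id` and `text` columns.
            row = conn.execute(
                "SELECT text FROM documents WHERE id = ?", (doc_id,)
            ).fetchone()
            if row is not None:
                texts.append(row[0])
        # Join with a single space, or keep the texts as a list.
        batch.append(" ".join(texts) if join_docs else texts)
    return batch


# In-memory database standing in for the local DB file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("a", "First doc."), ("b", "Second doc."), ("c", "Third doc.")],
)

joined = fetch_contents(conn, [["a", "b"], ["c"]], join_docs=True)
separate = fetch_contents(conn, [["a", "b"], ["c"]], join_docs=False)
```

With joining enabled, each inner list of ids yields one string; with it disabled, each inner list yields a list of strings, which matches the `List[Union[str, List[str]]]` return annotation.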

class deeppavlov.vocabs.typos.RussianWordsVocab(data_dir: [<class 'pathlib.Path'>, <class 'str'>] = '', *args, **kwargs)

Implementation of StaticDictionary that builds data from https://github.com/danakt/russian-words/

Parameters

data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory

dict_name

logical name of the dictionary

alphabet

set of all the characters used in this dictionary

words_set

set of all the words

words_trie

trie structure of all the words

class deeppavlov.vocabs.typos.StaticDictionary(data_dir: [<class 'pathlib.Path'>, <class 'str'>] = '', *args, dictionary_name: str = 'dictionary', **kwargs)

Trie vocabulary used in spelling correction algorithms

Parameters
  • data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory

  • dictionary_name – logical name of the dictionary

  • raw_dictionary_path – path to the source file with the list of words

dict_name

logical name of the dictionary

alphabet

set of all the characters used in this dictionary

words_set

set of all the words

words_trie

trie structure of all the words
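How the three data attributes relate to one another can be sketched as follows. This is only an illustration: the `build_dictionary` function is hypothetical, and the trie layout (a mapping from each prefix to its sorted one-character extensions) is an assumption, not necessarily the structure the library stores.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_dictionary(
    words: Iterable[str],
) -> Tuple[Set[str], Set[str], Dict[str, List[str]]]:
    """Hypothetical builder returning (alphabet, words_set, words_trie)."""
    # Normalize: strip whitespace, lowercase, drop empty entries.
    words_set = {w.strip().lower() for w in words if w.strip()}

    # The alphabet is every character that occurs in any word.
    alphabet = {ch for w in words_set for ch in w}

    # Assumed trie layout: prefix -> prefixes that are one character longer.
    children = defaultdict(set)
    for word in words_set:
        for i in range(len(word)):
            children[word[:i]].add(word[:i + 1])
        # Make sure complete words are present as keys, even as leaves.
        children.setdefault(word, set())

    words_trie = {prefix: sorted(nxt) for prefix, nxt in children.items()}
    return alphabet, words_set, words_trie


alphabet, words_set, words_trie = build_dictionary(["cat", "car", "Cart"])
```

In this sketch, walking from the empty prefix `""` through `words_trie` enumerates every stored word, and a prefix that is absent from the mapping cannot start any word in the dictionary.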

class deeppavlov.vocabs.typos.Wiki100KDictionary(data_dir: [<class 'pathlib.Path'>, <class 'str'>] = '', *args, **kwargs)

Implementation of StaticDictionary that builds data from Wiktionary

Parameters

data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory

dict_name

logical name of the dictionary

alphabet

set of all the characters used in this dictionary

words_set

set of all the words

words_trie

trie structure of all the words