vocabs

Concrete Vocab classes.

class deeppavlov.vocabs.wiki_sqlite.WikiSQLiteVocab(data_url: str, data_dir: str = '', **kwargs)

Get content from an SQLite database by document ids.

Parameters:
  • data_url – a URL to download the DB from
  • data_dir – a directory to save the downloaded DB to
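As a rough illustration of looking up document text by id in SQLite, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name documents, its columns id and text, and the helper get_doc_content are hypothetical; the schema of the downloaded DB may differ.

```python
import sqlite3
from typing import Any, Optional


def get_doc_content(conn: sqlite3.Connection, doc_id: Any) -> Optional[str]:
    # Hypothetical schema: documents(id TEXT PRIMARY KEY, text TEXT).
    row = conn.execute("SELECT text FROM documents WHERE id = ?", (doc_id,)).fetchone()
    # fetchone() returns None when no row matches the id.
    return row[0] if row is not None else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("Moscow", "Moscow is the capital of Russia."),
        ("Paris", "Paris is the capital of France."),
    ],
)
print(get_doc_content(conn, "Paris"))   # Paris is the capital of France.
print(get_doc_content(conn, "Berlin"))  # None
```

The parameterized query (the ? placeholder) keeps ids from being interpolated into the SQL string.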
__call__(doc_ids: Optional[List[List[Any]]] = None, *args, **kwargs) → List[str]

For each list of ids, get the contents of the corresponding documents, joined by a space.

Parameters: doc_ids – a batch of lists of ids to get contents for
Returns: a list of contents, one string per list of ids
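The batch shape of __call__ (one output string per inner list of ids, with the document contents joined by a space) can be sketched in plain Python. Here a dict stands in for the database, and fetch_contents is a hypothetical name, not part of the class's API.

```python
from typing import Any, Dict, List


def fetch_contents(doc_ids: List[List[Any]], db: Dict[Any, str]) -> List[str]:
    # One output string per inner list; contents within a list are joined by a single space,
    # in the order the ids are given.
    return [" ".join(db[doc_id] for doc_id in ids) for ids in doc_ids]


db = {1: "first doc.", 2: "second doc.", 3: "third doc."}
print(fetch_contents([[1, 2], [3]], db))  # ['first doc. second doc.', 'third doc.']
```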
class deeppavlov.vocabs.typos.RussianWordsVocab(data_dir: Union[Path, str] = '', *args, **kwargs)

An implementation of StaticDictionary that builds its data from https://github.com/danakt/russian-words/

Parameters: data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory.
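The path rule for data_dir can be pictured with pathlib. The helper resolve_data_dir and the example paths are hypothetical; the sketch only mirrors the documented rule (relative paths are joined onto the pipeline's data directory) and assumes absolute paths are used unchanged.

```python
from pathlib import Path
from typing import Union


def resolve_data_dir(data_dir: Union[Path, str], pipeline_data_root: Union[Path, str]) -> Path:
    # Hypothetical helper: absolute paths are kept as they are,
    # relative ones are joined onto the pipeline's data directory.
    path = Path(data_dir)
    if path.is_absolute():
        return path
    return Path(pipeline_data_root) / path


print(resolve_data_dir("russian_words", "/srv/pipeline/data"))  # /srv/pipeline/data/russian_words
print(resolve_data_dir("/tmp/dicts", "/srv/pipeline/data"))     # /tmp/dicts
```

With the default data_dir of '', this sketch resolves to the pipeline's data directory itself.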
dict_name

logical name of the dictionary

alphabet

set of all the characters used in this dictionary

words_set

set of all the words

words_trie

trie structure of all the words
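To make the alphabet, words_set and words_trie attributes concrete, here is a sketch of how they could be derived from a plain list of words. The normalization (strip and lowercase), the helper build_vocab, and the trie layout (a flat dict mapping each prefix to the sorted prefixes one character longer) are assumptions for illustration, not necessarily how the class stores its data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_vocab(words: Iterable[str]) -> Tuple[Set[str], Set[str], Dict[str, List[str]]]:
    # Assumed normalization: strip whitespace and lowercase; empty entries are dropped.
    words_set = {w.strip().lower() for w in words if w.strip()}
    # The alphabet is every character that occurs in any word.
    alphabet = {c for w in words_set for c in w}
    # Flat trie: each prefix maps to the set of prefixes exactly one character longer.
    children: Dict[str, Set[str]] = defaultdict(set)
    for w in words_set:
        for i in range(len(w)):
            children[w[:i]].add(w[:i + 1])
    words_trie = {prefix: sorted(nxt) for prefix, nxt in children.items()}
    return alphabet, words_set, words_trie


alphabet, words_set, words_trie = build_vocab(["Cat", "car", "cart"])
print(sorted(alphabet))  # ['a', 'c', 'r', 't']
print(words_trie["ca"])  # ['car', 'cat']
```

In this layout "car" is both a word and an interior prefix, so whether a prefix is a complete word has to be checked against words_set.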

class deeppavlov.vocabs.typos.StaticDictionary(data_dir: Union[Path, str] = '', *args, dictionary_name: str = 'dictionary', **kwargs)

Trie-based vocabulary used in spelling-correction algorithms.

Parameters:
  • data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory.
  • dictionary_name – logical name of the dictionary
  • raw_dictionary_path – path to the source file with the list of words
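The file behind raw_dictionary_path is described only as "the list of words". The sketch below assumes a UTF-8 text file with one entry per line, keeping only the first tab-separated column and skipping blank lines; that format and the helper read_word_list are assumptions, not a documented contract.

```python
import tempfile
from pathlib import Path
from typing import List


def read_word_list(raw_dictionary_path: Path) -> List[str]:
    # Assumed format: UTF-8 text, one entry per line; if a line has
    # tab-separated columns, only the first column is kept as the word.
    words = []
    with Path(raw_dictionary_path).open(encoding="utf8") as f:
        for line in f:
            word = line.strip().split("\t")[0]
            if word:  # skip blank lines
                words.append(word)
    return words


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "words.txt"
    path.write_text("привет\t120\nмир\t85\n\nслово\n", encoding="utf8")
    words = read_word_list(path)

print(words)  # ['привет', 'мир', 'слово']
```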
dict_name

logical name of the dictionary

alphabet

set of all the characters used in this dictionary

words_set

set of all the words

words_trie

trie structure of all the words
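One classic way a trie serves spelling correction is finding every dictionary word within a given Levenshtein distance of a misspelled token: walk the trie, compute one dynamic-programming row per node, and prune branches whose row minimum already exceeds the limit. This is shown as a general illustration, not as DeepPavlov's own correction algorithm; the nested-dict trie, the END marker, and the helpers build_trie and search are hypothetical.

```python
from typing import Dict, List, Tuple

END = None  # hypothetical marker key: "a word ends at this node" (never a character)


def build_trie(words: List[str]) -> Dict:
    root: Dict = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def search(trie: Dict, target: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (word, distance) pairs within max_dist Levenshtein edits of target."""
    results: List[Tuple[str, int]] = []
    first_row = list(range(len(target) + 1))  # distances from the empty prefix

    def walk(node: Dict, ch: str, prev_row: List[int]) -> None:
        # DP row for the current prefix (the parent's prefix extended by ch).
        row = [prev_row[0] + 1]
        for i in range(1, len(target) + 1):
            insert_cost = row[i - 1] + 1
            delete_cost = prev_row[i] + 1
            replace_cost = prev_row[i - 1] + (target[i - 1] != ch)
            row.append(min(insert_cost, delete_cost, replace_cost))
        if END in node and row[-1] <= max_dist:
            results.append((node[END], row[-1]))
        # Prune: if every cell exceeds the limit, no longer prefix can get back under it.
        if min(row) <= max_dist:
            for next_ch, child in node.items():
                if next_ch is not END:
                    walk(child, next_ch, row)

    if END in trie and first_row[-1] <= max_dist:  # the empty word, if present
        results.append((trie[END], first_row[-1]))
    for ch, child in trie.items():
        if ch is not END:
            walk(child, ch, first_row)
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


trie = build_trie(["car", "card", "care", "cart", "cat", "dog"])
print(search(trie, "carr", 1))  # [('car', 1), ('card', 1), ('care', 1), ('cart', 1)]
```

Pruning is what makes the trie pay off: shared prefixes are scored once, and whole subtrees are skipped as soon as they cannot stay within the edit budget.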

class deeppavlov.vocabs.typos.Wiki100KDictionary(data_dir: Union[Path, str] = '', *args, **kwargs)

An implementation of StaticDictionary that builds its data from Wiktionary.

Parameters: data_dir – path to the directory where the built trie will be stored. Relative paths are interpreted as relative to the pipeline’s data directory.
dict_name

logical name of the dictionary

alphabet

set of all the characters used in this dictionary

words_set

set of all the words

words_trie

trie structure of all the words