# deeppavlov.models.spelling_correction

class deeppavlov.models.spelling_correction.brillmoore.ErrorModel(dictionary: deeppavlov.vocabs.typos.StaticDictionary, window: int = 1, candidates_count: int = 1, *args, **kwargs)

Component that uses a statistics-based error model to find the best candidates in a static dictionary. Based on "An Improved Error Model for Noisy Channel Spelling Correction" by Eric Brill and Robert C. Moore.

Parameters:
- dictionary – a StaticDictionary object
- window – maximum context window size
- candidates_count – maximum number of replacement candidates to return for every token in the input

costs

logarithmic probabilities of character sequence replacements

dictionary

a StaticDictionary object

window

maximum context window size

candidates_count

maximum number of replacement candidates to return for every token in the input

__call__(data: Iterable[Iterable[str]], *args, **kwargs) → List[List[List[Tuple[float, str]]]]

Propose candidates for tokens in sentences

Parameters:
- data – batch of tokenized sentences

Returns: batch of lists of probabilities and candidates for every token

fit(x: List[str], y: List[str])

Calculate the probabilities of character sequence replacements

Parameters:
- x – words with spelling errors
- y – words without spelling errors
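As a rough illustration of what estimating replacement probabilities from such word pairs can look like, the sketch below aligns each correct word with its misspelling using Python's `difflib` and converts replacement counts into logarithmic probabilities. It is a simplified stand-in, not the Brill-Moore procedure the component implements: it ignores the `window` parameter, and the function name `fit_replacements` and the returned dictionary layout are hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def fit_replacements(x: List[str], y: List[str]) -> Dict[Tuple[str, str], float]:
    """Estimate log P(written | intended) for aligned character sequences.

    x holds words with spelling errors, y the corresponding correct words.
    This is a hypothetical simplification, not the library's algorithm.
    """
    pair_counts: Counter = Counter()
    source_counts: Counter = Counter()
    for wrong, right in zip(x, y):
        matcher = SequenceMatcher(None, right, wrong, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            src, dst = right[i1:i2], wrong[j1:j2]
            if tag == "equal":
                # Count every unchanged character as a replacement with itself.
                for ch in src:
                    pair_counts[(ch, ch)] += 1
                    source_counts[ch] += 1
            else:
                # Count the whole differing span as one replacement.
                pair_counts[(src, dst)] += 1
                source_counts[src] += 1
    return {
        pair: math.log(count / source_counts[pair[0]])
        for pair, count in pair_counts.items()
    }


costs = fit_replacements(x=["cet", "cat"], y=["cat", "cat"])
```

In this toy data the intended letter "a" is written as "e" in one of two occurrences, so `costs[("a", "e")]` equals `math.log(0.5)`.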
save()

Save replacement probabilities to a file

load()

Load replacement probabilities from a file

class deeppavlov.models.spelling_correction.levenshtein.LevenshteinSearcherComponent(words: Iterable[str], max_distance: int = 1, error_probability: float = 0.0001, *args, **kwargs)

Component that finds replacement candidates for tokens within a set Damerau-Levenshtein distance

Parameters:
- words – list of every correct word
- max_distance – maximum allowed Damerau-Levenshtein distance between source words and candidates
- error_probability – assigned probability for every edit

max_distance

maximum allowed Damerau-Levenshtein distance between source words and candidates

error_probability

assigned logarithmic probability for every edit

vocab_penalty

assigned logarithmic probability of an out-of-vocabulary token being the correct one without changes

__call__(batch: Iterable[Iterable[str]], *args, **kwargs) → List[List[List[Tuple[float, str]]]]

Propose candidates for tokens in sentences

Parameters:
- batch – batch of tokenized sentences

Returns: batch of lists of probabilities and candidates for every token
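To make the idea of distance-bounded candidate search concrete, here is a minimal sketch that scans a word list with the optimal string alignment variant of Damerau-Levenshtein distance and scores each candidate as the number of edits times the log of `error_probability`. The helper names `osa_distance` and `propose` are hypothetical, the scoring rule is an assumption, and the sketch does not model `vocab_penalty`; the component's own search and scoring may differ.

```python
import math
from typing import Iterable, List, Tuple


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def propose(token: str, words: Iterable[str], max_distance: int = 1,
            error_probability: float = 1e-4) -> List[Tuple[float, str]]:
    """Return (log probability, word) pairs within max_distance of token."""
    log_error = math.log(error_probability)  # assumed cost of a single edit
    scored = []
    for word in words:
        distance = osa_distance(token, word)
        if distance <= max_distance:
            scored.append((distance * log_error, word))
    return sorted(scored, reverse=True)


candidates = propose("hte", ["the", "hat", "hate", "then"])
```

With `max_distance=1`, the token "hte" keeps "the" (one transposition) and "hate" (one insertion), while "hat" and "then" are two edits away and are dropped. A linear scan like this is only practical for small vocabularies.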
class deeppavlov.models.spelling_correction.electors.top1_elector.TopOneElector(*args, **kwargs)

Component that chooses the candidate with the highest base probability for every token

__call__(batch: List[List[List[Tuple[float, str]]]]) → List[List[str]]

Choose the best candidate for every token

Parameters:
- batch – batch of probabilities and string values of candidates for every token in a sentence

Returns: batch of corrected tokenized sentences
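The selection rule described here can be sketched in a few lines of plain Python. This is an illustration of the nested input and output shapes rather than the library's code; the function name `choose_top_one` and the sample scores are made up for the example.

```python
from typing import List, Tuple

Candidates = List[Tuple[float, str]]


def choose_top_one(batch: List[List[Candidates]]) -> List[List[str]]:
    """Pick the highest-scoring candidate string for every token."""
    return [
        [max(candidates, key=lambda c: c[0])[1] for candidates in sentence]
        for sentence in batch
    ]


# One sentence with two tokens, each with two scored candidates.
batch = [[[(-0.1, "hello"), (-4.6, "hallo")], [(-2.3, "word"), (-0.2, "world")]]]
corrected = choose_top_one(batch)
print(corrected)  # [['hello', 'world']]
```

Because each token is decided independently, this rule cannot use the neighbouring words to break ties between plausible candidates.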
class deeppavlov.models.spelling_correction.electors.kenlm_elector.KenlmElector(load_path: pathlib.Path, beam_size: int = 4, *args, **kwargs)

Component that chooses the candidate with the highest product of base and language model probabilities

Parameters:
- load_path – path to the kenlm model file
- beam_size – beam size for highest probability search

lm

kenlm object

beam_size

beam size for highest probability search

__call__(batch: List[List[List[Tuple[float, str]]]]) → List[List[str]]

Choose the best candidate for every token

Parameters:
- batch – batch of probabilities and string values of candidates for every token in a sentence

Returns: batch of corrected tokenized sentences
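The following sketch shows how a beam search can combine base scores with a language model, assuming that candidate scores are logarithmic so that the product of probabilities becomes a sum. It does not use kenlm: `toy_lm` and its `BIGRAMS` table are hypothetical stand-ins for a real language model, and `beam_search` is an illustrative name, not the component's method.

```python
from typing import Callable, List, Tuple

Candidates = List[Tuple[float, str]]

# Hypothetical bigram log probabilities standing in for a kenlm model.
BIGRAMS = {("<s>", "the"): -0.5, ("the", "cat"): -1.0, ("the", "cut"): -3.0}


def toy_lm(history: List[str], word: str) -> float:
    """Score a word given the previous word; unseen bigrams get a flat penalty."""
    previous = history[-1] if history else "<s>"
    return BIGRAMS.get((previous, word), -5.0)


def beam_search(sentence: List[Candidates],
                lm_score: Callable[[List[str], str], float],
                beam_size: int = 4) -> List[str]:
    """Keep the beam_size best partial sentences after every token."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for candidates in sentence:
        expanded = []
        for total, words in beams:
            for base_logprob, word in candidates:
                # Sum of logs corresponds to a product of probabilities.
                score = total + base_logprob + lm_score(words, word)
                expanded.append((score, words + [word]))
        expanded.sort(key=lambda beam: beam[0], reverse=True)
        beams = expanded[:beam_size]
    return beams[0][1]


sentence = [[(-0.1, "the"), (-2.0, "he")], [(-0.7, "cut"), (-0.9, "cat")]]
best = beam_search(sentence, toy_lm, beam_size=4)
```

In this example "cut" has the better base score for the second token, but the toy language model prefers "the cat", so `best` is `["the", "cat"]`. A larger `beam_size` explores more alternatives at a higher computational cost.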