Based on a neural Named Entity Recognition network. The NER component reproduces the architecture from the paper Application of a Hybrid Bi-LSTM-CRF Model to the Task of Russian Named Entity Recognition, which is inspired by the Bi-LSTM+CRF architecture from https://arxiv.org/pdf/1603.01360.pdf.
| Dataset | F1 |
|---|---|
| Persons-1000 dataset with additional LOC and ORG markup | 95.25 |
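A minimal usage sketch via the library's Python API. The `build_model` call, the config path and the output format are assumptions to be checked against the installed release:

```python
from deeppavlov import build_model

# Build the pre-trained Russian NER model; the config path is illustrative,
# use whichever NER config ships with your DeepPavlov release.
ner = build_model('deeppavlov/configs/ner/ner_rus.json', download=True)

# The model takes a batch of sentences and (assumed) returns tokens
# together with their BIO-style entity tags.
tokens, tags = ner(['Петр Иванович уехал в Москву работать в Яндекс'])
print(list(zip(tokens[0], tags[0])))
```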
Based on fuzzy Levenshtein search to extract normalized slot values from text. The components either rely on NER results or perform a needle-in-a-haystack search over the whole utterance.
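A toy sketch of the underlying idea: candidate tokens are fuzzily matched against a dictionary of normalized slot values by Levenshtein distance. The dictionary, threshold and helper names below are made up for illustration:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def fill_slot(text: str, slot_values: dict, max_distance: int = 2):
    """Return the normalized value whose surface form is closest to some
    token of the text, provided the distance does not exceed the threshold."""
    best = None
    for token in text.lower().split():
        for surface, normalized in slot_values.items():
            d = levenshtein(token, surface)
            if d <= max_distance and (best is None or d < best[0]):
                best = (d, normalized)
    return best[1] if best else None


# Hypothetical dictionary mapping surface forms to normalized slot values.
food_slot = {'italian': 'italian', 'itallian': 'italian', 'chinese': 'chinese'}
print(fill_slot('i want some itaian food', food_slot))  # -> italian
```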
Component for classification tasks (intents, sentiment, etc.) at the word level. Shallow-and-wide CNN, Deep CNN, BiLSTM, BiLSTM with self-attention and other models are presented. The model also allows multilabel classification of texts. Several pre-trained models are available and listed in the table below.
| Task | Dataset | Lang | Model | Metric | Valid | Test | Downloads |
|---|---|---|---|---|---|---|---|
| 28 intents | DSTC 2 | En | DSTC 2 emb | Accuracy | 0.7732 | 0.7868 | 800 Mb |
| | | | Wiki emb | | 0.9602 | 0.9593 | 8.5 Gb |
| 7 intents | SNIPS-2017 | | DSTC 2 emb | F1 | 0.8685 | – | 800 Mb |
| | | | Wiki emb | | 0.9811 | – | 8.5 Gb |
| | | | Tfidf + SelectKBest + PCA + Wiki emb | | 0.9673 | – | 8.6 Gb |
| | | | Wiki emb weighted by Tfidf | | 0.9786 | – | 8.5 Gb |
| Insult detection | Insults | | Reddit emb | ROC-AUC | 0.9271 | 0.8618 | 6.2 Gb |
| 5 topics | AG News | | Wiki emb | Accuracy | 0.8876 | 0.9011 | 8.5 Gb |
| Sentiment | Twitter mokoron | Ru | RuWiki+Lenta emb w/o preprocessing | | 0.9972 | 0.9971 | 6.2 Gb |
| | | | RuWiki+Lenta emb with preprocessing | | 0.7811 | 0.7749 | 6.2 Gb |
| | RuSentiment | | RuWiki+Lenta emb | F1 | 0.6393 | 0.6539 | 6.2 Gb |
| Intent | Yahoo-L31 | En | Yahoo-L31 on ELMo pre-trained on Yahoo-L6 | ROC-AUC | 0.9269 | – | 700 Mb |
Since no intent-recognition results had been published for the DSTC 2 data, the comparison of the presented model is given on the SNIPS dataset. The model scores were evaluated in the same way as in the report of the dataset's authors, so that they can be compared with the results from that report. The results were achieved with parameter tuning and with embeddings trained on a Reddit dataset.
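A minimal usage sketch for the SNIPS intent classifier via the Python API. The config path is the one used in the CLI examples at the end of this page; the `build_model` call and the output format are assumptions:

```python
from deeppavlov import build_model

# Build the pre-trained SNIPS intent classifier; download=True fetches
# the model files first.
classifier = build_model('deeppavlov/configs/classifiers/intents_snips.json',
                         download=True)

# The classifier takes a batch of utterances and returns predicted intents.
print(classifier(['Play me some jazz', 'Will it rain in Boston tomorrow?']))
```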
Based on the Hybrid Code Networks (HCNs) architecture from Jason D. Williams, Kavosh Asadi, Geoffrey Zweig, Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning, 2017. It predicts responses in a goal-oriented dialog. The model is customizable: embeddings, the slot filler and the intent classifier can be switched on and off on demand.
Available pre-trained models:
| Dataset & Model | Valid turn accuracy | Test turn accuracy | Downloads |
|---|---|---|---|
| DSTC2, bot with slot filler & intents | 0.5288 | 0.5248 | 8.5 Gb |
| DSTC2, bot with slot filler & embeddings & attention | 0.5538 | 0.5551 | 8.5 Gb |
Other benchmarks on DSTC2 (can’t be directly compared due to dataset modifications):
| Dataset & Model | Test turn accuracy |
|---|---|
| DSTC2, Bordes and Weston (2016) | 0.411 |
| DSTC2, Perez and Liu (2016) | 0.487 |
| DSTC2, Eric and Manning (2017) | 0.480 |
| DSTC2, Williams et al. (2017) | 0.556 |
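A hedged sketch of talking to the pre-trained bot from Python. The config path matches the CLI examples at the end of this page, but the stateful call-by-call interaction shown here is an assumption about the API:

```python
from deeppavlov import build_model

# Build the DSTC2-trained goal-oriented bot (slot filler & intents variant).
bot = build_model('deeppavlov/configs/go_bot/gobot_dstc2.json', download=True)

# The bot consumes batches of user utterances and returns its responses,
# keeping the dialog state between calls (assumed behaviour).
print(bot(['hi, i want some cheap food in the south part of town']))
print(bot(['what is the address?']))
```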
The dialogue agent predicts responses in a goal-oriented dialog and is able to handle multiple domains (the pre-trained bot supports calendar scheduling, weather information retrieval, and point-of-interest navigation). The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers.
Comparison of the DeepPavlov pre-trained model with others:
| Dataset & Model | Valid BLEU | Test BLEU | Downloads |
|---|---|---|---|
| Kvret, KvretNet | 0.1319 | 0.1328 | 10 Gb |
| Kvret, KvretNet, Mihail Eric et al. (2017) | – | 0.132 | – |
| Kvret, CopyNet, Mihail Eric et al. (2017) | – | 0.110 | – |
| Kvret, Attn Seq2Seq, Mihail Eric et al. (2017) | – | 0.102 | – |
| Kvret, Rule-based, Mihail Eric et al. (2017) | – | 0.066 | – |
Pipelines that use candidate search in a static dictionary and an ARPA language model to correct spelling errors.
About 4.4 GB of disk space is required for the Russian language model and about 7 GB for the English one.
| Correction method | Precision | Recall | F-measure | Speed (sentences/s) |
|---|---|---|---|---|
| Damerau Levenshtein 1 + lm | 53.26 | 53.74 | 53.50 | 29.3 |
| Brill Moore top 4 + lm | 51.92 | 53.94 | 52.91 | 0.6 |
| Hunspell + lm | 41.03 | 48.89 | 44.61 | 2.1 |
| Brill Moore top 1 | 41.29 | 37.26 | 39.17 | 2.4 |
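A simplified, self-contained sketch of the candidate-search-plus-language-model approach: generate in-vocabulary candidates within Damerau-Levenshtein distance 1 of each word and keep the candidate the language model prefers. Here a made-up unigram frequency dictionary stands in for the ARPA model:

```python
import string

def dl1_candidates(word, vocab):
    """All vocabulary words reachable from `word` by a single edit:
    deletion, insertion, substitution or adjacent transposition."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    edits = set()
    for left, right in splits:
        if right:
            edits.add(left + right[1:])                          # deletion
            for c in letters:
                edits.add(left + c + right[1:])                  # substitution
        if len(right) > 1:
            edits.add(left + right[1] + right[0] + right[2:])    # transposition
        for c in letters:
            edits.add(left + c + right)                          # insertion
    return (edits | {word}) & vocab


def correct(sentence, unigram_freq):
    """Replace each word by the in-vocabulary candidate that the language
    model (here: unigram frequencies) scores highest."""
    vocab = set(unigram_freq)
    fixed = []
    for word in sentence.lower().split():
        candidates = dl1_candidates(word, vocab) or {word}
        fixed.append(max(candidates, key=lambda w: unigram_freq.get(w, 0)))
    return ' '.join(fixed)


# Made-up frequencies standing in for the ARPA language model scores.
freq = {'the': 100, 'quick': 10, 'brown': 8, 'fox': 5}
print(correct('teh quikc brown fox', freq))   # -> the quick brown fox
```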
Based on LSTM-based deep learning models for non-factoid answer selection. The model ranks responses or contexts from a database by their relevance to the given context.
Available pre-trained models for ranking:
| Dataset | Model config | Validation (Recall@1) | Test1 (Recall@1) | Downloads |
|---|---|---|---|---|
Available pre-trained models for paraphrase identification:
| Dataset | Model config | Val (accuracy) | Test (accuracy) | Val (F1) | Test (F1) | Val (log_loss) | Test (log_loss) | Downloads |
|---|---|---|---|---|---|---|---|---|
| Quora Question Pairs | paraphrase_ident_qqp | 87.1 | 87.0 | 83.0 | 82.6 | 0.300 | 0.305 | 8134M |
| Quora Question Pairs | paraphrase_ident_qqp | 87.7 | 87.5 | 84.0 | 83.8 | 0.287 | 0.298 | 8136M |
Comparison with other models on the InsuranceQA V1:
| Model | Validation (Recall@1) | Test1 (Recall@1) |
|---|---|---|
| Architecture II (HLQA(200) CNNQA(4000) 1-MaxPooling Tanh) | 61.8 | 62.8 |
| QA-LSTM basic-model (max pooling) | 64.3 | 63.1 |
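A toy illustration of the ranking interface described above: candidate responses from a small database are ordered by cosine similarity between context and candidate encodings. A bag-of-words encoder and made-up data stand in for the model's LSTM encoders:

```python
from collections import Counter
from math import sqrt

def encode(text):
    """Bag-of-words stand-in for the model's LSTM context/response encoders."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(context, candidates):
    """Return candidate responses sorted by relevance to the context."""
    ctx = encode(context)
    return sorted(candidates, key=lambda c: cosine(ctx, encode(c)), reverse=True)

# Made-up response database.
responses = ['You can reset the password in your account settings.',
             'Our office is open from 9 to 5 on weekdays.',
             'The insurance covers water damage to the building.']
print(rank('how do I reset my password', responses)[0])
```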
Based on Reading Wikipedia to Answer Open-Domain Questions. The model solves the task of document retrieval for a given query.
| Dataset | Model config | Wiki dump | Recall | Downloads |
|---|---|---|---|---|
| SQuAD-v1.1 | doc_retrieval | enwiki (2018-02-11) | 75.6 | 33 GB |
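A toy sketch of the TF-IDF retrieval idea behind the ranker, using scikit-learn; the real component builds its index over the full enwiki dump listed above, while the documents here are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A made-up mini collection standing in for the Wikipedia dump.
docs = ['Paris is the capital and most populous city of France.',
        'The mitochondrion is the powerhouse of the cell.',
        'Python is a widely used programming language.']

vectorizer = TfidfVectorizer(ngram_range=(1, 2))   # unigrams and bigrams
doc_matrix = vectorizer.fit_transform(docs)

def retrieve(query, top_n=1):
    """Return the top_n documents most similar to the query by TF-IDF cosine."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [docs[i] for i in scores.argsort()[::-1][:top_n]]

print(retrieve('What is the capital of France?'))
```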
Based on R-NET: Machine Reading Comprehension with Self-matching Networks. The model solves the task of finding the answer to a question in a given context (SQuAD task format).
All pre-trained models can be downloaded. The model for English requires about 2.5 GB and the model for Russian about 5 GB.
| Dataset | Model config | Lang | EM (dev) | F-1 (dev) |
|---|---|---|---|---|
| SDSJ Task B | squad_ru | ru | 60.62 | 80.04 |
For cases when the answer is not necessarily present in the given context, there is a squad_noans model. It outputs an empty string if the context contains no answer.
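A minimal usage sketch for the reading-comprehension model via the Python API; the config path, call signature and output format are assumptions to be checked against the installed release:

```python
from deeppavlov import build_model

# Build the pre-trained English SQuAD model (config path is illustrative).
model = build_model('deeppavlov/configs/squad/squad.json', download=True)

context = ('DeepPavlov is an open-source conversational AI library '
           'built on TensorFlow and Keras.')
question = 'What is DeepPavlov built on?'

# The model takes batches of contexts and questions and returns the answer
# span found in the context (assumed output: answer, position, score).
print(model([context], [question]))
```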
Based on the character-based approach to morphological tagging from Heigold et al., 2017, An extensive empirical evaluation of character-based morphological tagging for 14 languages. This is a state-of-the-art model for Russian and several other languages. The model takes tokenized sentences as input and outputs the corresponding sequence of morphological labels in UD format. The table below contains word and sentence accuracy on UD2.0 datasets. For more scores see the full table.
| Dataset | Model | Word accuracy | Sent. accuracy | Download size (MB) |
|---|---|---|---|---|
| UD2.0 (Russian) | Pymorphy + russian_tagsets (first tag) | 60.93 | 0.00 | |
| | UD Pipe 1.2 (Straka et al., 2017) | 93.57 | 43.04 | |
| UD2.0 (Czech) | UD Pipe 1.2 (Straka et al., 2017) | 91.86 | 42.28 | |
| UD2.0 (English) | UD Pipe 1.2 (Straka et al., 2017) | 92.89 | 55.75 | |
| UD2.0 (German) | UD Pipe 1.2 (Straka et al., 2017) | 76.65 | 10.24 | |
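A minimal usage sketch for the tagger. The config path, the token-list input format and the exact output layout (one UD-format analysis per sentence) are assumptions:

```python
from deeppavlov import build_model

# Build the pre-trained Russian morphological tagger; the config path is
# illustrative, pick the morpho_tagger config shipped with your release.
tagger = build_model('deeppavlov/configs/morpho_tagger/UD2.0/morpho_ru_syntagrus.json',
                     download=True)

# The tagger takes a batch of tokenized sentences and returns the
# corresponding morphological analyses in UD format.
for analysis in tagger([['Мама', 'мыла', 'раму', '.']]):
    print(analysis)
```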
The eCommerce bot retrieves product items from a catalog in sorted order. In addition, it asks the user to provide additional information to narrow down the search.
About 130 MB of disk space is required for the eCommerce bot with the TfIdf-based ranker and about 500 MB for the BLEU-based ranker.
Examples of some components
Run goal-oriented bot with Telegram interface:
python -m deeppavlov interactbot deeppavlov/configs/go_bot/gobot_dstc2.json -d -t <TELEGRAM_TOKEN>
Run goal-oriented bot with console interface:
python -m deeppavlov interact deeppavlov/configs/go_bot/gobot_dstc2.json -d
Run goal-oriented bot with REST API:
python -m deeppavlov riseapi deeppavlov/configs/go_bot/gobot_dstc2.json -d
Run slot-filling model with Telegram interface:
python -m deeppavlov interactbot deeppavlov/configs/ner/slotfill_dstc2.json -d -t <TELEGRAM_TOKEN>
Run slot-filling model with console interface:
python -m deeppavlov interact deeppavlov/configs/ner/slotfill_dstc2.json -d
Run slot-filling model with REST API:
python -m deeppavlov riseapi deeppavlov/configs/ner/slotfill_dstc2.json -d
Predict intents on every line in a file:
python -m deeppavlov predict deeppavlov/configs/classifiers/intents_snips.json -d --batch-size 15 < /data/in.txt > /data/out.txt
View the video demo of deploying a goal-oriented bot and a slot-filling model with a Telegram UI.