Open Domain Question Answering Skill on Wikipedia
Open Domain Question Answering (ODQA) is the task of finding an exact answer to any question in Wikipedia articles. Given only a question, the system outputs the best answer it can find. The default ODQA implementation takes a batch of queries as input and returns 5 answers sorted by score.
Before using the model, make sure that all required packages are installed by running the command:
python -m deeppavlov install en_odqa_infer_wiki
Training (if you have your own data)
from deeppavlov import configs
from deeppavlov.core.commands.train import train_evaluate_model_from_config

train_evaluate_model_from_config(configs.doc_retrieval.en_ranker_tfidf_wiki, download=True)
train_evaluate_model_from_config(configs.squad.multi_squad_noans, download=True)
from deeppavlov import configs
from deeppavlov.core.commands.infer import build_model

odqa = build_model(configs.odqa.en_odqa_infer_wiki, load_trained=True)
result = odqa(['What is the name of Darth Vader\'s son?'])
print(result)
>> Luke Skywalker
The architecture of the ODQA skill is modular and consists of two models: a ranker and a reader. The ranker is based on DrQA, proposed by Facebook Research. The reader is based on R-NET, proposed by Microsoft Research Asia, and on its implementation by Wenxuan Zhou.
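The two-stage design can be illustrated with a toy, self-contained sketch. It is not the DeepPavlov implementation: `ToyTfidfRanker`, `toy_reader`, `answer`, and the three-sentence corpus are hypothetical stand-ins, and the reader here only picks the sentence with the largest word overlap instead of predicting an answer span as R-NET does. It shows only the control flow: the ranker narrows the corpus to a few documents, the reader proposes a scored answer per document, and the answers are sorted by score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyTfidfRanker:
    """Stage 1 (hypothetical stand-in): keep the top-k documents by TF-IDF overlap."""

    def __init__(self, documents):
        self.documents = documents
        self.doc_tokens = [Counter(tokenize(d)) for d in documents]
        n_docs = len(documents)
        doc_freq = Counter(t for counts in self.doc_tokens for t in counts)
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0
                    for t, df in doc_freq.items()}

    def __call__(self, query, top_k=2):
        q_tokens = tokenize(query)
        scores = []
        for idx, counts in enumerate(self.doc_tokens):
            score = sum(counts[t] * self.idf.get(t, 0.0) for t in q_tokens)
            scores.append((score, idx))
        # Highest score first; ties broken by document order.
        scores.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self.documents[idx] for score, idx in scores[:top_k] if score > 0]


def toy_reader(query, document):
    """Stage 2 (hypothetical stand-in): return (best sentence, overlap score)."""
    q_tokens = set(tokenize(query))
    best = ("", 0.0)
    for sentence in re.split(r"(?<=[.!?])\s+", document):
        overlap = float(len(q_tokens & set(tokenize(sentence))))
        if overlap > best[1]:
            best = (sentence.strip(), overlap)
    return best


def answer(query, ranker, reader, top_k=2, n_answers=5):
    """Run the ranker, then the reader, and sort candidate answers by score."""
    candidates = [reader(query, doc) for doc in ranker(query, top_k=top_k)]
    candidates = [c for c in candidates if c[0]]
    return sorted(candidates, key=lambda c: -c[1])[:n_answers]


DOCS = [
    "Luke Skywalker is the son of Darth Vader. He was raised on Tatooine.",
    "Tatooine is a desert planet. It orbits two suns.",
    "The Millennium Falcon is a starship flown by Han Solo.",
]

ranker = ToyTfidfRanker(DOCS)
results = answer("Who is the son of Darth Vader?", ranker, toy_reader)
print(results)
```

Because the two stages only communicate through a list of documents and a list of scored answers, either one can be swapped out without touching the other, which is the point of the modular design.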
About 24 GB of RAM is required. It is possible to run on a 16 GB machine, but then the swap size should be at least 8 GB.
When interacting, the ODQA skill returns a plain answer to the user’s question.
Run the following to interact with English ODQA:
python -m deeppavlov interact en_odqa_infer_wiki -d
Run the following to interact with Russian ODQA:
python -m deeppavlov interact ru_odqa_infer_wiki -d
Scores for ODQA skill:
| DrQA enwiki20161221 | - | 27.1 | - | - |
| R3 enwiki20161221 | 37.5 | 29.1 | - |
EM stands for “exact-match accuracy”. Metrics are computed over the top 5 and top 25 documents returned by the retrieval module.
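As a rough illustration of how an exact-match score can be computed, the sketch below counts a prediction as correct when it equals any reference answer after normalization. The normalization steps (lowercasing, removing punctuation and English articles, collapsing whitespace) follow the common SQuAD-style convention and are an assumption here; the function names and sample data are hypothetical, and the evaluation code used for the scores above may differ.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """Return 1.0 if the prediction equals any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


def em_accuracy(predictions, references):
    """Average exact match over a dataset, expressed as a percentage."""
    scores = [exact_match(p, golds) for p, golds in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical sample data: two of the three predictions match a reference.
predictions = ["Luke Skywalker", "the Death Star", "Han"]
references = [["Luke Skywalker"], ["Death Star"], ["Han Solo"]]
print(em_accuracy(predictions, references))
```

Note that a partially correct answer such as "Han" scores zero under exact match.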