Open Domain Question Answering Skill on Wikipedia

Task definition

Open Domain Question Answering (ODQA) is the task of finding an exact answer to an arbitrary question in Wikipedia articles. Given only a question, the system outputs the best answer it can find:

:: What is the name of Darth Vader's son?
>> Luke Skywalker

Languages

Pretrained ODQA models are available in DeepPavlov for English and Russian.

Models

The architecture of the ODQA skill is modular and consists of two models: a ranker and a reader. The ranker is based on DrQA [1], proposed by Facebook Research. The reader is based on R-NET [2], proposed by Microsoft Research Asia, and on its implementation [3] by Wenxuan Zhou.
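The two-stage flow described above can be illustrated with a minimal sketch. The classes ToyRanker and ToyReader and the function answer are hypothetical stand-ins invented for this illustration; they are not the DrQA-based ranker, the R-NET-based reader, or any DeepPavlov API. The ranker here is a simple TF-IDF scorer and the reader just picks the sentence with the largest word overlap, which is far cruder than a neural reader that extracts an exact span.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class ToyRanker:
    """Hypothetical stand-in for the ranker: TF-IDF scoring over documents."""

    def __init__(self, documents):
        self.documents = documents
        self.doc_tokens = [Counter(tokenize(d)) for d in documents]
        n_docs = len(documents)
        doc_freq = Counter()
        for tokens in self.doc_tokens:
            doc_freq.update(tokens.keys())
        # Smoothed inverse document frequency for every known token.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0
                    for t, c in doc_freq.items()}

    def rank(self, question, top_k=1):
        """Return the top_k documents most relevant to the question."""
        q_tokens = tokenize(question)
        scored = []
        for idx, tokens in enumerate(self.doc_tokens):
            score = sum(tokens[t] * self.idf.get(t, 0.0) for t in q_tokens)
            scored.append((score, idx))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [self.documents[idx] for _, idx in scored[:top_k]]


class ToyReader:
    """Hypothetical stand-in for the reader: picks the best-matching sentence."""

    def read(self, question, context):
        q_tokens = set(tokenize(question))
        sentences = [s.strip()
                     for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
        return max(sentences, key=lambda s: len(q_tokens & set(tokenize(s))))


def answer(question, ranker, reader):
    """Ranker selects a document, then the reader extracts an answer from it."""
    context = ranker.rank(question, top_k=1)[0]
    return reader.read(question, context)


docs = [
    "Luke Skywalker is the son of Darth Vader. He trained as a Jedi.",
    "Wikipedia is a free online encyclopedia. It is written by volunteers.",
]
result = answer("Who is the son of Darth Vader?", ToyRanker(docs), ToyReader())
print(result)
```

Because the two stages only communicate through the retrieved text, either one can be swapped out or retrained without touching the other, which is the point of the modular design.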

Running ODQA

Note

TensorFlow 1.8.0 with GPU support is required to run this skill.

About 16 GB of RAM is required.

Training

The ODQA ranker and the ODQA reader should be trained separately. Training the ranker is described in the ranker documentation, and training the reader is described in the separate reader tutorial.

Interacting

When interacting, the ODQA skill returns a plain answer to the user’s question.

Run the following to interact with English ODQA:

cd deeppavlov/
python deep.py interact deeppavlov/configs/odqa/en_odqa_infer_wiki.json -d

Run the following to interact with Russian ODQA:

cd deeppavlov/
python deep.py interact deeppavlov/configs/odqa/ru_odqa_infer_wiki.json -d

Configuration

The ODQA configs are intended for inference only. For training, use the ranker configs and the reader configs respectively.

Comparison

Scores for the ODQA skill:

Model       Dataset      Wiki dump            F1    EM
DeepPavlov  SQuAD (dev)  enwiki (2018-02-11)  28.0  -
DrQA [1]    SQuAD (dev)  enwiki (2016-12-21)  -     27.1

EM stands for “exact-match accuracy”.
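For orientation, the two metrics can be sketched as follows. This is a minimal illustration that assumes the usual SQuAD-style answer normalization (lowercasing, removing punctuation and English articles); the exact scoring script behind the numbers above is not specified here, and the function names are chosen for this example only. Reported scores are these per-question values averaged over the dataset and expressed as percentages.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Luke Skywalker.", "luke skywalker"))
print(f1_score("Luke", "Luke Skywalker"))
```

A partially correct answer such as "Luke" gets no exact-match credit but a nonzero F1, so the two columns measure different things.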