Open Domain Question Answering Model on Wikipedia¶
Task definition¶
Open Domain Question Answering (ODQA) is the task of finding an exact answer to an arbitrary question in Wikipedia articles. Thus, given only a question, the system outputs the best answer it can find. The default ODQA implementation takes a batch of queries as input and returns the best answer for each of them.
Quick Start¶
The example below is given for the basic ODQA config en_odqa_infer_wiki. Check what other ODQA configs are available and simply replace en_odqa_infer_wiki with the config name of your preference.
Before using the model, make sure that all required packages are installed by running the command:
python -m deeppavlov install en_odqa_infer_wiki
Training (if you have your own data)
from deeppavlov import train_evaluate_model_from_config
# Fit the TF-IDF ranker on the Wikipedia index:
train_evaluate_model_from_config('en_ranker_tfidf_wiki', download=True)
# Train the BERT-based reader:
train_evaluate_model_from_config('qa_squad2_bert', download=True)
Building
from deeppavlov import build_model
odqa = build_model('en_odqa_infer_wiki', download=True)
Inference
result = odqa(['What is the name of Darth Vader\'s son?'])
print(result)
Output:
>> Luke Skywalker
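Since the model accepts a batch of queries, several questions can be asked in a single call. The sketch below assumes the odqa model built above and one answer string per query, as in the example output:

# Ask several questions in one batch.
questions = [
    "What is the name of Darth Vader's son?",
    'Who wrote "The Old Man and the Sea"?',
]
answers = odqa(questions)
for question, answer in zip(questions, answers):
    print(f'{question} -> {answer}')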
Languages¶
There are pretrained ODQA models for English and Russian languages in DeepPavlov.
Models¶
English ODQA version consists of the following components:
TF-IDF ranker (based on DrQA 1), which selects the top-N most relevant paragraphs from the TF-IDF index;
Binary Passage Retrieval 2 (BPR) ranker, which selects the top-K most relevant paragraphs from the binary index;
a database of paragraphs (by default, from Wikipedia), which retrieves the texts of the N + K most relevant paragraphs by the IDs returned by the TF-IDF and BPR rankers;
a Reading Comprehension component, which finds answers in the paragraphs and estimates answer confidences (the sketch below shows how these components combine).
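The sketch below illustrates how the components could be chained at inference time. It is a conceptual outline only, not the actual DeepPavlov API: the objects tfidf_ranker, bpr_ranker, db and reader and their methods are hypothetical placeholders.

def odqa_pipeline(question, tfidf_ranker, bpr_ranker, db, reader, n=100, k=100):
    """Conceptual English ODQA pipeline (all interfaces are hypothetical)."""
    tfidf_ids = tfidf_ranker.top_n(question, n)   # top-N ids from the TF-IDF index
    bpr_ids = bpr_ranker.top_k(question, k)       # top-K ids from the binary index
    # Fetch the texts of the N + K candidate paragraphs by their ids.
    paragraphs = [db.get_text(pid) for pid in tfidf_ids + bpr_ids]
    # The reader extracts an (answer, confidence) pair from each paragraph;
    # the highest-confidence answer wins.
    candidates = [reader.extract(question, p) for p in paragraphs]
    return max(candidates, key=lambda c: c[1])[0]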
The Russian ODQA version performs retrieval with the TF-IDF index only.
Binary Passage Retrieval is a resource-efficient method of building a dense passage index. The dual encoder (with BERT or another Transformer as the backbone) is trained on a question answering dataset (Natural Questions in our case) to maximize the dot product between the embeddings of a question and of a passage containing the answer, and to minimize it otherwise. The question and passage embeddings are obtained the following way: the vector of the BERT CLS token is fed into a dense layer followed by a hash function, which turns the dense vector into a binary one.
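A minimal sketch of the hashing idea follows. It is not DeepPavlov's implementation: it assumes the dense CLS-based embeddings are already computed and uses a plain sign function as the hash (BPR uses a differentiable approximation during training), with candidates ranked by the number of matching bits, i.e. by Hamming distance in the binary index.

import numpy as np

def to_binary_code(dense_vector: np.ndarray) -> np.ndarray:
    """Hash a dense embedding into a binary code (+1/-1 per dimension)."""
    return np.where(dense_vector >= 0, 1, -1)

def similarity(query_code: np.ndarray, passage_codes: np.ndarray) -> np.ndarray:
    """Number of matching bits per passage (negative Hamming distance, shifted)."""
    return (query_code == passage_codes).sum(axis=1)

# Toy example: 4 passages with 8-dimensional codes.
rng = np.random.default_rng(0)
passage_codes = to_binary_code(rng.normal(size=(4, 8)))
query_code = to_binary_code(rng.normal(size=8))
top_k = np.argsort(-similarity(query_code, passage_codes))[:2]
print(top_k)  # ids of the 2 most similar passages in the binary index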
Running ODQA¶
Note
About 22 GB of RAM is required. It is possible to run on a 16 GB machine, but then the swap size should be at least 8 GB.
Training¶
ODQA ranker and ODQA reader should be trained separately. Read about training the ranker here. Read about training the reader in our separate reader tutorial (SQuAD.ipynb#4.-Train-the-model-on-your-data).
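Both components can also be trained from the command line with DeepPavlov's train command, using the same config names as in the Quick Start above:

python -m deeppavlov train en_ranker_tfidf_wiki -d
python -m deeppavlov train qa_squad2_bert -d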
Interacting¶
When interacting, the ODQA model returns a plain answer to the user’s question.
Run the following to interact with English ODQA:
python -m deeppavlov interact en_odqa_infer_wiki -d
Run the following to interact with Russian ODQA:
python -m deeppavlov interact ru_odqa_infer_wiki -d
Configuration¶
The ODQA configs are suited for inference only. For training, use the ranker configs and the reader tutorial (SQuAD.ipynb#4.-Train-the-model-on-your-data) accordingly.
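To adjust a config before building, it can be resolved to a plain Python dictionary first. The sketch below relies on parse_config from deeppavlov.core.commands.utils; which keys to modify depends on the particular config, so here the pipeline is only inspected:

from deeppavlov import build_model
from deeppavlov.core.commands.utils import parse_config

config = parse_config('en_odqa_infer_wiki')
# A resolved config is a nested dict; list the pipeline components.
print([component.get('class_name') for component in config['chainer']['pipe']])
# The (possibly modified) dict can be passed to build_model directly.
odqa = build_model(config, download=True)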
There are several ODQA configs available:
| Config | Description |
| --- | --- |
| en_odqa_infer_wiki | Basic config for the English language. Consists of Binary Passage Retrieval, TF-IDF retrieval and reader. |
| ru_odqa_infer_wiki | Basic config for the Russian language. Consists of TF-IDF ranker and reader. |
| en_odqa_pop_infer_wiki | Extended config for the English language. Consists of Binary Passage Retrieval, TF-IDF retrieval, popularity ranker and reader. |
Comparison¶
Scores for ODQA models:
| Model | Lang | Dataset | Number of paragraphs | F1 | EM | RAM (GB) |
| --- | --- | --- | --- | --- | --- | --- |
| en_odqa_infer_wiki | En | Natural Questions | 200 | 41.7 | 33.8 | 10.4 |
| en_odqa_pop_infer_wiki | En | Natural Questions | 200 | 41.7 | 33.8 | 10.4 |
| | En | Natural Questions | 100 | | 41.5 | 64.6 |
| ru_odqa_infer_wiki | Ru | SDSJ Task B (dev) | 100 | 58.9 | 42.6 | 13.1 |
EM stands for “exact-match accuracy”. Metrics are calculated for the top 100 and top 200 paragraphs extracted by the retrieval module.
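For illustration, these are the standard SQuAD-style definitions (the helpers below are a self-contained sketch, not part of DeepPavlov): against the gold answer “Luke Skywalker”, the prediction “Luke Skywalker” scores EM = 1 and F1 = 1, while “Skywalker” scores EM = 0 but token-level F1 ≈ 0.67.

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())

def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer."""
    pred_tokens, gold_tokens = prediction.lower().split(), gold.lower().split()
    common = sum(min(pred_tokens.count(t), gold_tokens.count(t)) for t in set(pred_tokens))
    if common == 0:
        return 0.0
    precision, recall = common / len(pred_tokens), common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match('Skywalker', 'Luke Skywalker'))  # 0.0
print(token_f1('Skywalker', 'Luke Skywalker'))     # 0.666...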