Popularity Ranker
Popularity Ranker re-ranks the results obtained by the TF-IDF Ranker using the number of views of each article. Wikipedia article view counts are openly available and can be obtained via the Wikimedia REST API. We assigned to each article in our English Wikipedia database enwiki20180211 its mean number of views for the period from 2017/11/05 to 2018/11/05.
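The popularity value described above could be reproduced roughly as follows. This is a minimal sketch, assuming the Wikimedia REST API per-article pageviews endpoint with daily granularity and all access methods and agents; the source does not say which granularity, access or agent filters were actually used. The helper names `pageviews_url`, `mean_views` and `fetch_mean_views` are hypothetical, and the average is taken over the days present in the response.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base of the Wikimedia REST API per-article pageviews endpoint.
PAGEVIEWS_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title: str, start: str = "20171105", end: str = "20181105") -> str:
    """Build a daily pageviews URL for an English Wikipedia article.

    Assumption: daily granularity, all-access, all-agents. Titles use
    underscores instead of spaces and are percent-encoded (including '/').
    """
    article = quote(title.replace(" ", "_"), safe="")
    return (f"{PAGEVIEWS_BASE}/en.wikipedia/all-access/all-agents/"
            f"{article}/daily/{start}/{end}")


def mean_views(response: dict) -> float:
    """Average the 'views' counts over the items returned by the API."""
    items = response.get("items", [])
    if not items:
        return 0.0
    return sum(item["views"] for item in items) / len(items)


def fetch_mean_views(title: str) -> float:
    """Query the API (network access required) and return the mean daily views."""
    # Wikimedia asks clients to send a descriptive User-Agent header.
    req = Request(pageviews_url(title),
                  headers={"User-Agent": "popularity-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return mean_views(json.load(resp))
```

For example, `pageviews_url("Ivan Pavlov")` ends in `/Ivan_Pavlov/daily/20171105/20181105`, and a title containing a slash such as `AC/DC` is encoded as `AC%2FDC` so that it stays a single path segment.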
The inner algorithm of Popularity Ranker is a logistic regression classifier based on 3 features:
- the TF-IDF score of the article
- the popularity of the article
- the product of the two features above
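How these three features could drive the re-ranking is sketched below. This is a minimal illustration, assuming the classifier outputs the probability that an article is relevant and that the TF-IDF candidates are re-sorted by that probability. The weights and bias are placeholders, not the trained values, and popularity is used as-is; the actual implementation may scale or normalize it differently. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def features(tfidf: float, popularity: float) -> List[float]:
    """The three features: TF-IDF score, popularity, and their product."""
    return [tfidf, popularity, tfidf * popularity]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logreg_prob(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic regression: sigmoid of the weighted feature sum plus bias."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def rerank(candidates: List[Tuple[str, float]],
           popularity: Dict[str, float],
           weights: Sequence[float],
           bias: float) -> List[str]:
    """Re-order (title, tfidf_score) candidates by classifier probability."""
    scored = []
    for title, tfidf in candidates:
        pop = popularity.get(title, 0.0)  # assumption: unknown titles get 0 views
        scored.append((logreg_prob(features(tfidf, pop), weights, bias), title))
    # Highest probability first; ties keep their original TF-IDF order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]
```

For example, `rerank([("A", 0.9), ("B", 0.8)], {"A": 0.1, "B": 5.0}, weights=[1.0, 1.0, 0.5], bias=-2.0)` returns `["B", "A"]`: with these made-up weights, B's much higher popularity outweighs its slightly lower TF-IDF score.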
The classifier is trained on the SQuAD-v1.1 [1] train set.
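The source does not describe how training examples are derived from SQuAD. Purely as an illustration, the sketch below assumes hypothetical binary labels per (question, article) pair, for example 1 if the article is the one the SQuAD question was written from and 0 otherwise, and fits the three-feature logistic regression by plain batch gradient descent on log-loss. Gradient descent here is a dependency-free stand-in for whatever solver was actually used; `fit_logreg` and `predict_proba` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability of the positive class for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
               lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

With toy (tfidf, popularity) pairs turned into the feature vectors `[t, p, t * p]`, the fitted model separates the high-score, popular examples from the low ones; real SQuAD-derived data would of course be noisier.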
Quick Start
Before using the model, make sure that all required packages are installed by running the command:
python -m deeppavlov install en_ranker_pop_wiki
Building the model
from deeppavlov import build_model
ranker = build_model('en_ranker_pop_wiki', download=True)
Inference
result = ranker(['Who is Ivan Pavlov?'])
print(result[:5])
Output
>> ['Ivan Pavlov', 'Vladimir Bekhterev', 'Classical conditioning', 'Valentin Pavlov', 'Psychology']
The text of the output titles can then be extracted with the WikiSQLiteVocab class.
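The WikiSQLiteVocab interface is not described on this page, so rather than guess at it, the sketch below looks titles up directly with Python's sqlite3 module. It assumes a DrQA-style table named `documents` with `id` (the article title) and `text` columns; these names are assumptions, so check the schema of the downloaded database first (for example with `.schema` in the sqlite3 shell). `titles_to_texts` is a hypothetical helper.

```python
import sqlite3
from typing import List


def titles_to_texts(conn: sqlite3.Connection, titles: List[str]) -> List[str]:
    """Return article texts in the same order as the ranked titles.

    Assumes a table documents(id TEXT PRIMARY KEY, text TEXT);
    titles that are not found map to an empty string.
    """
    texts = []
    for title in titles:
        row = conn.execute(
            "SELECT text FROM documents WHERE id = ?", (title,)
        ).fetchone()
        texts.append(row[0] if row else "")
    return texts
```

Keeping the ranker's order matters here: the first text returned corresponds to the most relevant title.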
Configuration
The default ranker config is doc_retrieval/en_ranker_pop_wiki.json.
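To see which components the config chains together, it can be read as ordinary JSON. This is a minimal sketch, assuming the usual DeepPavlov layout in which a `chainer` object holds a `pipe` list of component dicts identified by `class_name` (or by `ref` when reusing an earlier component); `pipe_components` is a hypothetical helper and the fallback `"?"` is only for entries with neither key.

```python
import json
from typing import List


def pipe_components(config_path: str) -> List[str]:
    """List the components in a DeepPavlov config's chainer pipe, in order."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    names = []
    for component in config.get("chainer", {}).get("pipe", []):
        # Prefer the class name; fall back to a reference or a placeholder.
        names.append(component.get("class_name") or component.get("ref") or "?")
    return names
```

With DeepPavlov installed, the path of the default config can be obtained via `from deeppavlov import configs` as `configs.doc_retrieval.en_ranker_pop_wiki` and passed to `pipe_components` after converting it with `str()`.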
Running the Ranker
Note
About 17 GB of RAM is required.
Interacting
When interacting, the ranker returns the titles of the relevant documents.
Run the following to interact with the ranker:
python -m deeppavlov interact en_ranker_pop_wiki -d
Available Data and Pretrained Models
By default, the Wikipedia article popularity data is downloaded to ~/.deeppavlov/downloads/odqa/popularities.json and the pre-trained logistic regression classifier to ~/.deeppavlov/models/odqa/logreg_3features.joblib.
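Both downloaded files can also be loaded outside of DeepPavlov. This is a minimal sketch, assuming that popularities.json is a JSON object mapping article titles to mean view counts and that the .joblib file was written with joblib; the helper names are hypothetical, and joblib is imported only inside the loader so the JSON part works without it.

```python
import json
import os
from typing import Dict

# Default download locations stated above.
POPULARITY_PATH = "~/.deeppavlov/downloads/odqa/popularities.json"
LOGREG_PATH = "~/.deeppavlov/models/odqa/logreg_3features.joblib"


def load_popularities(path: str = POPULARITY_PATH) -> Dict[str, float]:
    """Load the popularity file (assumed: a title -> mean views mapping)."""
    with open(os.path.expanduser(path), encoding="utf-8") as f:
        return json.load(f)


def load_classifier(path: str = LOGREG_PATH):
    """Load the pickled classifier; requires the joblib package."""
    import joblib  # lazy import: only needed for the classifier
    return joblib.load(os.path.expanduser(path))
```

If the loaded object turns out to be a scikit-learn LogisticRegression (an assumption, not stated here), `clf.predict_proba([[tfidf, pop, tfidf * pop]])[:, 1]` would give the positive-class probability for one article, using the three features in the order listed above.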