Popularity Ranker re-ranks the results obtained via TF-IDF Ranker using information about the number of article views. Wikipedia article view counts are openly available and can be obtained via the Wikimedia REST API. To each article in our English Wikipedia database enwiki20180211 we assigned its mean number of views for the period from 2017/11/05 to 2018/11/05.
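As a rough illustration of how such a mean could be obtained, the sketch below builds a request URL for the per-article pageviews endpoint of the Wikimedia REST API and averages the `views` field of a response. The helper names `pageviews_url` and `mean_views`, the daily granularity, and the `all-access`/`all-agents` filters are assumptions made for this example; the text above does not say which settings were actually used. The example runs offline on a hand-written payload that mimics the response shape.

```python
from statistics import mean
from urllib.parse import quote

# Per-article pageviews endpoint of the Wikimedia REST API.
PAGEVIEWS_URL = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
    "{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"
)


def pageviews_url(title, start="20171105", end="20181105",
                  project="en.wikipedia", granularity="daily"):
    """Build the request URL for one article (dates are YYYYMMDD)."""
    # Titles use underscores instead of spaces and must be percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return PAGEVIEWS_URL.format(
        project=project, access="all-access", agent="all-agents",
        article=article, granularity=granularity, start=start, end=end,
    )


def mean_views(payload):
    """Average the 'views' field over all items of a parsed JSON response."""
    counts = [item["views"] for item in payload.get("items", [])]
    return mean(counts) if counts else 0.0


# Offline example: a hand-written payload in the shape the API returns.
sample = {"items": [{"views": 100}, {"views": 300}]}
print(pageviews_url("Ivan Pavlov"))
print(mean_views(sample))
```

When sending real requests, fetch the URL with any HTTP client, parse the body as JSON, and pass the result to `mean_views`. Wikimedia asks API clients to send a descriptive `User-Agent` header.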
The inner algorithm of Popularity Ranker is a Logistic Regression classifier based on 3 features:
TF-IDF score of the article
popularity of the article
product of the two features above
The classifier is trained on the SQuAD-v1.1 train set.
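The three-feature scoring described above can be sketched in plain Python as follows. This is only an illustration of logistic regression inference over the listed features, not the library's implementation. The function names, the weights, the bias, and the candidate scores are all hypothetical; the real coefficients come from the trained classifier.

```python
import math


def make_features(tfidf_score, popularity):
    """Return the three features: TF-IDF score, popularity, and their product."""
    return [tfidf_score, popularity, tfidf_score * popularity]


def relevance_probability(features, weights, bias):
    """Logistic regression inference: sigmoid of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def rerank(candidates, weights, bias):
    """Sort (title, tfidf_score, popularity) tuples by predicted probability."""
    scored = [
        (title, relevance_probability(make_features(tfidf, pop), weights, bias))
        for title, tfidf, pop in candidates
    ]
    return [title for title, _ in sorted(scored, key=lambda p: p[1], reverse=True)]


# Hypothetical weights and inputs, chosen only to illustrate the mechanics.
WEIGHTS, BIAS = [1.0, 0.5, 0.2], -1.0
candidates = [("A", 0.9, 0.1), ("B", 0.8, 5.0)]
print(rerank(candidates, WEIGHTS, BIAS))
```

With these made-up numbers, article "B" has a slightly lower TF-IDF score than "A" but a much higher popularity, so it moves to the top after re-ranking.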
Before using the model, make sure that all required packages are installed by running the command:
python -m deeppavlov install en_ranker_pop_enwiki20180211
Building the model
from deeppavlov import build_model, configs
ranker = build_model(configs.doc_retrieval.en_ranker_pop_enwiki20180211, download=True)
result = ranker(['Who is Ivan Pavlov?'])
print(result[:5])
>> ['Ivan Pavlov', 'Vladimir Bekhterev', 'Classical conditioning', 'Valentin Pavlov', 'Psychology']
Text for the output titles can be further extracted with
Default ranker config is doc_retrieval/en_ranker_pop_enwiki20180211.json
Running the Ranker
About 17 GB of RAM is required.
When interacting, the ranker returns the titles of the relevant documents.
Run the following to interact with the ranker:
python -m deeppavlov interact en_ranker_pop_enwiki20180211 -d
Available Data and Pretrained Models
Available information about Wikipedia article popularity is downloaded to
and the pre-trained logistic regression classifier is downloaded to
~/.deeppavlov/models/odqa/logreg_3features.joblib by default.
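To inspect the downloaded classifier directly, a minimal sketch along these lines may help. It assumes the default path given above and that the file is a joblib dump, as its extension suggests. `load_classifier` is a hypothetical helper for this example, not part of DeepPavlov, and unpickling may additionally require a compatible scikit-learn version.

```python
from pathlib import Path

# Default location stated above; it differs if DeepPavlov's root was changed.
MODEL_PATH = Path("~/.deeppavlov/models/odqa/logreg_3features.joblib").expanduser()


def load_classifier(path=MODEL_PATH):
    """Load the classifier if it has been downloaded, else return None."""
    if not path.is_file():
        return None
    import joblib  # imported lazily so the existence check works without it
    return joblib.load(path)


clf = load_classifier()
print("classifier loaded" if clf is not None else f"not found: {MODEL_PATH}")
```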