Ranking and paraphrase identification

This library model solves the tasks of ranking and paraphrase identification based on semantic similarity, which it learns with siamese neural networks. A trained network can retrieve from a database the response that is semantically closest to a given context, or decide whether two sentences are paraphrases. Such architectures make it possible to build automatic semantic FAQ systems.
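The retrieval step described above can be illustrated with a minimal sketch. It assumes that each text has already been mapped to a vector and that closeness is measured with cosine similarity; the vectors, the texts and the helper names `cosine_similarity` and `rank_responses` are hypothetical and are not part of the library, whose models may score candidates differently.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_responses(context_vec, response_vecs):
    """Return (score, response) pairs sorted from most to least similar."""
    scored = [
        (cosine_similarity(context_vec, vec), text)
        for text, vec in response_vecs.items()
    ]
    return sorted(scored, reverse=True)


# Toy vectors standing in for the output of a trained encoder (assumption).
context_vec = [0.9, 0.1, 0.0]
response_vecs = {
    "Try reinstalling the package.": [0.8, 0.2, 0.1],
    "The weather is nice today.": [0.0, 0.1, 0.9],
}

ranked = rank_responses(context_vec, response_vecs)
best_score, best_response = ranked[0]
print(best_response)
```

In an FAQ setting the database of responses would be encoded once, and only the incoming context would be encoded at query time.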

Training and inference on predefined datasets

BERT Ranking

Before using the models, make sure that all required packages are installed by running the command:

python -m deeppavlov install ranking_ubuntu_v2_torch_bert_uncased

Paraphrase identification

Paraphraser.ru dataset

Before using the model, make sure that all required packages are installed by running the command:

python -m deeppavlov install paraphraser_rubert

To train the model on the paraphraser.ru dataset, use the following Python code:

from deeppavlov import train_model

para_model = train_model('paraphraser_rubert', download=True)

Paraphrase identification

train.csv: the same as for ranking.

valid.csv, test.csv: each line in the file contains a context, a response and a label, separated by tab characters. The label is binary: 1 for a correct response to the given context and 0 for an incorrect one. Instead of a context and a response, a line can simply hold two phrases that are paraphrases or non-paraphrases, as indicated by the label.
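As an illustration of this layout, the following sketch writes two tab-separated lines and parses them back using only the Python standard library. The example rows are invented, and an in-memory buffer stands in for a real valid.csv or test.csv file.

```python
import csv
import io

# Invented (context, response, label) rows; 1 marks a correct response.
rows = [
    ("how do I install the package", "run the install command", 1),
    ("how do I install the package", "the weather is nice today", 0),
]

# Write the rows as tab-separated lines, one example per line.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)

# Read the lines back and convert the label column to an integer.
buffer.seek(0)
parsed = [
    (context, response, int(label))
    for context, response, label in csv.reader(buffer, delimiter="\t")
]
print(parsed)
```

To produce a file on disk, the same writer can be pointed at a handle opened with `open("valid.csv", "w", newline="")`.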

Classification metrics such as f1, acc and log_loss can be calculated on the valid and test parts of the dataset. They are selected with the metrics parameter in the JSON configuration file.
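For reference, the sketch below computes these three metrics from their standard textbook definitions for binary labels. The function names, the 0.5 decision threshold and the toy labels and probabilities are assumptions made for illustration; the library's own metric implementations may differ in detail.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)


# Toy data: true labels and predicted probabilities of the positive class.
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.4, 0.8]
y_pred = [int(p >= 0.5) for p in y_prob]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), log_loss(y_true, y_prob))
```

Accuracy and F1 depend on the thresholded predictions, while log loss is computed from the probabilities directly, so it also penalizes confident mistakes.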