Ranking and paraphrase identification¶

This library component solves the tasks of ranking and paraphrase identification based on semantic similarity which is trained with siamese neural networks. The trained network can retrieve the response closest semantically to a given context from some database or answer whether two sentences are paraphrases or not. It is possible to build automatic semantic FAQ systems with such neural architectures.

Ranking¶

To use the model pre-trained on the InsuranceQA V1 [1] dataset for inference, one should run the following command:

python -m deeppavlov interact deeppavlov/configs/ranking/ranking_insurance_interact.json

Now a user can enter a text of context and get and responses:

:: how much to pay for auto insurance?
>> ['the cost of auto insurance be based on several factor include your driving record , claim history , type of vehicle , credit score where you live and how far you travel to and from work I will recommend work with an independent agent who can shop several company find the good policy for you', 'there be not any absolute answer to this question rate for auto insurance coverage can vary greatly from carrier to carrier and from area to area contact local agent in your area find out about coverage availablity and pricing within your area look for an agent that you be comfortable working with as they will be the first last point of contact in most instance', 'the cost of auto insurance coverage for any vehicle or driver can vary greatly thing that effect your auto insurance rate be geographical location , vehicle , age (s) of driver (s) , type of coverage desire , motor vehicle record of all driver , credit rating of all driver and more contact a local agent get a quote a quote cost nothing but will let you know where your rate will']

To train the model on the InsuranceQA V1 [1] dataset one should run the command:

python -m deeppavlov interact deeppavlov/configs/ranking/ranking_insurance.json

As an example of configuration file see ranking_insurance.json.

Paraphrase identification¶

To use the model pre-trained on the Quora Question Pairs [2] dataset for inference, one should run the following command:

python -m deeppavlov interact deeppavlov/configs/ranking/paraphrase_ident_qqp_interact.json

Now a user can enter two sentences and the model will make a prediction whether these sentences are paraphrases or not.

:: How can I be a good geologist?&What should I do to be a great geologist?
>> This is a paraphrase.

To train the model on the Quora Question Pairs [2] dataset one should run the command:

python -m deeppavlov interact deeppavlov/configs/ranking/paraphrase_ident_qqp.json

As an example of configuration file see paraphrase_ident_qqp.json.

References¶

[1]	(1, 2) https://github.com/shuzi/insuranceQA

[2]	(1, 2) https://www.kaggle.com/c/quora-question-pairs/data