deeppavlov.models.doc_retrieval

Ranking classes.

class deeppavlov.models.doc_retrieval.tfidf_ranker.TfidfRanker(vectorizer: deeppavlov.models.vectorizers.hashing_tfidf_vectorizer.HashingTfIdfVectorizer, top_n=5, active: bool = True, **kwargs)

Rank documents according to input strings.

Parameters:	vectorizer – a vectorizer class
	top_n – the number of doc ids to return
	active – whether to return the number of ids specified by `top_n` (`True`) or all ids (`False`)

top_n: the number of doc ids to return

vectorizer: an instance of the vectorizer class

active: whether to return the number of ids specified by `top_n` (`True`) or all ids (`False`)

index2doc: inverted doc_index

iterator: a dataset iterator used for generating batches while fitting the vectorizer

__call__(questions: List[str]) → Tuple[List[Any], List[float]]

Rank documents and return top n document titles with scores.

Parameters:	questions – list of queries used in ranking
Returns:	a tuple of selected doc ids and their scores
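
The interplay of `top_n` and `active` described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: `rank_top_n` is a hypothetical helper, the per-query score lists stand in for whatever similarity scores the vectorizer produces, and the nested-list return shape is an assumption made for readability.

```python
from typing import Any, List, Sequence, Tuple


def rank_top_n(
    scores_batch: Sequence[Sequence[float]],
    index2doc: Sequence[Any],
    top_n: int = 5,
    active: bool = True,
) -> Tuple[List[List[Any]], List[List[float]]]:
    """Hypothetical helper: pick the best-scoring doc ids for each query."""
    batch_ids: List[List[Any]] = []
    batch_scores: List[List[float]] = []
    for scores in scores_batch:
        # Order document positions by descending score.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        # With active=True keep only the first top_n positions; otherwise keep all.
        if active:
            order = order[:top_n]
        # Map positions back to doc ids through the index2doc lookup.
        batch_ids.append([index2doc[i] for i in order])
        batch_scores.append([scores[i] for i in order])
    return batch_ids, batch_scores


# Toy data (assumed): one query scored against four documents.
doc_ids, doc_scores = rank_top_n(
    scores_batch=[[0.1, 0.7, 0.0, 0.4]],
    index2doc=["doc_a", "doc_b", "doc_c", "doc_d"],
    top_n=2,
)
```

With `active=False` the same call would return all four ids in score order, which matches the documented meaning of the flag.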

class deeppavlov.models.doc_retrieval.logit_ranker.LogitRanker(squad_model: deeppavlov.core.models.component.Component, batch_size: int = 50, **kwargs)

Select the best answer using squad model logits. Split a single incoming batch into several smaller batches, send each of them to the squad model separately, and return a single best answer for each incoming batch.

Parameters:	squad_model – a loaded squad model
	batch_size – batch size to use with the squad model

squad_model: a loaded squad model

batch_size: batch size to use with the squad model

__call__(contexts_batch: List[List[str]], questions_batch: List[List[str]]) → List[str]

Sort the results obtained from the squad reader by logits and return the answer with the maximum logit.

Parameters:	contexts_batch – a batch of contexts which should be treated as a single batch in the outer JSON config
	questions_batch – a batch of questions which should be treated as a single batch in the outer JSON config
Returns:	a batch of best answers
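
The selection step described above can be sketched as follows. This is a minimal illustration, not the library's code: `select_best_answers` and `toy_reader` are hypothetical names, and the assumption that the reader returns one `(answer, logit)` pair per context is made only for this example.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed reader interface: parallel lists of contexts and questions in,
# one (answer, logit) pair per context out.
Reader = Callable[[Sequence[str], Sequence[str]], List[Tuple[str, float]]]


def select_best_answers(
    reader: Reader,
    contexts_batch: List[List[str]],
    questions_batch: List[List[str]],
    batch_size: int = 50,
) -> List[str]:
    """Hypothetical helper: keep the highest-logit answer for each item."""
    best_answers: List[str] = []
    for contexts, questions in zip(contexts_batch, questions_batch):
        results: List[Tuple[str, float]] = []
        # Feed the reader at most batch_size contexts at a time.
        for start in range(0, len(contexts), batch_size):
            stop = start + batch_size
            results.extend(reader(contexts[start:stop], questions[start:stop]))
        if not results:
            # No contexts were supplied, so there is nothing to choose from.
            best_answers.append("")
            continue
        # Pick the answer whose logit is largest.
        answer, _ = max(results, key=lambda pair: pair[1])
        best_answers.append(answer)
    return best_answers


def toy_reader(contexts: Sequence[str], questions: Sequence[str]) -> List[Tuple[str, float]]:
    """Stand-in reader (assumed): upper-cases the context, logit = its length."""
    return [(context.upper(), float(len(context))) for context in contexts]


answers = select_best_answers(
    toy_reader,
    contexts_batch=[["a", "abc", "ab"], ["xy", "x"]],
    questions_batch=[["q1", "q1", "q1"], ["q2", "q2"]],
    batch_size=2,
)
```

Here each inner list of contexts yields exactly one answer, so the output has the same length as `contexts_batch`.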