deeppavlov.models.classifiers

class deeppavlov.models.classifiers.torch_classification_model.TorchTextClassificationModel(n_classes: int, kernel_sizes_cnn: List[int], filters_cnn: int, dense_size: int, dropout_rate: float = 0.0, embedding_size: Optional[int] = None, multilabel: bool = False, criterion: str = 'CrossEntropyLoss', embedded_tokens: bool = True, vocab_size: Optional[int] = None, return_probas: bool = True, **kwargs)[source]

A PyTorch model for text classification. The input can be either embedded tokenized texts or indices of words in the vocabulary. The number of tokens is not fixed, but the samples in a batch must be padded to the same length (e.g. that of the longest sample).
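The padding requirement above can be illustrated with a minimal sketch for the case where samples are lists of vocabulary indices. The helper `pad_batch`, the pad value `0`, and right-side padding are assumptions made for illustration; they are not part of the documented API.

```python
from typing import List


def pad_batch(batch: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    """Right-pad every sample to the length of the longest sample in the batch.

    Hypothetical helper: the pad value and padding side are assumptions.
    """
    max_len = max((len(sample) for sample in batch), default=0)
    return [sample + [pad_value] * (max_len - len(sample)) for sample in batch]


# Three samples of different lengths, given as vocabulary indices.
batch = [[4, 17, 9], [8], [3, 5]]
padded = pad_batch(batch)
print(padded)  # [[4, 17, 9], [8, 0, 0], [3, 5, 0]]
```

For embedded tokens the same idea applies, with a zero vector of size embedding_size appended in place of a single pad index.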

Parameters
  • n_classes – number of classes

  • kernel_sizes_cnn – list of kernel sizes of convolutions

  • filters_cnn – number of filters for convolutions

  • dense_size – number of units for dense layer

  • dropout_rate – dropout rate, applied after the convolutions and between the dense layers

  • embedding_size – size of vector representation of words

  • multilabel – whether the classification is multi-label (if so, sigmoid activation is used; otherwise, softmax)

  • criterion – criterion name from torch.nn

  • embedded_tokens – True if the input contains embedded tokenized texts; False if the input contains indices of words in the vocabulary

  • vocab_size – vocabulary size; used when embedded_tokens=False, in which case the embedding is a layer in the network

  • return_probas – whether to return probabilities or class indices (only for multilabel=False)
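The difference between the two activations mentioned for multilabel can be sketched in plain Python. This is only an illustration of softmax versus sigmoid on a single vector of logits, not the model's actual code; the logit values are made up.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Normalize logits into a distribution that sums to 1 (single-label case)."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: List[float]) -> List[float]:
    """Map each logit independently into (0, 1) (multi-label case)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


logits = [2.0, 0.0, -2.0]  # illustrative values
single = softmax(logits)   # class probabilities compete and sum to 1
multi = sigmoid(logits)    # each class is scored on its own

# With probabilities from softmax, a class index can be taken as the argmax,
# which is the kind of output return_probas=False refers to.
predicted_index = max(range(len(single)), key=single.__getitem__)
```

With softmax, raising one class probability lowers the others; with sigmoid, several classes can be close to 1 at the same time, which is why it suits multi-label classification.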

model

torch model itself

epochs_done

number of completed epochs

criterion

torch criterion instance

__call__(texts: List[ndarray], *args) Union[List[List[float]], List[int]][source]

Infer on the given data.

Parameters
  • texts – list of tokenized text samples

  • labels – labels

  • *args – additional arguments

Returns

for each sentence, a vector of class-membership probabilities or the index of the class the sentence belongs to

train_on_batch(texts: List[List[ndarray]], labels: list) Union[float, List[float]][source]

Train the model on the given batch.

Parameters
  • texts – vectorized texts

  • labels – list of labels

Returns

metrics values on the given batch

class deeppavlov.models.classifiers.cos_sim_classifier.CosineSimilarityClassifier(top_n: int = 1, save_path: Optional[str] = None, load_path: Optional[str] = None, **kwargs)[source]

Classifier based on cosine similarity between vectorized sentences

Parameters
  • save_path – path to save the model

  • load_path – path to load the model

__call__(q_vects: Union[csr_matrix, List]) Tuple[List[str], List[int]][source]

Find the most similar answers for the input vectorized questions

Parameters

q_vects – vectorized questions

Returns

Tuple of answers and scores
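The idea of retrieving the most similar answers by cosine similarity can be sketched as follows. This is a simplified illustration on dense Python lists; the function names `cosine` and `most_similar` are hypothetical, and details of the real class (sparse matrix handling, batching, score format) are not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    q_vect: Sequence[float],
    x_train: List[Sequence[float]],
    y_train: List[str],
    top_n: int = 1,
) -> Tuple[List[str], List[float]]:
    """Return the top_n training answers ranked by similarity to q_vect."""
    scored = sorted(
        ((cosine(q_vect, x), y) for x, y in zip(x_train, y_train)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_n]
    return [y for _, y in scored], [s for s, _ in scored]


# Toy "training" data: vectorized questions and their answers.
x_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_train = ["answer A", "answer B", "answer C"]

answers, scores = most_similar([2.0, 0.1], x_train, y_train, top_n=2)
print(answers)  # ['answer A', 'answer C']
```

Because cosine similarity ignores vector length, the query `[2.0, 0.1]` matches `[1.0, 0.0]` most closely even though their magnitudes differ.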

fit(x_train_vects: Tuple[Union[csr_matrix, List]], y_train: Tuple[str]) None[source]

Train classifier

Parameters
  • x_train_vects – vectorized questions of the training dataset

  • y_train – answers of the training dataset

Returns

None

load() None[source]

Load classifier parameters

save() None[source]

Save classifier parameters

class deeppavlov.models.classifiers.proba2labels.Proba2Labels(max_proba: Optional[bool] = None, confidence_threshold: Optional[float] = None, top_n: Optional[int] = None, is_binary: bool = False, **kwargs)[source]

Converts probabilities to labels in one of the following ways: choosing the one index or the top_n indices with the highest probability, or choosing any number of indices whose probabilities are higher than the given confidence threshold

Parameters
  • max_proba – whether to choose label with maximal probability

  • confidence_threshold – probability threshold above which a sample is assigned to the class (best used for multi-label classification)

  • top_n – how many top labels with the highest probabilities to return

max_proba

whether to choose label with maximal probability

confidence_threshold

probability threshold above which a sample is assigned to the class (best used for multi-label classification)

top_n

how many top labels with the highest probabilities to return

__call__(*args, **kwargs)[source]

Convert probabilities to labels.

Parameters

*args – every argument is a list of vectors with a probability distribution

Returns

list of labels (single-label classification) or list of lists of labels (multi-label classification), or, in a multitask setting, a list of such results, one for every argument
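The three conversion strategies described for this class can be sketched in plain Python. The function `proba_to_labels` is hypothetical, and the order in which the options are checked is an assumption; the sketch only illustrates the strategies themselves, not the class's actual behavior.

```python
from typing import List, Optional, Union


def proba_to_labels(
    probas: List[List[float]],
    max_proba: bool = False,
    confidence_threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> Union[List[int], List[List[int]]]:
    """Turn per-sample probability vectors into class indices.

    The precedence of the options below is an assumption of this sketch.
    """
    if confidence_threshold is not None:
        # Multi-label: keep every class whose probability exceeds the threshold.
        return [
            [i for i, p in enumerate(row) if p > confidence_threshold]
            for row in probas
        ]
    if max_proba:
        # Single-label: index of the most probable class.
        return [max(range(len(row)), key=row.__getitem__) for row in probas]
    if top_n is not None:
        # Indices of the top_n most probable classes, best first.
        return [
            sorted(range(len(row)), key=row.__getitem__, reverse=True)[:top_n]
            for row in probas
        ]
    raise ValueError("Set max_proba, confidence_threshold or top_n")


probas = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.55]]

by_max = proba_to_labels(probas, max_proba=True)
by_threshold = proba_to_labels(probas, confidence_threshold=0.5)
by_top_n = proba_to_labels(probas, top_n=2)

print(by_max)        # [1, 0]
print(by_threshold)  # [[1], [0, 2]]
print(by_top_n)      # [[1, 2], [0, 2]]
```

Note that the threshold strategy may return a different number of labels per sample, including none, whereas max_proba and top_n always return a fixed number.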