deeppavlov.models.classifiers¶
- class deeppavlov.models.classifiers.torch_classification_model.TorchTextClassificationModel(n_classes: int, kernel_sizes_cnn: List[int], filters_cnn: int, dense_size: int, dropout_rate: float = 0.0, embedding_size: Optional[int] = None, multilabel: bool = False, criterion: str = 'CrossEntropyLoss', embedded_tokens: bool = True, vocab_size: Optional[int] = None, return_probas: bool = True, **kwargs)[source]¶
Implements a torch model for text classification. The input can be either embedded tokenized texts or indices of words in the vocabulary. The number of tokens is not fixed, but the samples in a batch should be padded to the same length (e.g. that of the longest sample).
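The padding requirement above can be illustrated with a minimal sketch for the case where the input consists of vocabulary indices. The helper name `pad_batch` and the padding index `0` are assumptions made for illustration; they are not part of the DeepPavlov API.

```python
from typing import List


def pad_batch(batch: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one in the batch.

    `pad_value` is a hypothetical padding index; the real pipeline may use another.
    """
    max_len = max((len(seq) for seq in batch), default=0)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in batch]


# Three tokenized samples of different lengths, given as vocabulary indices.
batch = [[5, 12, 7], [3], [8, 1]]
padded = pad_batch(batch)
```

After padding, every sample in `padded` has the same length, so the batch can be stacked into a single rectangular array or tensor.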
- Parameters
n_classes – number of classes
kernel_sizes_cnn – list of kernel sizes of convolutions
filters_cnn – number of filters for convolutions
dense_size – number of units for dense layer
dropout_rate – dropout rate applied after the convolutions and between the dense layers
embedding_size – size of vector representation of words
multilabel – whether the classification is multi-label (if so, sigmoid activation is used; otherwise, softmax)
criterion – criterion name from torch.nn
embedded_tokens – True if the input contains embedded tokenized texts; False if the input contains indices of words in the vocabulary
vocab_size – vocabulary size, used when embedded_tokens=False and the embedding is a layer in the network
return_probas – whether to return probabilities or class indices (only for multilabel=False)
- model¶
torch model itself
- epochs_done¶
number of epochs that were done
- criterion¶
torch criterion instance
- __call__(texts: List[ndarray], *args) → Union[List[List[float]], List[int]] [source]¶
Infer on the given data.
- Parameters
texts – list of tokenized text samples
labels – labels
*args – additional arguments
- Returns
for each sentence, either a vector of probabilities of belonging to each class or the label(s) the sentence belongs to
- Return type
Union[List[List[float]], List[int]]
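The interplay of the multilabel and return_probas options can be sketched in plain Python. This is an illustration of the documented behaviour only, not the library's implementation; the function names and the assumption that the network produces raw scores (logits) per class are hypothetical.

```python
import math
from typing import List, Union


def softmax(logits: List[float]) -> List[float]:
    """Normalize scores into a distribution that sums to 1 (single-label case)."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: List[float]) -> List[float]:
    """Squash each score independently into (0, 1) (multi-label case)."""
    out = []
    for x in logits:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out


def postprocess(logits: List[float], multilabel: bool = False,
                return_probas: bool = True) -> Union[List[float], int]:
    """Hypothetical helper mirroring the documented options for one sample."""
    if multilabel:
        return sigmoid(logits)
    probas = softmax(logits)
    if return_probas:
        return probas
    # Index of the most probable class.
    return max(range(len(probas)), key=probas.__getitem__)


single_probas = postprocess([1.0, 2.0, 3.0])
single_index = postprocess([1.0, 2.0, 3.0], return_probas=False)
multi_probas = postprocess([0.0, 2.0, -2.0], multilabel=True)
```

With softmax the class probabilities compete and sum to one, while with sigmoid each class is scored independently, which is why the latter suits multi-label tasks.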
- class deeppavlov.models.classifiers.cos_sim_classifier.CosineSimilarityClassifier(top_n: int = 1, save_path: Optional[str] = None, load_path: Optional[str] = None, **kwargs)[source]¶
Classifier based on cosine similarity between vectorized sentences
- Parameters
save_path – path to save the model
load_path – path to load the model
- __call__(q_vects: Union[csr_matrix, List]) → Tuple[List[str], List[int]] [source]¶
Find the most similar answers for the input vectorized questions
- Parameters
q_vects – vectorized questions
- Returns
tuple of answers and scores
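The general idea of ranking stored answers by cosine similarity can be sketched as follows. This is a simplified illustration with dense Python lists, not the class's actual code; `stored_vects`, `answers`, and `top_n_answers` are hypothetical names, and the source does not describe how the real class stores its fitted data.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_answers(q_vect: Sequence[float],
                  stored_vects: List[Sequence[float]],
                  answers: List[str],
                  top_n: int = 1) -> Tuple[List[str], List[float]]:
    """Return the top_n answers whose vectors are closest to the question."""
    scores = [cosine(q_vect, v) for v in stored_vects]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [answers[i] for i in order], [scores[i] for i in order]


# Hypothetical fitted data: one vector per known question, with its answer.
stored_vects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
answers = ["a", "b", "c"]
best_answers, best_scores = top_n_answers([1.0, 0.1], stored_vects, answers, top_n=2)
```

Here the question vector points almost along the first axis, so the answer attached to the first stored vector ranks highest, followed by the one attached to the diagonal vector.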
- class deeppavlov.models.classifiers.proba2labels.Proba2Labels(max_proba: Optional[bool] = None, confidence_threshold: Optional[float] = None, top_n: Optional[int] = None, is_binary: bool = False, **kwargs)[source]¶
Converts probabilities to labels in one of the following ways: choosing the one index or the top_n indices with maximal probability, or choosing any number of indices whose probabilities are higher than the given confidence threshold
- Parameters
max_proba – whether to choose label with maximal probability
confidence_threshold – boundary probability value for a sample to belong to the class (best used for multi-label)
top_n – how many top labels with the highest probabilities to return
- max_proba¶
whether to choose label with maximal probability
- confidence_threshold¶
boundary probability value for a sample to belong to the class (best used for multi-label)
- top_n¶
how many top labels with the highest probabilities to return
- __call__(*args, **kwargs)[source]¶
Process probabilities to labels.
- Parameters
*args – every argument is a list of vectors with a probability distribution
- Returns
list of labels (single-label classification) or list of lists of labels (multi-label classification); in a multitask setting, a list of such lists, one for every argument
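The three selection strategies described for this class can be sketched in plain Python. This is an illustrative approximation rather than the library's code: the function name `proba_to_labels` and the order in which the options are checked are assumptions, and the is_binary option is not covered.

```python
from typing import List, Optional


def proba_to_labels(probas: List[List[float]],
                    max_proba: Optional[bool] = None,
                    confidence_threshold: Optional[float] = None,
                    top_n: Optional[int] = None) -> list:
    """Turn per-sample probability vectors into label indices.

    The precedence of the options below is an assumption for this sketch.
    """
    if confidence_threshold is not None:
        # Any number of labels whose probability exceeds the threshold.
        return [[i for i, p in enumerate(row) if p > confidence_threshold]
                for row in probas]
    if max_proba:
        # Exactly one label: the most probable one.
        return [max(range(len(row)), key=row.__getitem__) for row in probas]
    if top_n is not None:
        # The top_n most probable labels, most probable first.
        return [sorted(range(len(row)), key=row.__getitem__, reverse=True)[:top_n]
                for row in probas]
    raise ValueError("one of max_proba, confidence_threshold, top_n must be set")


probas = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.55]]
by_max = proba_to_labels(probas, max_proba=True)
by_threshold = proba_to_labels(probas, confidence_threshold=0.5)
by_top_n = proba_to_labels(probas, top_n=2)
```

The threshold strategy can return a different number of labels per sample, including none, which is why it fits multi-label tasks, while max_proba always yields a single label.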