deeppavlov.models.torch_bert

class deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchTransformersPreprocessor(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, **kwargs)[source]

Tokenize text into subtokens, encode subtokens with their indices, create token and segment masks.

Parameters
  • vocab_file – A string, the model id of a predefined tokenizer hosted inside a model repo on huggingface.co or a path to a directory containing vocabulary files required by the tokenizer.

  • do_lower_case – set True if lowercasing is needed

  • max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens

max_seq_length

max sequence length in subtokens, including [SEP] and [CLS] tokens

tokenizer

instance of Bert FullTokenizer

__call__(texts_a: List[str], texts_b: Optional[List[str]] = None) → Union[List[transformers.data.processors.utils.InputFeatures], Tuple[List[transformers.data.processors.utils.InputFeatures], List[List[str]]]][source]

Tokenize and create masks. texts_a and texts_b are separated by the [SEP] token.

Parameters
  • texts_a – list of texts

  • texts_b – list of texts; may be None, e.g. for a single-sentence classification task

Returns

batch of transformers.data.processors.utils.InputFeatures with subtokens, subtoken ids, subtoken mask, segment mask, or a tuple of a batch of InputFeatures and a batch of subtokens
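A minimal usage sketch based on the signatures above; the tokenizer id "bert-base-uncased" and the example sentences are illustrative assumptions, not values required by the class:

from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchTransformersPreprocessor

# build the preprocessor; "bert-base-uncased" is only an example tokenizer id
preprocessor = TorchTransformersPreprocessor(vocab_file="bert-base-uncased",
                                             do_lower_case=True,
                                             max_seq_length=128)

# texts_b is optional and may be omitted for single-sentence tasks
features = preprocessor(texts_a=["How old are you?"],
                        texts_b=["I am twenty years old."])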

class deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchTransformersNerPreprocessor(vocab_file: str, do_lower_case: bool = False, max_seq_length: int = 512, max_subword_length: Optional[int] = None, token_masking_prob: float = 0.0, provide_subword_tags: bool = False, subword_mask_mode: str = 'first', **kwargs)[source]

Takes tokens and splits them into Bert subtokens, encodes subtokens with their indices. Creates a mask of subtokens (one for the first subtoken, zero for the others).

If tags are provided, calculates tags for subtokens.

Parameters
  • vocab_file – path to vocabulary

  • do_lower_case – set True if lowercasing is needed

  • max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens

  • max_subword_length – replace a token with <unk> if its length is larger than this value (defaults to None, which means no limit)

  • token_masking_prob – probability of masking token while training

  • provide_subword_tags – whether to output tags for subwords (True) or for words (False)

  • subword_mask_mode – subword to select inside word tokens, can be “first” or “last” (default=”first”)

max_seq_length

max sequence length in subtokens, including [SEP] and [CLS] tokens

max_subword_length

max length of a Bert subtoken

tokenizer

instance of Bert FullTokenizer

__call__(tokens: Union[List[List[str]], List[str]], tags: Optional[List[List[str]]] = None, **kwargs)[source]

Call self as a function.
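A hedged sketch of calling the NER preprocessor with pre-tokenized input, following the signatures above; the tokenizer id, tokens, and tags are illustrative assumptions:

from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchTransformersNerPreprocessor

# "bert-base-cased" is only an example tokenizer id
ner_preprocessor = TorchTransformersNerPreprocessor(vocab_file="bert-base-cased",
                                                    max_seq_length=512,
                                                    subword_mask_mode="first")

# a batch with one pre-tokenized sentence; tags may be omitted at inference time
tokens = [["John", "lives", "in", "Boston"]]
tags = [["B-PER", "O", "O", "B-LOC"]]
outputs = ner_preprocessor(tokens=tokens, tags=tags)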

class deeppavlov.models.preprocessors.torch_transformers_preprocessor.TorchBertRankerPreprocessor(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, **kwargs)[source]

Tokenize text into subtokens, encode subtokens with their indices, create token and segment masks for ranking.

Builds features for pairs of the context with each of the response candidates.

__call__(batch: List[List[str]]) → List[List[transformers.data.processors.utils.InputFeatures]][source]

Tokenize and create masks.

Parameters

batch – list of elements where the first element represents the batch of contexts and the rest of the elements represent batches of response candidates

Returns

list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask.
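A short sketch of the expected input layout, following the signature above; the tokenizer id and the texts are illustrative assumptions:

from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchBertRankerPreprocessor

ranker_preprocessor = TorchBertRankerPreprocessor(vocab_file="bert-base-uncased",
                                                  max_seq_length=128)

# first element: the batch of contexts; remaining elements: batches of response candidates
batch = [["how can I reset my password?"],
         ["use the reset link on the login page"],
         ["our office opens at 9 am"]]
feature_batches = ranker_preprocessor(batch)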

class deeppavlov.models.torch_bert.torch_transformers_classifier.TorchTransformersClassifierModel(n_classes, pretrained_bert, one_hot_labels: bool = False, multilabel: bool = False, return_probas: bool = False, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = 'AdamW', optimizer_parameters: Optional[dict] = None, clip_norm: Optional[float] = None, bert_config_file: Optional[str] = None, is_binary: Optional[bool] = False, num_special_tokens: Optional[int] = None, **kwargs)[source]

Bert-based model for text classification on PyTorch.

It uses output from [CLS] token and predicts labels using linear transformation.

Parameters
  • n_classes – number of classes

  • pretrained_bert – pretrained Bert checkpoint path or key title (e.g. “bert-base-uncased”)

  • one_hot_labels – set True if one-hot encoding for labels is used

  • multilabel – set True if it is multi-label classification

  • return_probas – set True to return class probabilities instead of the most probable label

  • attention_probs_keep_prob – keep_prob for Bert self-attention layers

  • hidden_keep_prob – keep_prob for Bert hidden layers

  • optimizer – optimizer name from torch.optim

  • optimizer_parameters – dictionary with optimizer’s parameters, e.g. {‘lr’: 0.1, ‘weight_decay’: 0.001, ‘momentum’: 0.9}

  • clip_norm – clip gradients by norm coefficient

  • bert_config_file – path to Bert configuration file (not used if pretrained_bert is key title)

  • is_binary – whether classification task is binary or multi-class

  • num_special_tokens – number of special tokens used by classification model

__call__(features: Dict[str, torch.tensor]) → Union[List[int], List[List[float]]][source]

Make prediction for given features (texts).

Parameters

features – batch of InputFeatures

Returns

predicted classes or probabilities of each class

train_on_batch(features: Dict[str, torch.tensor], y: Union[List[int], List[List[int]]]) → Dict[source]

Train model on given batch. This method calls train_op using features and y (labels).

Parameters
  • features – batch of InputFeatures

  • y – batch of labels (class id or one-hot encoding)

Returns

dict with loss and learning_rate values
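A hedged end-to-end sketch that chains the preprocessor with the classifier, assuming the preprocessor output can be passed directly as the features batch (the two components are chained this way in pipeline configs); the model id, save_path, example text, and label are assumptions, and the pretrained checkpoint must be available locally or downloadable:

from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchTransformersPreprocessor
from deeppavlov.models.torch_bert.torch_transformers_classifier import TorchTransformersClassifierModel

preprocessor = TorchTransformersPreprocessor(vocab_file="bert-base-uncased",
                                             max_seq_length=128)
classifier = TorchTransformersClassifierModel(n_classes=2,
                                              pretrained_bert="bert-base-uncased",
                                              return_probas=True,
                                              save_path="./clf_checkpoint")  # save_path is an assumed kwarg

features = preprocessor(texts_a=["this movie was great"])
probas = classifier(features)                           # class probabilities
loss_dict = classifier.train_on_batch(features, y=[1])  # one training step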

class deeppavlov.models.torch_bert.torch_transformers_sequence_tagger.TorchTransformersSequenceTagger(n_tags: int, pretrained_bert: str, bert_config_file: Optional[str] = None, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = 'AdamW', optimizer_parameters: dict = {'lr': 0.001, 'weight_decay': 1e-06}, learning_rate_drop_patience: int = 20, learning_rate_drop_div: float = 2.0, load_before_drop: bool = True, clip_norm: Optional[float] = None, min_learning_rate: float = 1e-07, use_crf: bool = False, **kwargs)[source]

Transformer-based model on PyTorch for text tagging. It predicts a label for every token (not subtoken) in the text. You can use it for sequence labeling tasks, such as morphological tagging or named entity recognition.

Parameters
  • n_tags – number of distinct tags

  • pretrained_bert – pretrained Bert checkpoint path or key title (e.g. “bert-base-uncased”)

  • bert_config_file – path to Bert configuration file, or None, if pretrained_bert is a string name

  • attention_probs_keep_prob – keep_prob for Bert self-attention layers

  • hidden_keep_prob – keep_prob for Bert hidden layers

  • optimizer – optimizer name from torch.optim

  • optimizer_parameters – dictionary with optimizer’s parameters, e.g. {‘lr’: 0.1, ‘weight_decay’: 0.001, ‘momentum’: 0.9}

  • learning_rate_drop_patience – how many validations with no improvement to wait before dropping the learning rate

  • learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations

  • load_before_drop – whether to load best model before dropping learning rate or not

  • clip_norm – clip gradients by norm

  • min_learning_rate – min value of learning rate if learning rate decay is used

  • use_crf – whether to use a Conditional Random Field to decode tags

__call__(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Tuple[List[List[int]], List[numpy.ndarray]][source]

Predicts tag indices for a given batch of subword tokens.

Parameters
  • input_ids – indices of the subwords

  • input_masks – mask that determines where to attend and where not to

  • y_masks – mask which marks the first subword unit in each word

Returns

Label indices or class probabilities for each token (not subtoken)

train_on_batch(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray], y: List[List[int]], *args, **kwargs) → Dict[str, float][source]

Parameters
  • input_ids – batch of indices of subwords

  • input_masks – batch of masks which determine what should be attended

  • args – arguments passed to _build_feed_dict and corresponding to additional input and output tensors of the derived class.

  • kwargs – keyword arguments passed to _build_feed_dict and corresponding to additional input and output tensors of the derived class.

Returns

dict with fields ‘loss’, ‘head_learning_rate’, and ‘bert_learning_rate’
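A hedged sketch of the tagger's input/output contract; the model id, save_path, and the toy index arrays are assumptions (in a pipeline these arrays would typically come from the NER preprocessor above):

import numpy as np

from deeppavlov.models.torch_bert.torch_transformers_sequence_tagger import TorchTransformersSequenceTagger

tagger = TorchTransformersSequenceTagger(n_tags=3,
                                         pretrained_bert="bert-base-cased",
                                         save_path="./tagger_checkpoint")  # save_path is an assumed kwarg

# one sequence of five subwords; y_masks marks the first subword of each word
input_ids = np.array([[101, 1287, 2491, 1107, 102]])  # toy subword indices
input_masks = np.array([[1, 1, 1, 1, 1]])
y_masks = np.array([[0, 1, 1, 1, 0]])

pred_tags, probas = tagger(input_ids, input_masks, y_masks)

# a training step additionally takes gold tag indices, one per word
loss_dict = tagger.train_on_batch(input_ids, input_masks, y_masks, y=[[0, 0, 2]])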

class deeppavlov.models.torch_bert.torch_transformers_squad.TorchTransformersSquad(pretrained_bert: str, attention_probs_keep_prob: Optional[float] = None, hidden_keep_prob: Optional[float] = None, optimizer: str = 'AdamW', optimizer_parameters: Optional[dict] = None, bert_config_file: Optional[str] = None, learning_rate_drop_patience: int = 20, learning_rate_drop_div: float = 2.0, load_before_drop: bool = True, clip_norm: Optional[float] = None, min_learning_rate: float = 1e-06, batch_size: int = 10, **kwargs)[source]

Bert-based model on PyTorch for the SQuAD-like problem setting: it predicts the start and end positions of the answer for a given question and context.

The [CLS] token is used as no_answer. If the model selects the [CLS] token as the most probable answer, it means that there is no answer in the given context.

The start and end positions of the answer are predicted by a linear transformation of the Bert outputs.

Parameters
  • pretrained_bert – pretrained Bert checkpoint path or key title (e.g. “bert-base-uncased”)

  • attention_probs_keep_prob – keep_prob for Bert self-attention layers

  • hidden_keep_prob – keep_prob for Bert hidden layers

  • optimizer – optimizer name from torch.optim

  • optimizer_parameters – dictionary with optimizer’s parameters, e.g. {‘lr’: 0.1, ‘weight_decay’: 0.001, ‘momentum’: 0.9}

  • bert_config_file – path to Bert configuration file, or None, if pretrained_bert is a string name

  • learning_rate_drop_patience – how many validations with no improvement to wait before dropping the learning rate

  • learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations

  • load_before_drop – whether to load best model before dropping learning rate or not

  • clip_norm – clip gradients by norm

  • min_learning_rate – min value of learning rate if learning rate decay is used

  • batch_size – batch size for inference of squad model

__call__(features_batch: List[List[transformers.data.processors.utils.InputFeatures]]) → Tuple[List[List[int]], List[List[int]], List[List[float]], List[List[float]], List[int]][source]

Get predictions using features as input.

Parameters

features_batch – batch of InputFeatures instances

Returns
  • start_pred_batch – answer start positions

  • end_pred_batch – answer end positions

  • logits_batch – answer logits

  • scores_batch – answer confidences

  • ind_batch – indices of paragraph pieces where the answer was found

train_on_batch(features: List[List[transformers.data.processors.utils.InputFeatures]], y_st: List[List[int]], y_end: List[List[int]]) → Dict[source]

Train model on given batch. This method calls train_op using features and labels from y_st and y_end.

Parameters
  • features – batch of InputFeatures instances

  • y_st – batch of lists of ground truth answer start positions

  • y_end – batch of lists of ground truth answer end positions

Returns

dict with loss and learning_rate values
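A hedged inference sketch; the model id, save_path, and the hand-built toy InputFeatures values are assumptions used only to illustrate the expected List[List[InputFeatures]] input (in a pipeline these features would come from a SQuAD preprocessor):

from transformers.data.processors.utils import InputFeatures

from deeppavlov.models.torch_bert.torch_transformers_squad import TorchTransformersSquad

squad_model = TorchTransformersSquad(pretrained_bert="bert-base-uncased",
                                     batch_size=10,
                                     save_path="./squad_checkpoint")  # save_path is an assumed kwarg

# one context piece with toy subtoken ids; segment ids separate question from context
piece = InputFeatures(input_ids=[101, 2054, 2003, 102, 1037, 3231, 102],
                      attention_mask=[1, 1, 1, 1, 1, 1, 1],
                      token_type_ids=[0, 0, 0, 0, 1, 1, 1])
features_batch = [[piece]]

starts, ends, logits, scores, inds = squad_model(features_batch)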

class deeppavlov.models.torch_bert.torch_bert_ranker.TorchBertRankerModel(pretrained_bert: Optional[str] = None, bert_config_file: Optional[str] = None, n_classes: int = 2, return_probas: bool = True, optimizer: str = 'AdamW', clip_norm: Optional[float] = None, optimizer_parameters: Optional[dict] = None, **kwargs)[source]

BERT-based model for interaction-based text ranking on PyTorch.

Linear transformation is trained over the BERT pooled output from [CLS] token. Predicted probabilities of classes are used as a similarity measure for ranking.

Parameters
  • pretrained_bert – pretrained Bert checkpoint path or key title (e.g. “bert-base-uncased”)

  • bert_config_file – path to Bert configuration file (not used if pretrained_bert is key title)

  • n_classes – number of classes

  • return_probas – set True if class probabilities are returned instead of the most probable label

  • optimizer – optimizer name from torch.optim

  • optimizer_parameters – dictionary with optimizer’s parameters, e.g. {‘lr’: 0.1, ‘weight_decay’: 0.001, ‘momentum’: 0.9}

__call__(features_li: List[List[transformers.data.processors.utils.InputFeatures]]) → Union[List[int], List[List[float]]][source]

Calculate scores for the given context over candidate responses.

Parameters

features_li – list of elements where each element contains the batch of features for contexts with particular response candidates

Returns

predicted scores for contexts over response candidates

train_on_batch(features_li: List[List[transformers.data.processors.utils.InputFeatures]], y: Union[List[int], List[List[int]]]) → Dict[source]

Train the model on the given batch.

Parameters
  • features_li – list with the single element containing the batch of InputFeatures

  • y – batch of labels (class id or one-hot encoding)

Returns

dict with loss and learning rate values
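A hedged sketch that chains the ranker preprocessor documented above with the ranker model; the model id, save_path, and texts are assumptions:

from deeppavlov.models.preprocessors.torch_transformers_preprocessor import TorchBertRankerPreprocessor
from deeppavlov.models.torch_bert.torch_bert_ranker import TorchBertRankerModel

preprocessor = TorchBertRankerPreprocessor(vocab_file="bert-base-uncased",
                                           max_seq_length=128)
ranker = TorchBertRankerModel(pretrained_bert="bert-base-uncased",
                              n_classes=2,
                              return_probas=True,
                              save_path="./ranker_checkpoint")  # save_path is an assumed kwarg

# first element: contexts; remaining elements: batches of response candidates
batch = [["where is my order?"],
         ["you can track your order on the account page"],
         ["we are open on weekends"]]
scores = ranker(preprocessor(batch))  # scores of each context over the candidates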