deeppavlov.models.bert

class deeppavlov.models.preprocessors.bert_preprocessor.BertPreprocessor(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, **kwargs)[source]

Tokenize text into subtokens, encode subtokens with their indices, and create token and segment masks.

Check details in bert_dp.preprocessing.convert_examples_to_features() function.

Parameters
  • vocab_file – path to vocabulary

  • do_lower_case – set True if lowercasing is needed

  • max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens

max_seq_length

max sequence length in subtokens, including [SEP] and [CLS] tokens

tokenizer

instance of Bert FullTokenizer

__call__(texts_a: List[str], texts_b: Optional[List[str]] = None) → List[bert_dp.preprocessing.InputFeatures][source]

Call Bert bert_dp.preprocessing.convert_examples_to_features() function to tokenize and create masks.

texts_a and texts_b are separated by the [SEP] token

Parameters
  • texts_a – list of texts

  • texts_b – list of texts; may be None, e.g. for a single-sentence classification task

Returns

batch of bert_dp.preprocessing.InputFeatures with subtokens, subtoken ids, subtoken mask, segment mask.
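
A minimal usage sketch is given below; the vocabulary path is a placeholder for any downloaded BERT vocabulary file, and the attribute names on the returned InputFeatures objects are assumptions based on bert_dp:

    from deeppavlov.models.preprocessors.bert_preprocessor import BertPreprocessor

    # the vocab path below is a placeholder, not a path shipped with the library
    bert_preprocessor = BertPreprocessor(
        vocab_file='/path/to/bert_model/vocab.txt',
        do_lower_case=True,
        max_seq_length=64)

    features = bert_preprocessor(
        texts_a=['What is the capital of France?'],
        texts_b=['Paris is the capital of France.'])

    # each element is a bert_dp InputFeatures instance holding the subtokens,
    # subtoken ids, subtoken mask and segment mask (attribute names assumed)
    print(features[0].tokens)
    print(features[0].input_ids)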

class deeppavlov.models.preprocessors.bert_preprocessor.BertNerPreprocessor(vocab_file: str, do_lower_case: bool = False, max_seq_length: int = 512, max_subword_length: Optional[int] = None, token_masking_prob: float = 0.0, provide_subword_tags: bool = False, subword_mask_mode: str = 'first', **kwargs)[source]

Takes tokens, splits them into BERT subtokens, and encodes the subtokens with their indices. Creates a mask of subtokens (one for the first subtoken of a word, zero for the others).

If tags are provided, calculates tags for subtokens.

Parameters
  • vocab_file – path to vocabulary

  • do_lower_case – set True if lowercasing is needed

  • max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens

  • max_subword_length – replace a token with <unk> if its length is larger than this (defaults to None, which is equal to +infinity)

  • token_masking_prob – probability of masking token while training

  • provide_subword_tags – output tags for subwords or for words

  • subword_mask_mode – subword to select inside word tokens, can be “first” or “last” (default=”first”)

max_seq_length

max sequence length in subtokens, including [SEP] and [CLS] tokens

max_subword_length

max length of a BERT subtoken

tokenizer

instance of Bert FullTokenizer

__call__(tokens: Union[List[List[str]], List[str]], tags: Optional[List[List[str]]] = None, **kwargs)[source]

Tokenizes each utterance in the batch into BERT subtokens, builds the subtoken mask and, if tags are provided, computes tags for the subtokens.
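
For illustration, a hedged sketch of how the subword mask depends on subword_mask_mode (the subtokens are made up, not actual tokenizer output):

    tokens    = ['aardvark', 'is', 'awesome']
    # hypothetical WordPiece split of the first word, wrapped in special tokens
    subtokens = ['[CLS]', 'aar', '##dvark', 'is', 'awesome', '[SEP]']

    # subword_mask_mode='first': mark the first subtoken of each word
    mask_first = [0, 1, 0, 1, 1, 0]
    # subword_mask_mode='last': mark the last subtoken of each word
    mask_last  = [0, 0, 1, 1, 1, 0]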

class deeppavlov.models.preprocessors.bert_preprocessor.BertRankerPreprocessor(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, **kwargs)[source]

Tokenize text to sub-tokens, encode sub-tokens with their indices, create tokens and segment masks for ranking.

Builds features for a pair of context with each of the response candidates.

__call__(batch: List[List[str]]) → List[List[bert_dp.preprocessing.InputFeatures]][source]

Call BERT bert_dp.preprocessing.convert_examples_to_features() function to tokenize and create masks.

Parameters

batch – list of elements where the first element represents the batch of contexts and the rest of the elements represent batches of response candidates

Returns

list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask.
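
To make the expected input concrete, a hedged sketch of the batch layout (the strings are invented and ranker_preprocessor stands for an already configured BertRankerPreprocessor instance):

    # the first element is the batch of contexts, each following element is a
    # batch of response candidates aligned with those contexts
    contexts     = ['How do I reset my password?', 'Where is your office located?']
    candidates_1 = ['Click "Forgot password" on the login page.', 'We are at 5 Main Street.']
    candidates_2 = ['Try restarting the router.', 'The office is closed on weekends.']

    batch = [contexts, candidates_1, candidates_2]
    # features = ranker_preprocessor(batch)  # one InputFeatures batch per candidate set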

class deeppavlov.models.preprocessors.bert_preprocessor.BertSepRankerPreprocessor(vocab_file: str, do_lower_case: bool = True, max_seq_length: int = 512, **kwargs)[source]

Tokenize text to sub-tokens, encode sub-tokens with their indices, create tokens and segment masks for ranking.

Builds features for a context and for each of the response candidates separately.

__call__(batch: List[List[str]]) → List[List[bert_dp.preprocessing.InputFeatures]][source]

Call BERT bert_dp.preprocessing.convert_examples_to_features() function to tokenize and create masks.

Parameters

batch – list of elements where the first element represents the batch of contexts and the rest of the elements represent batches of response candidates

Returns

list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask for the context and each of response candidates separately.

class deeppavlov.models.preprocessors.bert_preprocessor.BertSepRankerPredictorPreprocessor(resps=None, resp_vecs=None, conts=None, cont_vecs=None, **kwargs)[source]

Tokenize text to sub-tokens, encode sub-tokens with their indices, create tokens and segment masks for ranking.

Builds features for a context and for each of the response candidates separately. In addition, builds features for a response (and corresponding context) text base.

Parameters
  • resps – list of strings containing the base of text responses

  • resp_vecs – BERT vector representations of resps; if None, features for the response base will be built

  • conts – list of strings containing the base of text contexts

  • cont_vecs – BERT vector representations of conts; if None, features for the context base will be built

__call__(batch: List[List[str]]) → List[List[bert_dp.preprocessing.InputFeatures]]

Call BERT bert_dp.preprocessing.convert_examples_to_features() function to tokenize and create masks.

Parameters

batch – list of elements where the first element represents the batch of contexts and the rest of the elements represent batches of response candidates

Returns

list of feature batches with subtokens, subtoken ids, subtoken mask, segment mask for the context and each of response candidates separately.

class deeppavlov.models.bert.bert_classifier.BertClassifierModel(*args, **kwargs)[source]

Bert-based model for text classification.

It uses the output from the [CLS] token and predicts labels using a linear transformation.

Parameters
  • bert_config_file – path to Bert configuration file

  • n_classes – number of classes

  • keep_prob – dropout keep_prob for non-Bert layers

  • one_hot_labels – set True if one-hot encoding for labels is used

  • multilabel – set True if it is multi-label classification

  • return_probas – set True if class probabilities should be returned instead of the most probable label

  • attention_probs_keep_prob – keep_prob for Bert self-attention layers

  • hidden_keep_prob – keep_prob for Bert hidden layers

  • optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer

  • num_warmup_steps

  • weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer

  • pretrained_bert – pretrained Bert checkpoint

  • min_learning_rate – min value of learning rate if learning rate decay is used

__call__(features: List[bert_dp.preprocessing.InputFeatures]) → Union[List[int], List[List[float]]][source]

Make prediction for given features (texts).

Parameters

features – batch of InputFeatures

Returns

predicted classes or probabilities of each class

train_on_batch(features: List[bert_dp.preprocessing.InputFeatures], y: Union[List[int], List[List[int]]]) → Dict[source]

Train model on given batch. This method calls train_op using features and y (labels).

Parameters
  • features – batch of InputFeatures

  • y – batch of labels (class id or one-hot encoding)

Returns

dict with loss and learning_rate values
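
In practice the classifier is built from a pipeline config together with BertPreprocessor rather than instantiated by hand. A hedged sketch (insults_kaggle_bert is an example config name from the DeepPavlov distribution and may differ between releases):

    from deeppavlov import build_model, configs

    # builds the full pipeline (BertPreprocessor + BertClassifierModel) from a config;
    # download=True fetches pretrained weights and the BERT vocabulary
    model = build_model(configs.classifiers.insults_kaggle_bert, download=True)
    print(model(['you are kind']))  # predicted labels (or probabilities, depending on the config)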

deeppavlov.models.bert.bert_sequence_tagger.token_from_subtoken(units: tensorflow.Tensor, mask: tensorflow.Tensor) → tensorflow.Tensor[source]

Assemble token level units from subtoken level units

Parameters
  • units – tf.Tensor of shape [batch_size, SUBTOKEN_seq_length, n_features]

  • mask

    mask of token beginnings. For example: for tokens

[[[CLS], My, capybara, [SEP]], [[CLS], Your, aar, ##dvark, is, awesome, [SEP]]]

    the mask will be

    [[0, 1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 1, 1, 0]]

Returns

Units assembled from the ones in the mask. For the example above these units will correspond to

[[My, capybara], [Your, aar, is, awesome]]

The shape of this tensor will be [batch_size, TOKEN_seq_length, n_features].

Return type

word_level_units
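
A pure-NumPy sketch of the gathering semantics (an illustration only, not the actual TensorFlow implementation, which also pads the result to a common token length):

    import numpy as np

    def token_from_subtoken_np(units, mask):
        # gather subtoken-level vectors at positions where mask == 1;
        # the real token_from_subtoken works on tf.Tensor objects and pads
        # the output to [batch_size, TOKEN_seq_length, n_features]
        return [units[i][mask[i].astype(bool)] for i in range(units.shape[0])]

    units = np.random.rand(2, 7, 3)          # [batch, SUBTOKEN_seq_length, n_features]
    mask = np.array([[0, 1, 1, 0, 0, 0, 0],  # mask of token beginnings from the example above
                     [0, 1, 1, 0, 1, 1, 0]])
    assembled = token_from_subtoken_np(units, mask)
    print([a.shape for a in assembled])      # [(2, 3), (4, 3)] before padding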

class deeppavlov.models.bert.bert_sequence_tagger.BertSequenceNetwork(*args, **kwargs)[source]

Basic class for BERT-based sequential architectures.

Parameters
  • keep_prob – dropout keep_prob for non-Bert layers

  • bert_config_file – path to Bert configuration file

  • pretrained_bert – pretrained Bert checkpoint

  • attention_probs_keep_prob – keep_prob for Bert self-attention layers

  • hidden_keep_prob – keep_prob for Bert hidden layers

  • encoder_layer_ids – list of averaged layers from Bert encoder (layer ids)

  • optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer

  • weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer

  • encoder_dropout – dropout probability of encoder output layer

  • ema_decay – exponential moving average decay to use for network parameters, a value from 0.0 to 1.0. Values closer to 1.0 put more weight on the parameter history, while values closer to 0.0 put more weight on the current parameters.

  • ema_variables_on_cpu – whether to place EMA variables on CPU; this may save a lot of GPU memory

  • freeze_embeddings – set True to not train input embeddings

  • learning_rate – learning rate of BERT head

  • bert_learning_rate – learning rate of BERT body

  • min_learning_rate – min value of learning rate if learning rate decay is used

  • learning_rate_drop_patience – how many validations with no improvements to wait

  • learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations

  • load_before_drop – whether to load best model before dropping learning rate or not

  • clip_norm – clip gradients by norm

train_on_batch(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], *args, **kwargs) → Dict[str, float][source]
Parameters
  • input_ids – batch of indices of subwords

  • input_masks – batch of masks which determine what should be attended

  • args – arguments passed to _build_feed_dict and corresponding to additional input and output tensors of the derived class.

  • kwargs – keyword arguments passed to _build_feed_dict and corresponding to additional input and output tensors of the derived class.

Returns

dict with fields ‘loss’, ‘head_learning_rate’, and ‘bert_learning_rate’

class deeppavlov.models.bert.bert_sequence_tagger.BertSequenceTagger(*args, **kwargs)[source]

BERT-based model for text tagging. It predicts a label for every token (not subtoken) in the text. You can use it for sequence labeling tasks, such as morphological tagging or named entity recognition. See deeppavlov.models.bert.bert_sequence_tagger.BertSequenceNetwork for the description of inherited parameters.

Parameters
  • n_tags – number of distinct tags

  • use_crf – whether to use CRF on top or not

  • use_birnn – whether to use a bidirectional RNN after the BERT layers. For NER and morphological tagging we usually set it to False, as otherwise the model overfits

  • birnn_cell_type – the type of Bidirectional RNN. Either lstm or gru

  • birnn_hidden_size – number of hidden units in the BiRNN layer in each direction

  • return_probas – set this to True if you need the probabilities instead of raw answers

__call__(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Union[List[List[int]], List[numpy.ndarray]][source]

Predicts tag indices for a given batch of subword tokens

Parameters
  • input_ids – indices of the subwords

  • input_masks – mask that determines where to attend and where not to

  • y_masks – mask which determines the first subword unit in each word

Returns

Label indices or class probabilities for each token (not subtoken)
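
A hedged usage sketch via a pipeline config (ner_ontonotes_bert is an example config name from the DeepPavlov distribution and may differ between releases):

    from deeppavlov import build_model, configs

    # builds the full NER pipeline (BertNerPreprocessor + BertSequenceTagger)
    ner = build_model(configs.ner.ner_ontonotes_bert, download=True)
    tokens, tags = ner(['Bob Ross lived in Florida'])
    print(list(zip(tokens[0], tags[0])))  # one tag per token (not subtoken)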

class deeppavlov.models.bert.bert_squad.BertSQuADModel(*args, **kwargs)[source]

Bert-based model for a SQuAD-like problem setting: it predicts the start and end positions of the answer for a given question and context.

The [CLS] token is used as no_answer. If the model selects the [CLS] token as the most probable answer, it means that there is no answer in the given context.

Start and end positions of the answer are predicted by a linear transformation of BERT outputs.

Parameters
  • bert_config_file – path to Bert configuration file

  • keep_prob – dropout keep_prob for non-Bert layers

  • attention_probs_keep_prob – keep_prob for Bert self-attention layers

  • hidden_keep_prob – keep_prob for Bert hidden layers

  • optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer

  • weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer

  • pretrained_bert – pretrained Bert checkpoint

  • min_learning_rate – min value of learning rate if learning rate decay is used

__call__(features: List[bert_dp.preprocessing.InputFeatures]) → Tuple[List[int], List[int], List[float], List[float]][source]

get predictions using features as input

Parameters

features – batch of InputFeatures instances

Returns

start, end positions, logits for answer and no_answer score

Return type

predictions

train_on_batch(features: List[bert_dp.preprocessing.InputFeatures], y_st: List[List[int]], y_end: List[List[int]]) → Dict[source]

Train model on given batch. This method calls train_op using features and labels from y_st and y_end

Parameters
  • features – batch of InputFeatures instances

  • y_st – batch of lists of ground truth answer start positions

  • y_end – batch of lists of ground truth answer end positions

Returns

dict with loss and learning_rate values

class deeppavlov.models.bert.bert_squad.BertSQuADInferModel(squad_model_config: str, vocab_file: str, do_lower_case: bool, max_seq_length: int = 512, batch_size: int = 10, lang='en', **kwargs)[source]

This model wraps BertSQuADModel to make predictions on sequences longer than 512 tokens.

It splits the context into chunks of max_seq_length - 3 - len(question) length, preserving sentence boundaries.

It reassembles batches with chunks instead of full contexts to optimize performance. For example, with

batch_size = 5, number_of_contexts == 2, number of first context chunks == 8, number of second context chunks == 2

we will create two batches with 5 chunks each.

For each context the best answer is selected via logits or scores from BertSQuADModel.

Parameters
  • squad_model_config – path to DeepPavlov BertSQuADModel config file

  • vocab_file – path to Bert vocab file

  • do_lower_case – set True if lowercasing is needed

  • max_seq_length – max sequence length in subtokens, including [SEP] and [CLS] tokens

  • batch_size – size of batch to use during inference

  • lang – either en or ru, it is used to select sentence tokenizer

__call__(contexts: List[str], questions: List[str], **kwargs) → Tuple[List[str], List[int], List[float]][source]

get predictions for given contexts and questions

Parameters
  • contexts – batch of contexts

  • questions – batch of questions

Returns

answer, answer start position, logits or scores

Return type

predictions
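
A hedged usage sketch via a pipeline config (squad_bert is an example config name from the DeepPavlov distribution and may differ between releases):

    from deeppavlov import build_model, configs

    # builds the SQuAD pipeline, which wraps BertSQuADModel with BertSQuADInferModel
    squad = build_model(configs.squad.squad_bert, download=True)
    answers, start_positions, scores = squad(
        ['DeepPavlov is a library for building dialogue systems.'],
        ['What is DeepPavlov?'])
    print(answers, start_positions, scores)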

class deeppavlov.models.bert.bert_ranker.BertRankerModel(*args, **kwargs)[source]

BERT-based model for interaction-based text ranking.

A linear transformation is trained over the BERT pooled output from the [CLS] token. Predicted class probabilities are used as a similarity measure for ranking.

Parameters
  • bert_config_file – path to Bert configuration file

  • n_classes – number of classes

  • keep_prob – dropout keep_prob for non-Bert layers

  • return_probas – set True if class probabilities are returned instead of the most probable label

__call__(features_li: List[List[bert_dp.preprocessing.InputFeatures]]) → Union[List[int], List[List[float]]][source]

Calculate scores for the given context over candidate responses.

Parameters

features_li – list of elements where each element contains the batch of features for contexts with particular response candidates

Returns

predicted scores for contexts over response candidates

train_on_batch(features_li: List[List[bert_dp.preprocessing.InputFeatures]], y: Union[List[int], List[List[int]]]) → Dict[source]

Train the model on the given batch.

Parameters
  • features_li – list with the single element containing the batch of InputFeatures

  • y – batch of labels (class id or one-hot encoding)

Returns

dict with loss and learning rate values

class deeppavlov.models.bert.bert_ranker.BertSepRankerModel(*args, **kwargs)[source]

BERT-based model for representation-based text ranking.

The BERT pooled output from the [CLS] token is used to get separate representations of a context and a response. The similarity measure is calculated as cosine similarity between these representations.

Parameters
  • bert_config_file – path to Bert configuration file

  • keep_prob – dropout keep_prob for non-Bert layers

  • attention_probs_keep_prob – keep_prob for Bert self-attention layers

  • hidden_keep_prob – keep_prob for Bert hidden layers

  • optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer

  • weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer

  • pretrained_bert – pretrained Bert checkpoint

  • min_learning_rate – min value of learning rate if learning rate decay is used

__call__(features_li: List[List[bert_dp.preprocessing.InputFeatures]]) → Union[List[int], List[List[float]]][source]

Calculate scores for the given context over candidate responses.

Parameters

features_li – list of elements where the first element represents the context batch of features and the rest of elements represent response candidates batches of features

Returns

predicted scores for contexts over response candidates

train_on_batch(features_li: List[List[bert_dp.preprocessing.InputFeatures]], y: Union[List[int], List[List[int]]]) → Dict[source]

Train the model on the given batch.

Parameters
  • features_li – list with two elements, one containing the batch of context features and the other containing the batch of response features

  • y – batch of labels (class id or one-hot encoding)

Returns

dict with loss and learning rate values

class deeppavlov.models.bert.bert_ranker.BertSepRankerPredictor(*args, **kwargs)[source]

Bert-based model for ranking and retrieving a text response.

The BERT pooled output from the [CLS] token is used to get separate representations of a context and a response. A similarity score is calculated as cosine similarity between these representations. Based on this similarity score, the text response is retrieved from a provided base of possible responses (and corresponding contexts). Contexts of responses are additionally used to get the best possible retrieval result from the base.

Parameters
  • bert_config_file – path to Bert configuration file

  • interact_mode – mode setting a policy to retrieve the response from the base

  • batch_size – batch size for building response (and context) vectors over the base

  • keep_prob – dropout keep_prob for non-Bert layers

  • resps – list of strings containing the base of text responses

  • resp_vecs – BERT vector representations of resps; if None, they will be built

  • resp_features – features of resps to build their BERT vector representations

  • conts – list of strings containing the base of text contexts

  • cont_vecs – BERT vector representations of conts; if None, they will be built

  • cont_features – features of conts to build their BERT vector representations

__call__(features_li)[source]

Get the context vector representation and retrieve the text response from the database.

Uses cosine similarity scores over vectors of responses (and corresponding contexts) from the base. Based on these scores retrieves the text response from the base.

Parameters

features_li – list of elements where each element represents a batch of context features

Returns

text response with the highest similarity score and its similarity score from the response base