deeppavlov.models.multitask_bert

class deeppavlov.dataset_readers.multitask_reader.MultiTaskReader[source]

Class to read several datasets simultaneously.

class deeppavlov.dataset_iterators.multitask_iterator.MultiTaskIterator(data: dict, tasks: dict)[source]

Merges data from several dataset iterators. During batch generation, batches from the merged dataset iterators are united into one batch. If the merged datasets differ in size, smaller datasets are repeated until their size equals that of the largest dataset.

Parameters
  • data – dictionary whose keys are task names and whose values are dictionaries with fields "train", "valid", "test".

  • tasks – dictionary whose keys are task names and whose values are init params of dataset iterators.

data

dictionary of data with fields “train”, “valid” and “test” (or some of them)

gen_batches(batch_size: int, data_type: str = 'train', shuffle: Optional[bool] = None) → Iterator[Tuple[tuple, tuple]][source]

Generate batches and expected output to train neural networks. Batches from task iterators are united into one batch. Every element of the largest dataset is used once whereas smaller datasets are repeated until their size is equal to the largest dataset.

Parameters
  • batch_size – number of samples in batch

  • data_type – can be either ‘train’, ‘test’, or ‘valid’

  • shuffle – whether to shuffle dataset before batching

Yields

a tuple of a batch of inputs and a batch of expected outputs. Inputs and outputs are tuples. Each element of inputs or outputs is itself a tuple whose elements are the values of the merged tasks, in the order the tasks appear in the tasks argument of the __init__ method.
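The shape of a yielded batch can be illustrated with a minimal sketch. This is not the library's implementation: the function name merge_task_batches and the toy data are hypothetical, each sample is assumed to be an (x, y) pair, and all task datasets are assumed to have been brought to equal length already.

```python
from typing import Dict, Iterator, List, Tuple


def merge_task_batches(
    data: Dict[str, List[Tuple[object, object]]], batch_size: int
) -> Iterator[Tuple[tuple, tuple]]:
    """Yield (inputs, outputs); every element holds one value per task.

    Hypothetical helper: assumes equal-length task datasets of (x, y) pairs.
    """
    task_names = list(data)  # dict order stands in for the order of `tasks`
    n_samples = len(data[task_names[0]])
    for start in range(0, n_samples, batch_size):
        rows = range(start, min(start + batch_size, n_samples))
        inputs = tuple(tuple(data[t][i][0] for t in task_names) for i in rows)
        outputs = tuple(tuple(data[t][i][1] for t in task_names) for i in rows)
        yield inputs, outputs


toy = {
    "ner": [("a b", ["O", "O"]), ("c", ["B"]), ("d e", ["O", "B"])],
    "sentiment": [("good", 1), ("bad", 0), ("fine", 1)],
}
batches = list(merge_task_batches(toy, batch_size=2))
```

Here batches[0][0] holds two elements, one per sample, and each element pairs the "ner" input with the "sentiment" input of the same row.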

get_instances(data_type: str = 'train')[source]

Returns a tuple of inputs and outputs from all datasets. Lengths of inputs and outputs are equal to the size of the largest dataset. Smaller datasets are repeated until their sizes are equal to the size of the largest dataset.

Parameters

data_type – can be either ‘train’, ‘test’, or ‘valid’

Returns

a tuple of all inputs for a data type and all expected outputs for a data type
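The repetition of smaller datasets can be sketched as follows. The helper name pad_to_largest is hypothetical, and cyclic repetition in the original order is an assumption; the text above only states that smaller datasets are repeated until they match the largest one.

```python
from itertools import cycle, islice
from typing import Dict


def pad_to_largest(datasets: Dict[str, list]) -> Dict[str, list]:
    """Repeat each dataset from its start until it is as long as the largest."""
    target = max(len(samples) for samples in datasets.values())
    return {
        name: list(islice(cycle(samples), target))
        for name, samples in datasets.items()
    }


padded = pad_to_largest({"big": [1, 2, 3, 4, 5], "small": ["a", "b"]})
```

The largest dataset is left unchanged, so each of its elements is used exactly once.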

class deeppavlov.models.multitask_bert.multitask_bert.MultiTaskBert(*args, **kwargs)[source]

The component for multi-task BERT. It builds the BERT body and launches the building of BERT heads.

The component aggregates components implementing BERT heads. The head components are called tasks. __call__ and train_on_batch methods of MultiTaskBert are used for inference and training of BERT heads. BERT head components, which are derived from MTBertTask, can be used only inside this class.

One training iteration consists of one train_on_batch call for every task.

If inference_task_names is None, the component is created for training. Otherwise, the component is created for inference. If the component is created for inference, several tasks can be run simultaneously. For an explanation, see the description of the inference_task_names parameter.

Parameters
  • tasks – a dictionary. Task names are dictionary keys and objects of MTBertTask subclasses are dictionary values. Task names are used as variable scopes in the computational graph, so it is important to use the same names in the multi-task BERT train and inference configuration files.

  • bert_config_file – path to BERT configuration file

  • pretrained_bert – pre-trained BERT checkpoint

  • attention_probs_keep_prob – keep_prob for BERT self-attention layers

  • hidden_keep_prob – keep_prob for BERT hidden layers

  • body_learning_rate – learning rate of BERT body

  • min_body_learning_rate – min value of body learning rate if learning rate decay is used

  • learning_rate_drop_patience – how many validations with no improvements to wait

  • learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations

  • load_before_drop – whether to load best model before dropping learning rate or not

  • clip_norm – clip gradients by norm

  • freeze_embeddings – set to False to train input embeddings

  • inference_task_names

    names of tasks on which inference is done. If this parameter is provided, the component is created for inference, else the component is created for training.

    If inference_task_names is a string, then it is a name of the task called separately from other tasks (in individual tf.Session.run call).

    If inference_task_names is a list, then elements of this list are either strings or lists of strings. You can combine these options. For example, ["task_name1", ["task_name2", "task_name3"], ["task_name4", "task_name5"]].

    If an element of inference_task_names list is a string, the element is a name of the task that is computed when __call__ method is called.

    If an element of the inference_task_names parameter is a list of strings ["task_name1", "task_name2", ...], then tasks "task_name1", "task_name2" and so on are run simultaneously in tf.Session.run call. This option is available if tasks "task_name1", "task_name2" and so on have common inputs. Despite the fact that tasks share inputs, if positional arguments are used in methods __call__ and train_on_batch, all arguments are passed individually. For instance, if "task_name1", "task_name2", and "task_name3" all take an argument with name x in the model pipe, then the __call__ method takes arguments (x, x, x).

  • in_distribution

    The distribution of variables listed in the "in" config parameter between tasks. in_distribution can be None if only 1 task is called. In that case all variables listed in "in" are arguments of 1 task.

    in_distribution can be a dictionary of int. If that is the case, then keys of in_distribution are task names and values are numbers of variables from "in" parameter of config which are inputs of corresponding task. The variables in "in" parameter have to be in the same order the tasks are listed in in_distribution.

    in_distribution can be a dictionary of lists of str. Strings are names of variables from "in" configuration parameter. If "in" parameter is a list, then in_distribution works the same way as when in_distribution is dictionary of int. Values of in_distribution, which are lists, are replaced by their lengths. If "in" parameter in component config is a dictionary, then the order of strings in in_distribution values has to match the order of arguments of train_on_batch and get_sess_run_infer_args methods of task components.

  • in_y_distribution – The same as in_distribution for "in_y" config parameter.
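How a nested inference_task_names value and an in_distribution dictionary of int relate to positional arguments can be sketched in plain Python. Both helper names (flatten_task_names, split_args) are hypothetical and are not part of the library; the sketch only mirrors the rules described in the parameter list above.

```python
from typing import Dict, List, Sequence, Union

TaskNames = Union[str, List[Union[str, List[str]]]]


def flatten_task_names(task_names: TaskNames) -> List[str]:
    """Return every task name in call order, ignoring the grouping."""
    if isinstance(task_names, str):
        return [task_names]
    flat: List[str] = []
    for element in task_names:
        flat.extend([element] if isinstance(element, str) else element)
    return flat


def split_args(args: Sequence, in_distribution: Dict[str, int]) -> Dict[str, list]:
    """Give each task the next n positional arguments, in dictionary order."""
    if sum(in_distribution.values()) != len(args):
        raise ValueError("in_distribution does not account for every argument")
    per_task, position = {}, 0
    for task, n_inputs in in_distribution.items():
        per_task[task] = list(args[position:position + n_inputs])
        position += n_inputs
    return per_task


inference_task_names = ["task_name1", ["task_name2", "task_name3"]]
in_distribution = {"task_name1": 1, "task_name2": 1, "task_name3": 1}
x = ["some text"]
# Grouped tasks share inputs, yet each still receives its own copy: (x, x, x).
per_task_inputs = split_args((x, x, x), in_distribution)
```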

train_on_batch(*args, **kwargs) → Dict[str, Dict[str, float]][source]

Calls train_on_batch methods for every task. This method takes args or kwargs but not both. The order of args is the same as the order of tasks in the component parameters:

args = [
    task1_in_x[0],
    task1_in_x[1],
    task1_in_x[2],
    ...
    task1_in_y[0],
    task1_in_y[1],
    ...
    task2_in_x[0],
    ...
]

If kwargs are used and the in_distribution and in_y_distribution attributes are dictionaries of lists of strings, then the keys of kwargs have to be the same as the strings in in_distribution and in_y_distribution. If in_distribution and in_y_distribution are dictionaries of int, then kwargs values are treated the same way as args.
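The positional layout listed above (the inputs of a task, then its expected outputs, then the next task) can be produced from per-task values with a short sketch. The helper build_train_args and the placeholder strings are hypothetical.

```python
from typing import Dict, List


def build_train_args(in_x: Dict[str, list], in_y: Dict[str, list]) -> list:
    """Flatten per-task values: x values of a task, then its y values."""
    args: List[object] = []
    for task in in_x:  # dict order stands in for the order of tasks
        args.extend(in_x[task])
        args.extend(in_y[task])
    return args


in_x = {"task1": ["t1_x0", "t1_x1", "t1_x2"], "task2": ["t2_x0"]}
in_y = {"task1": ["t1_y0", "t1_y1"], "task2": ["t2_y0"]}
train_args = build_train_args(in_x, in_y)
```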

Parameters
  • args – task inputs and expected outputs

  • kwargs – task inputs and expected outputs

Returns

dictionary of dictionaries with task losses and learning rates.

__call__(*args, **kwargs)[source]

Calls one or several BERT heads depending on the provided task names. args and kwargs contain inputs of BERT tasks. args and kwargs cannot be used together. If args are used, their content has to be:

args = [
    task1_in_x[0],
    task1_in_x[1],
    ...
    task2_in_x[0],
    task2_in_x[1],
    ...
]

If kwargs are used and in_distribution is a dictionary of int, then kwargs’ order has to be the same as args order described in the previous paragraph. If in_distribution is a dictionary of lists of str, then all task names from in_distribution have to be present in kwargs keys.

Returns

list of results of called tasks.

call(args: Tuple[Any], kwargs: Dict[str, Any], task_names: Optional[Union[str, List[str]]], in_distribution: Optional[Union[Dict[str, int], Dict[str, List[str]]]] = None)[source]

Calls one or several BERT heads depending on the task names provided in the task_names parameter. args and kwargs contain inputs of BERT tasks. args and kwargs cannot be used simultaneously. If args are used, their content has to be:

args = [
    task1_in_x[0],
    task1_in_x[1],
    ...
    task2_in_x[0],
    task2_in_x[1],
    ...
]

If kwargs are used, the kwargs keys have to match the content of the in_names params of the called tasks.

Parameters
  • args – generally, args parameter of __call__ method of this component or MTBertReUser. Inputs of one or several tasks. Has to be empty if kwargs argument is used.

  • kwargs – generally, kwargs parameter of __call__ method of this component or MTBertReUser. Inputs of one or several tasks. Has to be empty if args argument is used.

  • task_names – names of tasks that are called. If str, then 1 task is called. If an element of the task_names list is a string, then this task is run independently. If an element of task_names is a list of strings, then the tasks in the inner list are run simultaneously.

  • in_distribution – a distribution of variables from "in" config parameters between tasks. For details see method __init__ docstring.

Returns

list of results of called tasks.

class deeppavlov.models.multitask_bert.multitask_bert.MTBertTask(keep_prob: float = 1.0, return_probas: Optional[bool] = None, learning_rate: float = 0.001)[source]

Abstract class for multitask BERT tasks. Objects of its subclasses are linked with the BERT body when the MultiTaskBert.build method is called. Training is performed when the MultiTaskBert.train_on_batch method is called. Objects of classes derived from MTBertTask don’t have a __call__ method. Instead they have get_sess_run_infer_args and post_process_preds methods, which are called from the call method of the MultiTaskBert class. The get_sess_run_infer_args method returns fetches and feed_dict for inference, and the post_process_preds method retrieves predictions from computed fetches. Classes derived from MTBertTask must implement the get_sess_run_train_args method, which returns fetches and feed_dict for training.

Parameters
  • keep_prob – dropout keep_prob for non-BERT layers

  • return_probas – set this to True if you need the probabilities instead of raw answers

  • learning_rate – learning rate of BERT head

build(bert_body: bert_dp.modeling.BertModel, optimizer_params: Dict[str, Union[str, float]], shared_placeholders: Dict[str, tensorflow.placeholder], sess: tensorflow.Session, mode: str, get_train_op_func: Callable, freeze_embeddings: bool, bert_head_variable_scope: str) → None[source]

Initiates building of the BERT head and initializes optimizer parameters, placeholders that are common for all tasks.

Parameters
  • bert_body – instance of BertModel.

  • optimizer_params – a dictionary with four fields: 'optimizer' (str) – a name of optimizer class, 'body_learning_rate' (float) – initial value of BERT body learning rate, 'min_body_learning_rate' (float) – min BERT body learning rate for learning rate decay, 'weight_decay_rate' (float) – L2 weight decay for AdamWeightDecayOptimizer

  • shared_placeholders – a dictionary with placeholders used in all tasks. The dictionary contains fields 'input_ids', 'input_masks', 'learning_rate', 'keep_prob', 'is_train', 'token_types'.

  • sess – current tf.Session instance

  • mode – 'train' or 'inference'

  • get_train_op_func – a function returning tf.Operation and with signature similar to LRScheduledTFModel.get_train_op without self argument. It is a function returning train operation for specified loss and variable scopes.

  • freeze_embeddings – set False to train input embeddings.

  • bert_head_variable_scope – variable scope for BERT head.

abstract _init_graph() → None[source]

Build the BERT head, initialize task-specific placeholders, and create attributes containing output probabilities and model loss. The optimizer is initialized not in this method but in _init_optimizer.

get_train_op(loss: tensorflow.Tensor, body_learning_rate: Union[tensorflow.Tensor, float], **kwargs) → tensorflow.Operation[source]

Return operation for the task training. Head learning rate is calculated as a product of body_learning_rate and quotient of initial head learning rate and initial body learning rate.

Parameters
  • loss – the task loss

  • body_learning_rate – the learning rate for the BERT body

Returns

train operation for the task
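The head learning rate rule stated for get_train_op amounts to keeping the initial head-to-body ratio fixed while the body rate changes. A worked sketch of that arithmetic, with a hypothetical function name and illustrative values:

```python
def head_learning_rate(
    body_learning_rate: float,
    initial_head_learning_rate: float,
    initial_body_learning_rate: float,
) -> float:
    """Scale the head rate so it keeps its initial ratio to the body rate."""
    ratio = initial_head_learning_rate / initial_body_learning_rate
    return body_learning_rate * ratio


# Illustrative values: the body rate decayed from 2e-5 to 1e-5,
# and the head started at 1e-3, so the ratio is 50.
current_head_lr = head_learning_rate(1e-5, 1e-3, 2e-5)
```

With these values the head rate is halved along with the body rate, from 1e-3 to 5e-4.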

train_on_batch(*args, **kwargs) → Dict[str, float][source]

Trains the task on one batch. This method will work correctly if you override get_sess_run_train_args for your task.

Parameters

kwargs – the keys are body_learning_rate and "in" and "in_y" params for the task.

Returns

dictionary with calculated task loss and body and head learning rates.

abstract get_sess_run_infer_args(*args) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]

Returns fetches and feed_dict for inference. Fetches are lists of tensors and feed_dict is dictionary with placeholder values required for fetches computation. The method is used inside MultiTaskBert __call__ method.

If self.return_probas is True, fetches contain the probabilities tensor; otherwise they contain the predictions tensor.

Overriding methods take task inputs as positional arguments.

ATTENTION! If the get_sess_run_infer_args method takes n_x_args arguments, then the first n_x_args arguments of the get_sess_run_train_args method have to be in the same order as the arguments of get_sess_run_infer_args.

Parameters

args – task inputs.

Returns

fetches and feed_dict

abstract get_sess_run_train_args(*args) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]

Returns fetches and feed_dict for task train_on_batch method.

Overriding methods take task inputs as positional arguments.

ATTENTION! If the get_sess_run_infer_args method takes n_x_args arguments, then the first n_x_args arguments of the get_sess_run_train_args method have to be in the same order as the arguments of get_sess_run_infer_args.

Parameters

args – task inputs followed by expected outputs.

Returns

fetches and feed_dict
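The argument-order requirement between get_sess_run_infer_args and get_sess_run_train_args can be checked mechanically. The sketch below is one possible check, not something the library is known to provide: leading_args_match is a hypothetical helper, and the stand-in functions only copy the parameter names from the MTBertSequenceTaggingTask signatures.

```python
import inspect
from typing import Callable


def leading_args_match(infer_fn: Callable, train_fn: Callable) -> bool:
    """Check that train_fn starts with the same parameter names as infer_fn."""
    infer_names = list(inspect.signature(infer_fn).parameters)
    train_names = list(inspect.signature(train_fn).parameters)
    return train_names[:len(infer_names)] == infer_names


# Stand-ins that mirror the parameter names of the sequence tagging task.
def infer_args(input_ids, input_masks, y_masks):
    pass


def train_args(input_ids, input_masks, y_masks, y, body_learning_rate):
    pass


# A deliberately wrong ordering: y is placed before the masks.
def bad_train_args(input_ids, y, input_masks, y_masks, body_learning_rate):
    pass


ok = leading_args_match(infer_args, train_args)
broken = leading_args_match(infer_args, bad_train_args)
```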

abstract post_process_preds(sess_run_res: list) → list[source]

Post process results of tf.Session.run called for task inference. Called from method MultiTaskBert.__call__.

Parameters

sess_run_res – computed fetches from get_sess_run_infer_args method

Returns

post processed results

class deeppavlov.models.multitask_bert.multitask_bert.MTBertSequenceTaggingTask(n_tags: Optional[int] = None, use_crf: Optional[bool] = None, use_birnn: bool = False, birnn_cell_type: str = 'lstm', birnn_hidden_size: int = 128, keep_prob: float = 1.0, encoder_dropout: float = 0.0, return_probas: Optional[bool] = None, encoder_layer_ids: Optional[List[int]] = None, learning_rate: float = 0.001)[source]

BERT head for text tagging. It predicts a label for every token (not subtoken) in the text. You can use it for sequence labelling tasks, such as morphological tagging or named entity recognition. Objects of this class should be passed to the constructor of MultiTaskBert class in param tasks.

Parameters
  • n_tags – number of distinct tags

  • use_crf – whether to use CRF on top or not

  • use_birnn – whether to use a bidirectional RNN after BERT layers. For NER and morphological tagging we usually set it to False as otherwise the model overfits

  • birnn_cell_type – the type of Bidirectional RNN. Either "lstm" or "gru"

  • birnn_hidden_size – number of hidden units in the BiRNN layer in each direction

  • keep_prob – dropout keep_prob for non-Bert layers

  • encoder_dropout – dropout probability of encoder output layer

  • return_probas – set this to True if you need the probabilities instead of raw answers

  • encoder_layer_ids – list of averaged layers from the BERT encoder (layer ids)

  • optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer

  • weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer

  • learning_rate – learning rate of BERT head

get_sess_run_infer_args(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]

Returns fetches and feed_dict for model inference. The method is called from MultiTaskBert.__call__.

Parameters
  • input_ids – indices of the subwords in vocabulary

  • input_masks – mask that determines where to attend and where not to

  • y_masks – mask which determines the first subword units in the word

Returns

list of fetches and feed_dict
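The relationship between subwords, input_masks and y_masks can be illustrated for a single sentence. This sketch is an assumption about the expected layout rather than the library's preprocessor: build_masks is a hypothetical helper, each word is assumed to have at least one subword, and the special tokens and padding are assumed to get 0 in y_masks.

```python
from typing import List, Tuple


def build_masks(
    word_pieces: List[List[str]], max_len: int
) -> Tuple[List[str], List[int], List[int]]:
    """Build tokens, input_mask and y_mask for one sentence.

    word_pieces holds the subwords of every word, already tokenized.
    Assumption: [CLS], [SEP] and padding get 0 in y_mask.
    """
    tokens, y_mask = ["[CLS]"], [0]
    for pieces in word_pieces:
        tokens.extend(pieces)
        # 1 marks the first subword of a word, 0 marks its continuations.
        y_mask.extend([1] + [0] * (len(pieces) - 1))
    tokens.append("[SEP]")
    y_mask.append(0)
    input_mask = [1] * len(tokens)  # attend to every real token
    padding = max_len - len(tokens)
    if padding < 0:
        raise ValueError("sentence is longer than max_len")
    return (
        tokens + ["[PAD]"] * padding,
        input_mask + [0] * padding,  # do not attend to padding
        y_mask + [0] * padding,
    )


tokens, input_mask, y_mask = build_masks([["play", "##ing"], ["chess"]], max_len=6)
```

The number of ones in y_mask equals the number of words, which is why the head can predict one label per token rather than per subtoken.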

get_sess_run_train_args(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray], y: Union[List[List[int]], numpy.ndarray], body_learning_rate: float) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]

Returns fetches and feed_dict for model train_on_batch method.

Parameters
  • input_ids – indices of the subwords in vocabulary

  • input_masks – mask that determines where to attend and where not to

  • y_masks – mask which determines the first subword units in the word

  • y – indices of ground truth tags

  • body_learning_rate – learning rate for BERT body

Returns

list of fetches and feed_dict

post_process_preds(sess_run_res: List[numpy.ndarray]) → Union[List[List[int]], List[numpy.ndarray]][source]

Decodes CRF if needed and returns predictions or probabilities.

Parameters

sess_run_res – list of computed fetches gathered by get_sess_run_infer_args

Returns

predictions or probabilities depending on return_probas attribute

class deeppavlov.models.multitask_bert.multitask_bert.MTBertClassificationTask(n_classes: Optional[int] = None, return_probas: Optional[bool] = None, one_hot_labels: Optional[bool] = None, keep_prob: float = 1.0, multilabel: bool = False, learning_rate: float = 2e-05, optimizer: str = 'Adam')[source]

Task for text classification.

It uses output from [CLS] token and predicts labels using linear transformation.
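A linear transformation over the [CLS] output can be sketched without TensorFlow. The function classify and the toy weights are hypothetical, and the softmax over the logits is an assumption for the single-label case; the text above does not specify the activation.

```python
import math
from typing import List


def classify(
    cls_output: List[float], weights: List[List[float]], bias: List[float]
) -> List[float]:
    """Apply a linear layer to the [CLS] vector and return softmax probabilities.

    weights has shape (n_classes, hidden_size); bias has shape (n_classes,).
    """
    logits = [
        sum(w * h for w, h in zip(row, cls_output)) + b
        for row, b in zip(weights, bias)
    ]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probas = classify(
    cls_output=[1.0, -1.0],
    weights=[[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]],
    bias=[0.0, 0.0, 0.0],
)
label = probas.index(max(probas))  # most probable class id
```

In terms of the parameters below, probas corresponds to what return_probas=True would expose and label to the most probable class.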

Parameters
  • n_classes – number of classes

  • return_probas – set True if class probabilities are needed instead of the most probable label

  • one_hot_labels – set True if one-hot encoding for labels is used

  • keep_prob – dropout keep_prob for non-BERT layers

  • multilabel – set True if it is multi-label classification

  • learning_rate – learning rate of BERT head

  • optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer

get_sess_run_infer_args(features: List[bert_dp.preprocessing.InputFeatures]) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]

Returns fetches and feed_dict for model inference. The method is called from MultiTaskBert.__call__.

Parameters

features – text features created by BERT preprocessor.

Returns

list of fetches and feed_dict

get_sess_run_train_args(features: List[bert_dp.preprocessing.InputFeatures], y: Union[List[int], List[List[int]]], body_learning_rate: float) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]

Returns fetches and feed_dict for model train_on_batch method.

Parameters
  • features – text features created by BERT preprocessor.

  • y – batch of labels (class id or one-hot encoding)

  • body_learning_rate – learning rate for BERT body

Returns

list of fetches and feed_dict

post_process_preds(sess_run_res)[source]

Returns tf.Session.run results for inference without changes.

class deeppavlov.models.multitask_bert.multitask_bert.MTBertReUser(mt_bert: deeppavlov.models.multitask_bert.multitask_bert.MultiTaskBert, task_names: Union[str, List[Union[List[str], str]]], in_distribution: Optional[Union[Dict[str, int], Dict[str, List[str]]]] = None, *args, **kwargs)[source]

Instances of this class are for multi-task BERT inference. In an inference config, the MultiTaskBert class may not perform inference for some tasks. For example, you may need to sequentially apply two models with BERT. In that case, mt_bert_reuser is created to call the remaining tasks.

Parameters
  • mt_bert – An instance of MultiTaskBert

  • task_names – Names of inferred tasks. If task_names is str, then task_names is the name of the only inferred task. If task_names is a list, then its elements can be either strings or lists of strings. If an element of task_names is a string, then this element is the name of a task that is run independently. If an element of task_names is a list of strings, then the element is a list of names of tasks that have common inputs and run simultaneously. For detailed information, see the MultiTaskBert inference_task_names parameter.

__call__(*args, **kwargs) → List[Any][source]

Infer tasks listed in parameter task_names. One of parameters args and kwargs has to be empty.

Parameters
  • args – inputs and labels of inferred tasks.

  • kwargs – inputs and labels of inferred tasks.

Returns

list of results of inference of tasks listed in task_names

class deeppavlov.models.multitask_bert.multitask_bert.InputSplitter(keys_to_extract: Union[List[str], Tuple[str, ]], **kwargs)[source]

An instance of this class in a pipe splits a batch of sequences of identical length, or of dictionaries with identical keys, into a tuple of batches.

Parameters

keys_to_extract – a sequence of ints or strings that have to match keys of split dictionaries.

__call__(inp: Union[List[dict], List[List[int]], List[Tuple[int]]]) → List[list][source]

Returns batches of values from inp. Every batch contains values that have same key from keys_to_extract attribute. The order of elements of keys_to_extract is preserved.

Parameters

inp – A sequence of dictionaries with identical keys

Returns

A list of lists of values of dictionaries from inp
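The splitting behaviour described for InputSplitter can be sketched as a plain function. split_inputs is a hypothetical stand-in, not the class itself; it only reproduces the stated rule of one output batch per key, in the order of keys_to_extract.

```python
from typing import List, Sequence, Union


def split_inputs(
    inp: Sequence[Union[dict, list, tuple]],
    keys_to_extract: Sequence[Union[int, str]],
) -> List[list]:
    """Return one batch per key, in the order of keys_to_extract."""
    return [[sample[key] for sample in inp] for key in keys_to_extract]


batch = [
    {"text": "first", "label": 0},
    {"text": "second", "label": 1},
]
texts, labels = split_inputs(batch, ["text", "label"])

# Integer keys index into sequences of identical length.
by_index = split_inputs([[10, 11], [20, 21]], [1, 0])
```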