deeppavlov.models.multitask_bert

class deeppavlov.dataset_readers.multitask_reader.MultiTaskReader

Class to read several datasets simultaneously.

class deeppavlov.dataset_iterators.multitask_iterator.MultiTaskIterator(data: dict, tasks: dict)

The class merges data from several dataset iterators. When it is used for batch generation, batches from the merged dataset iterators are united into one batch. If the merged datasets differ in size, smaller datasets are repeated until their size equals the size of the largest dataset.

Parameters

data – dictionary whose keys are task names and whose values are dictionaries with fields "train", "valid", "test".
tasks – dictionary whose keys are task names and whose values are init params of dataset iterators.
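A minimal sketch of the two constructor arguments as plain Python dictionaries. The task names ("sentiment", "ner"), the sample records, the assumption that every split is a list of (x, y) pairs, and the iterator parameter "seed" are hypothetical placeholders; the keys a real task iterator accepts depend on the iterator class it uses.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical data for two tasks; each split is assumed to be a list of (x, y) pairs.
data: Dict[str, Dict[str, List[Tuple[Any, Any]]]] = {
    "sentiment": {
        "train": [("good movie", "pos"), ("dull plot", "neg")],
        "valid": [("fine acting", "pos")],
        "test": [("weak ending", "neg")],
    },
    "ner": {
        "train": [(["John", "lives", "here"], ["B-PER", "O", "O"])],
        "valid": [(["Anna", "left"], ["B-PER", "O"])],
        "test": [(["Bob", "came"], ["B-PER", "O"])],
    },
}

# Hypothetical init params of each task's dataset iterator;
# the real keys depend on the iterator class configured for the task.
tasks: Dict[str, Dict[str, Any]] = {
    "sentiment": {"seed": 42},
    "ner": {"seed": 42},
}
```

Both dictionaries are keyed by the same task names.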

data: dictionary of data with fields “train”, “valid” and “test” (or some of them)

gen_batches(batch_size: int, data_type: str = 'train', shuffle: Optional[bool] = None) → Iterator[Tuple[tuple, tuple]]

Generates batches of inputs and expected outputs to train neural networks. Batches from task iterators are united into one batch. Every element of the largest dataset is used once, whereas smaller datasets are repeated until their size equals the size of the largest dataset.

Parameters

batch_size – number of samples in batch
data_type – can be either ‘train’, ‘test’, or ‘valid’
shuffle – whether to shuffle dataset before batching

Yields

a tuple of a batch of inputs and a batch of expected outputs. Inputs and outputs are tuples. Each element of inputs or outputs is a tuple whose elements are the values of one merged task; the elements follow the order in which tasks appear in the tasks argument of the __init__ method.
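The batching rule described above can be sketched in plain Python. merged_batches is a hypothetical helper, not DeepPavlov code; it skips shuffling, assumes every task's data is a non-empty list of (x, y) pairs, and assumes all tasks use the same batch size at every step, including a possibly shorter last batch.

```python
import itertools
import math
from typing import Any, Dict, Iterator, List, Tuple


def merged_batches(
    task_data: Dict[str, List[Tuple[Any, Any]]], batch_size: int
) -> Iterator[Tuple[tuple, tuple]]:
    """Hypothetical sketch of merged batching without shuffling."""
    max_len = max(len(pairs) for pairs in task_data.values())
    n_batches = math.ceil(max_len / batch_size)
    # Cycling repeats the smaller datasets; the largest one is used exactly once.
    streams = [itertools.cycle(pairs) for pairs in task_data.values()]
    for step in range(n_batches):
        # Same size for every task; only the last batch may be smaller.
        size = min(batch_size, max_len - step * batch_size)
        xs_per_task, ys_per_task = [], []
        for stream in streams:
            xs, ys = zip(*itertools.islice(stream, size))
            xs_per_task.append(xs)
            ys_per_task.append(ys)
        # Task order follows the insertion order of task_data.
        yield tuple(xs_per_task), tuple(ys_per_task)


example = {"a": [(1, "x1"), (2, "x2"), (3, "x3")], "b": [(10, "y10")]}
batches = list(merged_batches(example, batch_size=2))
# batches[0] == (((1, 2), (10, 10)), (("x1", "x2"), ("y10", "y10")))
# batches[1] == (((3,), (10,)), (("x3",), ("y10",)))
```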

get_instances(data_type: str = 'train')

Returns a tuple of inputs and outputs from all datasets. Lengths of inputs and outputs are equal to the size of the largest dataset. Smaller datasets are repeated until their sizes are equal to the size of the largest dataset.

Parameters: data_type – can be either ‘train’, ‘test’, or ‘valid’
Returns: a tuple of all inputs for a data type and all expected outputs for a data type
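A similar sketch for get_instances, again with a hypothetical helper rather than the library's code. It repeats each task's data up to the size of the largest dataset and then groups values per sample, so that inputs and outputs have the length of the largest dataset as stated above; the per-sample grouping is an assumption about the exact layout.

```python
from itertools import cycle, islice
from typing import Any, Dict, List, Tuple


def merged_instances(task_data: Dict[str, List[Tuple[Any, Any]]]) -> Tuple[tuple, tuple]:
    """Hypothetical sketch: repeat smaller datasets up to the largest size."""
    max_len = max(len(pairs) for pairs in task_data.values())
    per_task_x, per_task_y = [], []
    for pairs in task_data.values():
        repeated = list(islice(cycle(pairs), max_len))
        per_task_x.append([x for x, _ in repeated])
        per_task_y.append([y for _, y in repeated])
    # Transpose so that inputs and outputs have one element per sample.
    return tuple(zip(*per_task_x)), tuple(zip(*per_task_y))


inputs, outputs = merged_instances({"a": [(1, "p"), (2, "q"), (3, "r")], "b": [(7, "s")]})
# inputs == ((1, 7), (2, 7), (3, 7)); outputs == (("p", "s"), ("q", "s"), ("r", "s"))
```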

class deeppavlov.models.multitask_bert.multitask_bert.MultiTaskBert(*args, **kwargs)

The component for multi-task BERT. It builds the BERT body and launches building of the BERT heads.

The component aggregates components implementing BERT heads. The head components are called tasks. The __call__ and train_on_batch methods of MultiTaskBert are used for inference and training of the BERT heads. BERT head components, which are derived from MTBertTask, can be used only inside this class.

One training iteration consists of one train_on_batch call for every task.

If inference_task_names is None, then the component is created for training. Otherwise, the component is created for inference. If the component is created for inference, several tasks can be run simultaneously. For an explanation, see the description of the inference_task_names parameter.

Parameters

tasks – a dictionary. Task names are dictionary keys and objects of MTBertTask subclasses are dictionary values. Task names are used as variable scopes in the computational graph, so it is important to use the same names in the multi-task BERT training and inference configuration files.
bert_config_file – path to BERT configuration file
pretrained_bert – pre-trained BERT checkpoint
attention_probs_keep_prob – keep_prob for BERT self-attention layers
hidden_keep_prob – keep_prob for BERT hidden layers
body_learning_rate – learning rate of BERT body
min_body_learning_rate – min value of body learning rate if learning rate decay is used
learning_rate_drop_patience – how many validations with no improvements to wait
learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations
load_before_drop – whether to load best model before dropping learning rate or not
clip_norm – clip gradients by norm
freeze_embeddings – set to False to train input embeddings
inference_task_names –
names of tasks on which inference is done. If this parameter is provided, the component is created for inference, else the component is created for training.

If inference_task_names is a string, then it is the name of a task that is called separately from other tasks (in an individual tf.Session.run call).

If inference_task_names is a list, then elements of this list are either strings or lists of strings. You can combine these options. For example, ["task_name1", ["task_name2", "task_name3"], ["task_name4", "task_name5"]].

If an element of the inference_task_names list is a string, the element is the name of a task that is computed when the __call__ method is called.

If an element of the inference_task_names parameter is a list of strings ["task_name1", "task_name2", ...], then tasks "task_name1", "task_name2" and so on are run simultaneously in one tf.Session.run call. This option is available if tasks "task_name1", "task_name2" and so on have common inputs. Although the tasks share inputs, if positional arguments are used in the __call__ and train_on_batch methods, all arguments are passed individually. For instance, if "task_name1", "task_name2", and "task_name3" all take an argument named x in the model pipe, then the __call__ method takes arguments (x, x, x).
in_distribution –
The distribution of variables listed in the "in" config parameter between tasks. in_distribution can be None if only one task is called. In that case, all variables listed in "in" are arguments of that task.

in_distribution can be a dictionary of int. In that case, the keys of in_distribution are task names and the values are the numbers of variables from the "in" config parameter that are inputs of the corresponding task. The variables in the "in" parameter have to be in the same order as the tasks are listed in in_distribution.

in_distribution can be a dictionary of lists of str. The strings are names of variables from the "in" configuration parameter. If the "in" parameter is a list, then in_distribution works the same way as when in_distribution is a dictionary of int. Values of in_distribution, which are lists, are replaced by their lengths. If the "in" parameter in the component config is a dictionary, then the order of strings in in_distribution values has to match the order of arguments of the train_on_batch and get_sess_run_infer_args methods of the task components.
in_y_distribution – The same as in_distribution for "in_y" config parameter.
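The positional case of in_distribution can be illustrated with a small hypothetical helper. It follows the rules above: lists of variable names are replaced by their lengths, and the "in" variables are consumed in the order the tasks are listed. The task and variable names are made up for the example.

```python
from typing import Any, Dict, List, Sequence, Union

Distribution = Union[Dict[str, int], Dict[str, List[str]]]


def split_args(args: Sequence[Any], in_distribution: Distribution) -> Dict[str, List[Any]]:
    """Hypothetical sketch: assign positional "in" variables to tasks."""
    # Lists of variable names are replaced by their lengths.
    counts = {task: len(v) if isinstance(v, list) else v for task, v in in_distribution.items()}
    if sum(counts.values()) != len(args):
        raise ValueError("in_distribution does not match the number of arguments")
    per_task, start = {}, 0
    # Variables are consumed in the order the tasks are listed.
    for task, n in counts.items():
        per_task[task] = list(args[start:start + n])
        start += n
    return per_task


by_names = split_args(
    ["ids_batch", "masks_batch", "ymasks_batch", "features_batch"],
    {"ner": ["input_ids", "input_masks", "y_masks"], "classification": ["features"]},
)
# {"ner": ["ids_batch", "masks_batch", "ymasks_batch"], "classification": ["features_batch"]}
by_counts = split_args([1, 2, 3], {"a": 2, "b": 1})  # {"a": [1, 2], "b": [3]}
```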

train_on_batch(*args, **kwargs) → Dict[str, Dict[str, float]]

Calls the train_on_batch method of every task. This method takes args or kwargs but not both. The order of args is the same as the order of tasks in the component parameters:

args = [
    task1_in_x[0],
    task1_in_x[1],
    task1_in_x[2],
    ...
    task1_in_y[0],
    task1_in_y[1],
    ...
    task2_in_x[0],
    ...
]

If kwargs are used and the in_distribution and in_y_distribution attributes are dictionaries of lists of strings, then the keys of kwargs have to be the same as the strings in in_distribution and in_y_distribution. If in_distribution and in_y_distribution are dictionaries of int, then kwargs values are treated the same way as args.
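In the positional layout of train_on_batch, each task's inputs are immediately followed by its expected outputs. The hypothetical helper below splits such a flat argument list, assuming in_distribution and in_y_distribution are dictionaries of int that list tasks in the same order.

```python
from typing import Any, Dict, List, Sequence, Tuple


def split_train_args(
    args: Sequence[Any],
    in_distribution: Dict[str, int],
    in_y_distribution: Dict[str, int],
) -> Dict[str, Tuple[List[Any], List[Any]]]:
    """Hypothetical sketch of the flat train_on_batch argument layout."""
    per_task, start = {}, 0
    for task, n_x in in_distribution.items():
        n_y = in_y_distribution[task]
        xs = list(args[start:start + n_x])  # the task's inputs come first
        ys = list(args[start + n_x:start + n_x + n_y])  # then its expected outputs
        per_task[task] = (xs, ys)
        start += n_x + n_y
    if start != len(args):
        raise ValueError("distributions do not cover all arguments")
    return per_task


layout = split_train_args(
    ["ner_x0", "ner_x1", "ner_y0", "cls_x0", "cls_y0"],
    {"ner": 2, "cls": 1},
    {"ner": 1, "cls": 1},
)
# {"ner": (["ner_x0", "ner_x1"], ["ner_y0"]), "cls": (["cls_x0"], ["cls_y0"])}
```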

Parameters

args – task inputs and expected outputs
kwargs – task inputs and expected outputs

Returns

dictionary of dictionaries with task losses and learning rates.

__call__(*args, **kwargs)

Calls one or several BERT heads depending on the provided task names. args and kwargs contain inputs of BERT tasks. args and kwargs cannot be used together. If args are used, their content has to be:

args = [
    task1_in_x[0],
    task1_in_x[1],
    ...
    task2_in_x[0],
    task2_in_x[1],
    ...
]

If kwargs are used and in_distribution is a dictionary of int, then kwargs’ order has to be the same as args order described in the previous paragraph. If in_distribution is a dictionary of lists of str, then all task names from in_distribution have to be present in kwargs keys.

Returns: list of results of called tasks.

call(args: Tuple[Any], kwargs: Dict[str, Any], task_names: Optional[Union[str, List[str]]], in_distribution: Optional[Union[Dict[str, int], Dict[str, List[str]]]] = None)

Calls one or several BERT heads depending on the task names provided in the task_names parameter. args and kwargs contain inputs of BERT tasks. args and kwargs cannot be used simultaneously. If args are used, their content has to be:

args = [
    task1_in_x[0],
    task1_in_x[1],
    ...
    task2_in_x[0],
    task2_in_x[1],
    ...
]

If kwargs are used, the kwargs keys have to match the content of the in_names params of the called tasks.

Parameters

args – generally, args parameter of __call__ method of this component or MTBertReUser. Inputs of one or several tasks. Has to be empty if kwargs argument is used.
kwargs – generally, kwargs parameter of __call__ method of this component or MTBertReUser. Inputs of one or several tasks. Has to be empty if args argument is used.
task_names – names of tasks that are called. If str, then one task is called. If a task name is an element of the task_names list, then this task is run independently. If an element of task_names is a list of strings, then the tasks in the inner list are run simultaneously.
in_distribution – a distribution of variables from "in" config parameters between tasks. For details see method __init__ docstring.

Returns

list of results of called tasks.
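How task_names (and, analogously, inference_task_names) groups tasks into tf.Session.run calls can be sketched as a normalization step. session_run_groups is a hypothetical helper; each inner list of its result stands for one group of tasks run in a single call.

```python
from typing import List, Union

TaskNames = Union[str, List[Union[str, List[str]]]]


def session_run_groups(task_names: TaskNames) -> List[List[str]]:
    """Hypothetical sketch: one inner list per tf.Session.run call."""
    if isinstance(task_names, str):
        return [[task_names]]  # a single task, run on its own
    groups = []
    for element in task_names:
        if isinstance(element, str):
            groups.append([element])  # a task run independently
        else:
            groups.append(list(element))  # tasks with common inputs, run together
    return groups


groups = session_run_groups(["task_name1", ["task_name2", "task_name3"], ["task_name4", "task_name5"]])
# [["task_name1"], ["task_name2", "task_name3"], ["task_name4", "task_name5"]]
```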

class deeppavlov.models.multitask_bert.multitask_bert.MTBertTask(keep_prob: float = 1.0, return_probas: Optional[bool] = None, learning_rate: float = 0.001)

Abstract class for multi-task BERT tasks. Objects of its subclasses are linked with the BERT body when the MultiTaskBert.build method is called. Training is performed when the MultiTaskBert.train_on_batch method is called. Objects of classes derived from MTBertTask don’t have a __call__ method. Instead, they have get_sess_run_infer_args and post_process_preds methods, which are called from the call method of the MultiTaskBert class. The get_sess_run_infer_args method returns fetches and feed_dict for inference, and the post_process_preds method retrieves predictions from the computed fetches. Classes derived from MTBertTask must implement the get_sess_run_train_args method, which returns fetches and feed_dict for training.

Parameters

keep_prob – dropout keep_prob for non-BERT layers
return_probas – set this to True if you need the probabilities instead of raw answers
learning_rate – learning rate of BERT head
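To make the inference contract concrete without TensorFlow, here is a toy sketch. ToyTask, ToyUpperTask, fake_run and infer are hypothetical names: string fetch names stand in for tensors, dictionary keys stand in for placeholders, and fake_run stands in for tf.Session.run. Only the call order (get_sess_run_infer_args, then the session run, then post_process_preds) mirrors the description above.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Tuple

Fetches = List[str]  # stand-in for a list of tensors
FeedDict = Dict[str, Any]  # stand-in for {placeholder: value}


class ToyTask(ABC):
    """Hypothetical stand-in for the MTBertTask inference contract."""

    @abstractmethod
    def get_sess_run_infer_args(self, *args) -> Tuple[Fetches, FeedDict]: ...

    @abstractmethod
    def post_process_preds(self, sess_run_res: list) -> list: ...


class ToyUpperTask(ToyTask):
    def get_sess_run_infer_args(self, texts):
        # Ask for the "upper" fetch and feed the batch of texts.
        return ["upper"], {"texts": texts}

    def post_process_preds(self, sess_run_res):
        # Only one fetch was requested, so return its value.
        return sess_run_res[0]


def fake_run(fetches: Fetches, feed_dict: FeedDict) -> list:
    """Stand-in for tf.Session.run: computes each named fetch from the feed."""
    ops: Dict[str, Callable[[FeedDict], Any]] = {
        "upper": lambda fd: [t.upper() for t in fd["texts"]],
    }
    return [ops[name](feed_dict) for name in fetches]


def infer(task: ToyTask, *inputs) -> list:
    fetches, feed_dict = task.get_sess_run_infer_args(*inputs)
    return task.post_process_preds(fake_run(fetches, feed_dict))


preds = infer(ToyUpperTask(), ["a", "b"])  # ["A", "B"]
```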

build(bert_body: bert_dp.modeling.BertModel, optimizer_params: Dict[str, Union[str, float]], shared_placeholders: Dict[str, tensorflow.placeholder], sess: tensorflow.Session, mode: str, get_train_op_func: Callable, freeze_embeddings: bool, bert_head_variable_scope: str) → None

Initiates building of the BERT head and initializes optimizer parameters and placeholders that are common to all tasks.

Parameters

bert_body – instance of BertModel.
optimizer_params – a dictionary with four fields: 'optimizer' (str) – a name of optimizer class, 'body_learning_rate' (float) – initial value of BERT body learning rate, 'min_body_learning_rate' (float) – min BERT body learning rate for learning rate decay, 'weight_decay_rate' (float) – L2 weight decay for AdamWeightDecayOptimizer
shared_placeholders – a dictionary with placeholders used in all tasks. The dictionary contains fields 'input_ids', 'input_masks', 'learning_rate', 'keep_prob', 'is_train', 'token_types'.
sess – current tf.Session instance
mode – 'train' or 'inference'
get_train_op_func – a function that returns a tf.Operation and has a signature similar to LRScheduledTFModel.get_train_op without the self argument. It returns the train operation for the specified loss and variable scopes.
freeze_embeddings – set False to train input embeddings.
bert_head_variable_scope – variable scope for BERT head.

abstract _init_graph() → None

Builds the BERT head, initializes task-specific placeholders, and creates attributes containing output probabilities and model loss. The optimizer is initialized not in this method but in _init_optimizer.

get_train_op(loss: tensorflow.Tensor, body_learning_rate: Union[tensorflow.Tensor, float], **kwargs) → tensorflow.Operation

Returns the operation for task training. The head learning rate is calculated as the product of body_learning_rate and the quotient of the initial head learning rate and the initial body learning rate.
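The head learning rate rule is a single multiplication. In the sketch below, the function name and the numeric values are illustrative only.

```python
def head_learning_rate(body_lr: float, init_head_lr: float, init_body_lr: float) -> float:
    """Current body LR scaled by the ratio of the initial head and body LRs."""
    return body_lr * (init_head_lr / init_body_lr)


# Initial rates: head 1e-3, body 2e-5 (ratio 50). After decay the body LR is 1e-5.
lr = head_learning_rate(1e-5, 1e-3, 2e-5)  # approximately 5e-4
```

Because the ratio of initial rates is fixed, the head learning rate follows the body learning rate through any decay while keeping the same proportion.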

Parameters

loss – the task loss
body_learning_rate – the learning rate for the BERT body

Returns

train operation for the task

train_on_batch(*args, **kwargs) → Dict[str, float]

Trains the task on one batch. This method will work correctly if you override get_sess_run_train_args for your task.

Parameters: kwargs – the keys are body_learning_rate and "in" and "in_y" params for the task.
Returns: dictionary with the calculated task loss and the body and head learning rates.

abstract get_sess_run_infer_args(*args) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]

Returns fetches and feed_dict for inference. Fetches is a list of tensors and feed_dict is a dictionary with the placeholder values required to compute the fetches. The method is used inside the MultiTaskBert __call__ method.

If self.return_probas is True, fetches contain the probabilities tensor; otherwise, they contain the predictions tensor.

Overriding methods take task inputs as positional arguments.

ATTENTION! Let the get_sess_run_infer_args method have n_x_args arguments. Then the first n_x_args arguments of the get_sess_run_train_args method have to be in the same order as the arguments of get_sess_run_infer_args.

Parameters: args – task inputs.
Returns: fetches and feed_dict
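The argument-order requirement can be checked mechanically. The hypothetical helper below compares parameter names with inspect.signature, using names as a proxy for positional order; the stub class only copies the parameter lists documented for MTBertSequenceTaggingTask.

```python
import inspect
from typing import Callable


def infer_args_prefix_train_args(infer_fn: Callable, train_fn: Callable) -> bool:
    """Hypothetical check: do the inference parameters open the training signature?"""
    infer_params = [p for p in inspect.signature(infer_fn).parameters if p != "self"]
    train_params = [p for p in inspect.signature(train_fn).parameters if p != "self"]
    return train_params[:len(infer_params)] == infer_params


class TaggingSignatures:
    # Stubs with the parameter lists documented for MTBertSequenceTaggingTask.
    def get_sess_run_infer_args(self, input_ids, input_masks, y_masks): ...

    def get_sess_run_train_args(self, input_ids, input_masks, y_masks, y, body_learning_rate): ...


ok = infer_args_prefix_train_args(
    TaggingSignatures.get_sess_run_infer_args, TaggingSignatures.get_sess_run_train_args
)  # True
```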

abstract get_sess_run_train_args(*args) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]

Returns fetches and feed_dict for task train_on_batch method.

Overriding methods take task inputs as positional arguments.

ATTENTION! Let the get_sess_run_infer_args method have n_x_args arguments. Then the first n_x_args arguments of the get_sess_run_train_args method have to be in the same order as the arguments of get_sess_run_infer_args.

Parameters: args – task inputs followed by expected outputs.
Returns: fetches and feed_dict

abstract post_process_preds(sess_run_res: list) → list

Post-processes the results of tf.Session.run called for task inference. Called from the MultiTaskBert.__call__ method.

Parameters: sess_run_res – computed fetches from get_sess_run_infer_args method
Returns: post-processed results

class deeppavlov.models.multitask_bert.multitask_bert.MTBertSequenceTaggingTask(n_tags: Optional[int] = None, use_crf: Optional[bool] = None, use_birnn: bool = False, birnn_cell_type: str = 'lstm', birnn_hidden_size: int = 128, keep_prob: float = 1.0, encoder_dropout: float = 0.0, return_probas: Optional[bool] = None, encoder_layer_ids: Optional[List[int]] = None, learning_rate: float = 0.001)

BERT head for text tagging. It predicts a label for every token (not subtoken) in the text. You can use it for sequence labelling tasks, such as morphological tagging or named entity recognition. Objects of this class should be passed to the constructor of the MultiTaskBert class in the tasks parameter.

Parameters

n_tags – number of distinct tags
use_crf – whether to use CRF on top or not
use_birnn – whether to use a bidirectional RNN after the BERT layers. For NER and morphological tagging, we usually set it to False, as otherwise the model overfits
birnn_cell_type – the type of Bidirectional RNN. Either "lstm" or "gru"
birnn_hidden_size – number of hidden units in the BiRNN layer in each direction
keep_prob – dropout keep_prob for non-BERT layers
encoder_dropout – dropout probability of encoder output layer
return_probas – set this to True if you need the probabilities instead of raw answers
encoder_layer_ids – list of averaged layers from the BERT encoder (layer ids)
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer
learning_rate – learning rate of BERT head

get_sess_run_infer_args(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]

Returns fetches and feed_dict for model inference. The method is called from MultiTaskBert.__call__.

Parameters

input_ids – indices of the subwords in vocabulary
input_masks – mask that determines where to attend and where not to
y_masks – mask which determines the first subword units in the word

Returns

list of fetches and feed_dict
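To illustrate what y_masks encode, here is a sketch that marks first subwords and then keeps one value per word. It assumes WordPiece-style "##" continuation prefixes and ignores special tokens such as [CLS] and [SEP]; how DeepPavlov's own preprocessor builds y_masks is not shown here, and both helpers are hypothetical.

```python
from typing import List, Sequence


def first_subword_mask(subtokens: Sequence[str]) -> List[int]:
    """1 marks the first subword of a word, 0 marks a continuation subword."""
    return [0 if tok.startswith("##") else 1 for tok in subtokens]


def to_word_level(subword_values: Sequence[str], y_mask: Sequence[int]) -> List[str]:
    """Keep only values at first-subword positions, giving one value per word."""
    return [value for value, keep in zip(subword_values, y_mask) if keep == 1]


subtokens = ["the", "un", "##believ", "##able", "story"]
mask = first_subword_mask(subtokens)  # [1, 1, 0, 0, 1]
subword_tags = ["DET", "ADJ", "ADJ", "ADJ", "NOUN"]
word_tags = to_word_level(subword_tags, mask)  # ["DET", "ADJ", "NOUN"]
```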

get_sess_run_train_args(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray], y: Union[List[List[int]], numpy.ndarray], body_learning_rate: float) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]

Returns fetches and feed_dict for model train_on_batch method.

Parameters

input_ids – indices of the subwords in vocabulary
input_masks – mask that determines where to attend and where not to
y_masks – mask which determines the first subword units in the word
y – indices of ground truth tags
body_learning_rate – learning rate for BERT body

Returns

list of fetches and feed_dict

post_process_preds(sess_run_res: List[numpy.ndarray]) → Union[List[List[int]], List[numpy.ndarray]]

Decodes CRF if needed and returns predictions or probabilities.

Parameters: sess_run_res – list of computed fetches gathered by get_sess_run_infer_args
Returns: predictions or probabilities depending on return_probas attribute
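Decoding a linear-chain CRF usually means finding the highest-scoring tag sequence with the Viterbi algorithm, whereas without a CRF each token's tag is simply the argmax. The plain-Python sketch below illustrates both with made-up scores; reading transitions[i][j] as the score of moving from tag i to tag j is an assumption, and this is not the decoding routine the library calls.

```python
from typing import List, Sequence


def argmax_tags(scores: Sequence[Sequence[float]]) -> List[int]:
    """Without a CRF: pick the highest-scoring tag for each token independently."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in scores]


def viterbi_decode(
    scores: Sequence[Sequence[float]], transitions: Sequence[Sequence[float]]
) -> List[int]:
    """With a CRF: best tag sequence under unary scores plus transition scores."""
    n_tags = len(scores[0])
    best = list(scores[0])  # best path score ending in each tag at the first token
    backpointers: List[List[int]] = []
    for row in scores[1:]:
        prev_best = best
        step_ptrs: List[int] = []
        best = []
        for j in range(n_tags):
            # Best previous tag to come from when the current tag is j.
            i = max(range(n_tags), key=lambda k: prev_best[k] + transitions[k][j])
            step_ptrs.append(i)
            best.append(prev_best[i] + transitions[i][j] + row[j])
        backpointers.append(step_ptrs)
    # Trace back from the best final tag.
    tag = max(range(n_tags), key=lambda k: best[k])
    path = [tag]
    for step_ptrs in reversed(backpointers):
        tag = step_ptrs[tag]
        path.append(tag)
    return path[::-1]


scores = [[1.0, 0.5, 0.0], [0.2, 0.1, 0.9], [0.3, 1.2, 0.4], [0.0, 0.0, 0.1]]
transitions = [[0.5, -1.0, 0.2], [0.0, 0.3, -0.5], [-0.7, 0.4, 0.1]]
greedy = argmax_tags(scores)  # [0, 2, 1, 2]
decoded = viterbi_decode(scores, transitions)
```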

class deeppavlov.models.multitask_bert.multitask_bert.MTBertClassificationTask(n_classes: Optional[int] = None, return_probas: Optional[bool] = None, one_hot_labels: Optional[bool] = None, keep_prob: float = 1.0, multilabel: bool = False, learning_rate: float = 2e-05, optimizer: str = 'Adam')

Task for text classification.

It uses the output of the [CLS] token and predicts labels using a linear transformation.

Parameters

n_classes – number of classes
return_probas – set True if class probabilities are needed instead of the most probable label
one_hot_labels – set True if one-hot encoding for labels is used
keep_prob – dropout keep_prob for non-BERT layers
multilabel – set True if it is multi-label classification
learning_rate – learning rate of BERT head
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
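A plain-Python sketch of a linear head over the [CLS] vector. Applying softmax for single-label classification and an element-wise sigmoid for the multilabel case is the conventional choice and is assumed here, not taken from the text; the hidden size, weights and bias are made up.

```python
import math
from typing import List, Sequence


def linear(h: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    """logits[c] = sum_i h[i] * weights[c][i] + bias[c]"""
    return [sum(hi * wi for hi, wi in zip(h, row)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: Sequence[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


cls_vector = [0.5, -1.0, 2.0]  # hypothetical [CLS] output, hidden size 3
weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # hypothetical 2-class head
bias = [0.0, 0.0]
logits = linear(cls_vector, weights, bias)  # [0.5, 2.0]
probas = softmax(logits)  # single-label: probabilities sum to 1
label = max(range(len(probas)), key=lambda c: probas[c])  # most probable class: 1
multilabel_probas = sigmoid(logits)  # multilabel: independent per-class probabilities
```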

get_sess_run_infer_args(features: List[bert_dp.preprocessing.InputFeatures]) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]

Returns fetches and feed_dict for model inference. The method is called from MultiTaskBert.__call__.

Parameters: features – text features created by BERT preprocessor.
Returns: list of fetches and feed_dict

get_sess_run_train_args(features: List[bert_dp.preprocessing.InputFeatures], y: Union[List[int], List[List[int]]], body_learning_rate: float) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]]

Returns fetches and feed_dict for model train_on_batch method.

Parameters

features – text features created by BERT preprocessor.
y – batch of labels (class id or one-hot encoding)
body_learning_rate – learning rate for BERT body

Returns

list of fetches and feed_dict

post_process_preds(sess_run_res)

Returns tf.Session.run results for inference without changes.

class deeppavlov.models.multitask_bert.multitask_bert.MTBertReUser(mt_bert: deeppavlov.models.multitask_bert.multitask_bert.MultiTaskBert, task_names: Union[str, List[Union[List[str], str]]], in_distribution: Optional[Union[Dict[str, int], Dict[str, List[str]]]] = None, *args, **kwargs)

Instances of this class are for multi-task BERT inference. In an inference config, the MultiTaskBert component may not perform inference for some tasks. For example, you may need to sequentially apply two models with BERT. In that case, an mt_bert_reuser is created to call the remaining tasks.

Parameters

mt_bert – An instance of MultiTaskBert
task_names – Names of inferred tasks. If task_names is str, then task_names is the name of the only inferred task. If task_names is a list, then its elements can be either strings or lists of strings. If an element of task_names is a string, then this element is the name of a task that is run independently. If an element of task_names is a list of strings, then the element is a list of names of tasks that have common inputs and are run simultaneously. For detailed information, see the MultiTaskBert inference_task_names parameter.

__call__(*args, **kwargs) → List[Any]

Infers the tasks listed in the task_names parameter. One of the parameters args and kwargs has to be empty.

Parameters

args – inputs and labels of inferred tasks.
kwargs – inputs and labels of inferred tasks.

Returns

list of results of inference of tasks listed in task_names

class deeppavlov.models.multitask_bert.multitask_bert.InputSplitter(keys_to_extract: Union[List[str], Tuple[str, …]], **kwargs)

An instance of this class in a pipe splits a batch of sequences of identical length, or of dictionaries with identical keys, into a tuple of batches.

Parameters: keys_to_extract – a sequence of ints or strings that have to match keys of split dictionaries.

__call__(inp: Union[List[dict], List[List[int]], List[Tuple[int]]]) → List[list]

Returns batches of values from inp. Every batch contains values that have the same key from the keys_to_extract attribute. The order of elements of keys_to_extract is preserved.

Parameters: inp – A sequence of dictionaries with identical keys
Returns: A list of lists of values of dictionaries from inp
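The splitting behavior described above fits in one list comprehension. split_inputs is a hypothetical re-implementation for illustration, not the library's source; it works for dictionaries with string keys and for sequences with integer indices.

```python
from typing import Any, List, Sequence, Union


def split_inputs(
    inp: Sequence[Union[dict, Sequence[Any]]], keys_to_extract: Sequence[Union[int, str]]
) -> List[list]:
    """Hypothetical sketch of the behavior described for InputSplitter.__call__."""
    # One output batch per key, in the order of keys_to_extract.
    return [[item[key] for item in inp] for key in keys_to_extract]


batch = [{"text": "hi", "lang": "en"}, {"text": "hola", "lang": "es"}]
by_key = split_inputs(batch, ["lang", "text"])  # [["en", "es"], ["hi", "hola"]]
by_index = split_inputs([[1, 2], [3, 4]], [1, 0])  # [[2, 4], [1, 3]]
```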