deeppavlov.models.syntax_parser

class deeppavlov.models.syntax_parser.network.BertSyntaxParser(*args, **kwargs)

BERT-based model for syntax parsing. For each word the model predicts the index of its syntactic head and the label of the dependency between this head and the current word. See deeppavlov.models.bert.bert_sequence_tagger.BertSequenceNetwork for the description of inherited parameters.

Parameters
  • n_deps – number of distinct syntactic dependencies

  • embeddings_dropout – dropout for embeddings in biaffine layer

  • state_size – the size of hidden state in biaffine layer

  • dep_state_size – the size of hidden state in biaffine layer

  • use_birnn – whether to use a bidirectional RNN after the BERT layers. Setting it to True is recommended, as it leads to much higher performance, at least on large datasets

  • birnn_cell_type – the type of Bidirectional RNN. Either lstm or gru

  • birnn_hidden_size – number of hidden units in the BiRNN layer in each direction

  • return_probas – set this to True if you need the probabilities instead of raw answers

  • predict_tags – whether to predict morphological tags together with syntactic information

  • n_tags – the number of morphological tags

  • tag_weight – the weight of tag model loss in multitask training

__call__(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Union[Tuple[List[Union[List[int], numpy.ndarray]], List[List[int]]], Tuple[List[Union[List[int], numpy.ndarray]], List[List[int]], List[List[int]]]]

Predicts the outputs for a batch of inputs. By default (return_probas = False and predict_tags = False) it returns two output batches. The first is the batch of head indexes: i stands for the i-th word in the sequence, with numbering starting at 1, and 0 is predicted for the syntactic root of the sentence. The second is the batch of indexes of syntactic dependencies. If return_probas = True, the first output holds the probability distribution over possible heads instead of the position of the most probable head; for a sentence of length k it is an array of shape k × (k+1). If predict_tags = True, the model additionally returns the index of the most probable morphological tag for each word, and the batch of these indexes becomes the third output.

Returns

  • pred_heads_to_return – either a batch of the most probable head positions for each token (if return_probas = False) or a batch of probability distributions over token head positions

  • pred_deps – the indexes of token dependency relations

  • pred_tags – the indexes of token morphological tags (only if predict_tags = True)
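
The relation between the two forms of the first output can be illustrated with a small sketch. It is not part of the library: decode_heads and the sample probabilities are hypothetical, and it assumes that column 0 of the k × (k+1) array stands for the syntactic root and column j for the j-th word.

```python
def decode_heads(head_probs):
    """Pick the most probable head position for every word of one sentence.

    head_probs holds k rows (one per word), each with k + 1 probabilities.
    Assumption: column 0 stands for the root, column j for the j-th word.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in head_probs]


# Hypothetical distribution for a 3-word sentence (each row sums to 1).
probs = [
    [0.1, 0.1, 0.7, 0.1],  # word 1: most probable head is word 2
    [0.8, 0.1, 0.0, 0.1],  # word 2: most probable head is the root (0)
    [0.1, 0.1, 0.6, 0.2],  # word 3: most probable head is word 2
]
heads = decode_heads(probs)
print(heads)  # [2, 0, 2]
```

Taking the row-wise argmax this way ignores tree constraints, so it only shows how the indexes are read, not how a well-formed tree would be decoded.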

deeppavlov.models.syntax_parser.network.gather_indexes(A: tensorflow.Tensor, B: tensorflow.Tensor) → tensorflow.Tensor

Parameters
  • A – a tensor with data

  • B – an integer tensor with indexes

Returns

answer – a tensor such that answer[i, j] = A[i, B[i, j]]. If B is one-dimensional, the output is answer[i] = A[i, B[i]]
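
The indexing rule can be checked on nested lists. The sketch below is a pure-Python illustration of the documented semantics only; gather_indexes_py is a hypothetical helper, not the TensorFlow function itself.

```python
def gather_indexes_py(A, B):
    """Pure-Python illustration of answer[i, j] = A[i, B[i, j]]."""
    if B and not isinstance(B[0], (list, tuple)):
        # One-dimensional B: answer[i] = A[i][B[i]]
        return [row[j] for row, j in zip(A, B)]
    # Two-dimensional B: pick several entries from every row of A
    return [[row[j] for j in idx] for row, idx in zip(A, B)]


A = [[10, 11, 12], [20, 21, 22]]
print(gather_indexes_py(A, [[2, 0], [1, 1]]))  # [[12, 10], [21, 21]]
print(gather_indexes_py(A, [2, 0]))            # [12, 20]
```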

deeppavlov.models.syntax_parser.network.biaffine_layer(deps: tensorflow.Tensor, heads: tensorflow.Tensor, deps_dim: int, heads_dim: int, output_dim: int, name: str = 'biaffine_layer') → tensorflow.Tensor

Implements a biaffine layer from [Dozat, Manning, 2016].

Parameters
  • deps – the 3D-tensor of dependency states

  • heads – the 3D-tensor of head states

  • deps_dim – the dimension of dependency states

  • heads_dim – the dimension of head states

  • output_dim – the output dimension

  • name – the name of a layer

Returns

answer – the output 3D-tensor
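
As a rough illustration of what a biaffine transformation computes, the sketch below scores a single pair of dependent and head vectors with a bilinear term, two linear terms and a bias. It follows the generic biaffine form and is not the library's TensorFlow code; the function biaffine_score and the parameter names U, W_dep, W_head and b are hypothetical.

```python
def biaffine_score(dep, head, U, W_dep, W_head, b):
    """Biaffine scores for one (dependent, head) pair.

    dep: list of length D1, head: list of length D2,
    U: D1 x D2 x O nested list, W_dep: D1 x O, W_head: D2 x O, b: length O.
    Returns a list of O scores (O plays the role of output_dim).
    """
    out = list(b)
    for o in range(len(b)):
        # Bilinear term: dep^T U[:, :, o] head
        out[o] += sum(dep[d] * U[d][e][o] * head[e]
                      for d in range(len(dep)) for e in range(len(head)))
        # Linear terms for the dependent and for the head
        out[o] += sum(dep[d] * W_dep[d][o] for d in range(len(dep)))
        out[o] += sum(head[e] * W_head[e][o] for e in range(len(head)))
    return out


dep, head = [1.0, 2.0], [3.0, 4.0]
U = [[[1.0], [0.0]], [[0.0], [1.0]]]  # identity bilinear form, O = 1
W_dep, W_head, b = [[1.0], [1.0]], [[0.5], [0.5]], [0.25]
print(biaffine_score(dep, head, U, W_dep, W_head, b))  # [17.75]
```

Here the bilinear term contributes 11, the two linear terms 3 and 3.5, and the bias 0.25.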

deeppavlov.models.syntax_parser.network.biaffine_attention(deps: tensorflow.Tensor, heads: tensorflow.Tensor, name='biaffine_attention') → tensorflow.Tensor

Implements a trainable matching layer between two families of embeddings.

Parameters
  • deps – the 3D-tensor of dependency states

  • heads – the 3D-tensor of head states

  • name – the name of a layer

Returns

answer – a 3D-tensor of pairwise scores between deps and heads
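
The notion of pairwise scores can be sketched for one sentence, with the batch axis dropped. The example keeps only a bilinear term and leaves out any bias terms; pairwise_scores and the matrix U are hypothetical names, and the library's trainable layer may differ in detail.

```python
def pairwise_scores(deps, heads, U):
    """scores[i][j] = deps[i]^T U heads[j] for one sentence (no batch axis)."""
    return [[sum(d[a] * U[a][c] * h[c]
                 for a in range(len(d)) for c in range(len(h)))
             for h in heads]
            for d in deps]


deps = [[1.0, 0.0], [0.0, 1.0]]   # one state per dependent word
heads = [[2.0, 3.0], [4.0, 5.0]]  # one state per candidate head
U = [[1.0, 0.0], [0.0, 1.0]]      # identity matrix: plain dot products
scores = pairwise_scores(deps, heads, U)
print(scores)  # [[2.0, 4.0], [3.0, 5.0]]
```

Row i of the result holds the scores of every candidate head for the i-th dependent, which is the shape a head classifier needs before a softmax over heads.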

class deeppavlov.models.syntax_parser.joint.JointTaggerParser(tagger: deeppavlov.core.common.chainer.Chainer, parser: deeppavlov.core.common.chainer.Chainer, output_format: str = 'ud', to_output_string: bool = False, *args, **kwargs)

A class to perform joint morphological and syntactic parsing. It is a thin wrapper that calls the tagging and parsing models and combines their results into a single output.

Parameters
  • tagger – the morphological tagger model (a Chainer instance)

  • parser – the syntactic parser model (a Chainer instance)

  • output_format – the output format, either ud (alias: conllu) or json

  • to_output_string – whether to convert the output to a list of strings

tagger

a morphological tagger model (a Chainer instance)

parser

a syntactic parser model (a Chainer instance)

__call__(data: Union[List[str], List[List[str]]]) → Union[List[List[dict]], List[str], List[List[str]]]

Parses a batch of sentences.

Parameters

data – either a batch of tokenized sentences, or a batch of raw sentences

Returns

answer – a batch of parsed sentences. A sentence parse is a list of single word parses. Each word parse is either a CoNLL-U-formatted string or a dictionary. A sentence parse is returned as is if self.to_output_string is False; otherwise it is returned as a single string in which each word parse begins on a new line.

>>> from deeppavlov.core.commands.infer import build_model
>>> model = build_model("ru_syntagrus_joint_parsing")
>>> batch = ["Девушка пела в церковном хоре.", "У этой задачи есть сложное решение."]
>>> print(*model(batch), sep="\n\n")
    1       Девушка девушка NOUN    _       Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing    2       nsubj   _       _
    2       пела    петь    VERB    _       Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act    0       root    _       _
    3       в       в       ADP     _       _       5       case    _       _
    4       церковном       церковный       ADJ     _       Case=Loc|Degree=Pos|Gender=Masc|Number=Sing     5       amod    _       _
    5       хоре    хор     NOUN    _       Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing   2       obl     _       _
    6       .       .       PUNCT   _       _       2       punct   _       _

    1       У       у       ADP     _       _       3       case    _       _
    2       этой    этот    DET     _       Case=Gen|Gender=Fem|Number=Sing 3       det     _       _
    3       задачи  задача  NOUN    _       Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing    4       obl     _       _
    4       есть    быть    VERB    _       Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act      0       root    _       _
    5       сложное сложный ADJ     _       Case=Nom|Degree=Pos|Gender=Neut|Number=Sing     6       amod    _       _
    6       решение решение NOUN    _       Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing   4       nsubj   _       _
    7       .       .       PUNCT   _       _       4       punct   _       _

>>> # Changing model parameters in code is a quick hack; prefer setting them in the configuration file.
>>> model["main"].to_output_string = False
>>> model["main"].output_format = "json"
>>> for sent_parse in model(batch):
...     for word_parse in sent_parse:
...         print(word_parse)
...     print("")
    {'id': '1', 'word': 'Девушка', 'lemma': 'девушка', 'upos': 'NOUN', 'feats': 'Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing', 'head': '2', 'deprel': 'nsubj'}
    {'id': '2', 'word': 'пела', 'lemma': 'петь', 'upos': 'VERB', 'feats': 'Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act', 'head': '0', 'deprel': 'root'}
    {'id': '3', 'word': 'в', 'lemma': 'в', 'upos': 'ADP', 'feats': '_', 'head': '5', 'deprel': 'case'}
    {'id': '4', 'word': 'церковном', 'lemma': 'церковный', 'upos': 'ADJ', 'feats': 'Case=Loc|Degree=Pos|Gender=Masc|Number=Sing', 'head': '5', 'deprel': 'amod'}
    {'id': '5', 'word': 'хоре', 'lemma': 'хор', 'upos': 'NOUN', 'feats': 'Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing', 'head': '2', 'deprel': 'obl'}
    {'id': '6', 'word': '.', 'lemma': '.', 'upos': 'PUNCT', 'feats': '_', 'head': '2', 'deprel': 'punct'}

    {'id': '1', 'word': 'У', 'lemma': 'у', 'upos': 'ADP', 'feats': '_', 'head': '3', 'deprel': 'case'}
    {'id': '2', 'word': 'этой', 'lemma': 'этот', 'upos': 'DET', 'feats': 'Case=Gen|Gender=Fem|Number=Sing', 'head': '3', 'deprel': 'det'}
    {'id': '3', 'word': 'задачи', 'lemma': 'задача', 'upos': 'NOUN', 'feats': 'Animacy=Inan|Case=Gen|Gender=Fem|Number=Sing', 'head': '4', 'deprel': 'obl'}
    {'id': '4', 'word': 'есть', 'lemma': 'быть', 'upos': 'VERB', 'feats': 'Aspect=Imp|Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act', 'head': '0', 'deprel': 'root'}
    {'id': '5', 'word': 'сложное', 'lemma': 'сложный', 'upos': 'ADJ', 'feats': 'Case=Nom|Degree=Pos|Gender=Neut|Number=Sing', 'head': '6', 'deprel': 'amod'}
    {'id': '6', 'word': 'решение', 'lemma': 'решение', 'upos': 'NOUN', 'feats': 'Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing', 'head': '4', 'deprel': 'nsubj'}
    {'id': '7', 'word': '.', 'lemma': '.', 'upos': 'PUNCT', 'feats': '_', 'head': '4', 'deprel': 'punct'}
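
When the output is a single string per sentence in the ud format, it can be split back into fields with a few lines of code. The sketch below assumes tab-separated columns in the standard CoNLL-U order; parse_conllu_sentence and CONLLU_FIELDS are hypothetical helpers, and the field names follow CoNLL-U rather than the keys of the json output above.

```python
# Standard CoNLL-U column names, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu_sentence(text):
    """Split one CoNLL-U sentence string into a list of per-word dicts."""
    rows = []
    for line in text.strip().split("\n"):
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rows.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    return rows


# The first two words of the first example sentence, joined with real tabs.
sample = (
    "1\tДевушка\tдевушка\tNOUN\t_\t"
    "Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tпела\tпеть\tVERB\t_\t"
    "Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act"
    "\t0\troot\t_\t_"
)
rows = parse_conllu_sentence(sample)
print([(r["form"], r["head"], r["deprel"]) for r in rows])
# [('Девушка', '2', 'nsubj'), ('пела', '0', 'root')]
```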