deeppavlov.models.morpho_tagger

class deeppavlov.models.morpho_tagger.tagger.MorphoTaggerWrapper(save_path: str = None, load_path: str = None, mode: str = None, **kwargs)[source]

A wrapper over the morphological tagger implemented in deeppavlov.models.morpho_tagger.network.CharacterTagger. A subclass of NNModel.

Parameters:
  • save_path – the path where the model is saved
  • load_path – the path from which the model is loaded
  • mode – usage mode
  • **kwargs – model parameters specified in the main part of the JSON config section that corresponds to the model
__call__(*x_batch, **kwargs)[source]

Predicts answers on batch elements.

Parameters: x_batch – a batch to predict answers on
load()[source]

Checks whether the model file exists and loads the model if it does.

save()[source]

Saves the model to the save_path provided in the config. The directory has already been created by super().__init__, which is called in the __init__ of this class.

train_on_batch(*args)[source]

Trains the model on a single batch.

Parameters:
  • *args – the list of network inputs. The last element of args is the batch of targets; all previous elements are training data batches.
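
The argument convention for train_on_batch (targets last, input batches before them) can be sketched as follows. This is only an illustration: split_batch_args is a hypothetical helper that is not part of DeepPavlov, and the toy batches are made up for the example.

```python
from typing import Any, List, Tuple


def split_batch_args(*args: Any) -> Tuple[List[Any], Any]:
    """Split positional arguments into (inputs, targets).

    Hypothetical helper: it mirrors the documented convention that the
    last element of ``args`` is the batch of targets and every earlier
    element is a batch of training data.
    """
    if len(args) < 2:
        raise ValueError("expected at least one input batch and one target batch")
    # Extended unpacking: everything but the last item goes to ``data``.
    *data, labels = args
    return data, labels


# Toy batches (assumed shapes, for illustration only).
words = [["John", "likes", "pizza"], ["Hello"]]
tags = [["NNP", "VBZ", "NN"], ["UH"]]

data, labels = split_batch_args(words, tags)
```

Here data is a list holding the single input batch and labels is the target batch; with additional word-level inputs, data would simply hold more elements.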
deeppavlov.models.morpho_tagger.common.predict_with_model(config_path: Union[pathlib.Path, str]) → List[List[str]][source]

Returns predictions of the morphological tagging model specified by the config at config_path.

Parameters: config_path – a path to the config
Returns: a list of morphological analyses for each sentence. Each analysis is either a list of tags or a list of full CoNLL-U descriptions.
class deeppavlov.models.morpho_tagger.network.CharacterTagger(symbols: deeppavlov.core.data.vocab.DefaultVocabulary, tags: deeppavlov.core.data.vocab.DefaultVocabulary, word_rnn: str = 'cnn', char_embeddings_size: int = 16, char_conv_layers: int = 1, char_window_size: Union[int, List[int]] = 5, char_filters: Union[int, List[int]] = None, char_filter_multiple: int = 25, char_highway_layers: int = 1, conv_dropout: float = 0.0, highway_dropout: float = 0.0, intermediate_dropout: float = 0.0, lstm_dropout: float = 0.0, word_vectorizers: List[Tuple[int, int]] = None, word_lstm_layers: int = 1, word_lstm_units: Union[int, List[int]] = 128, word_dropout: float = 0.0, regularizer: float = None, verbose: int = 1)[source]

A class for character-based neural morphological tagger

Parameters:
  • symbols – character vocabulary
  • tags – morphological tags vocabulary
  • word_rnn – the type of character-level network (only cnn implemented)
  • char_embeddings_size – the size of character embeddings
  • char_conv_layers – the number of convolutional layers on character level
  • char_window_size – the width(s) of the convolutional filter(s)
  • char_filters – the number of convolutional filters for each window width
  • char_filter_multiple – the ratio between filters number and window width
  • char_highway_layers – the number of highway layers on character level
  • conv_dropout – the ratio of dropout between convolutional layers
  • highway_dropout – the ratio of dropout between highway layers
  • intermediate_dropout – the ratio of dropout between convolutional and highway layers on character level
  • lstm_dropout – dropout ratio in word-level LSTM
  • word_vectorizers – a list of parameters for additional word-level vectorizers; for each vectorizer it stores a pair consisting of the vectorizer dimension and the dimension of the corresponding word embedding
  • word_lstm_layers – the number of word-level LSTM layers
  • word_lstm_units – hidden dimensions of word-level LSTMs
  • word_dropout – the ratio of dropout before word level (it is applied to word embeddings)
  • regularizer – l2 regularization parameter
  • verbose – the level of verbosity
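
The relationship between char_window_size, char_filters and char_filter_multiple can be made concrete with a small sketch. It assumes that, when char_filters is None, the number of filters for a window is char_filter_multiple times the window width, which is one reading of "the ratio between filters number and window width". The helper resolve_char_filters is hypothetical, and the library may apply further adjustments that are not documented here.

```python
from typing import List, Optional, Union


def resolve_char_filters(
    char_window_size: Union[int, List[int]] = 5,
    char_filters: Optional[Union[int, List[int]]] = None,
    char_filter_multiple: int = 25,
) -> List[int]:
    """Return one filter count per window width (illustrative only)."""
    # Normalise a single width to a one-element list.
    if isinstance(char_window_size, int):
        widths = [char_window_size]
    else:
        widths = list(char_window_size)

    if char_filters is None:
        # Assumed fallback: filters = multiple * window width.
        return [char_filter_multiple * width for width in widths]
    if isinstance(char_filters, int):
        # A single number is reused for every window width.
        return [char_filters] * len(widths)
    if len(char_filters) != len(widths):
        raise ValueError("char_filters must match char_window_size in length")
    return list(char_filters)


default_filters = resolve_char_filters()
multi_width_filters = resolve_char_filters(char_window_size=[1, 2, 3])
```

Under this assumption the default settings (window width 5, multiple 25) give 125 filters, and widths 1, 2, 3 give 25, 50 and 75 filters respectively.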
build()[source]

Builds the network using Keras.

load(infile)[source]

Loads model weights from a file

Parameters: infile – the file to load model weights from
predict_on_batch(data: Union[list, tuple], return_indexes: bool = False) → List[List[str]][source]

Makes predictions on a single batch

Parameters:
  • data – a batch of word sequences together with additional inputs
  • return_indexes – whether to return tag indexes in vocabulary or tags themselves
Returns:

a batch of label sequences
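
The effect of return_indexes can be illustrated with a minimal sketch. It assumes the tag vocabulary can be treated as an index-to-tag sequence; the plain list tag_vocab and the helper decode_predictions are stand-ins invented for this example, not DeepPavlov objects.

```python
from typing import List, Sequence, Union


def decode_predictions(
    index_batch: Sequence[Sequence[int]],
    tag_vocab: Sequence[str],
    return_indexes: bool = False,
) -> Union[List[List[int]], List[List[str]]]:
    """Return tag indexes as-is or map them to tag strings."""
    if return_indexes:
        # Keep the raw vocabulary indexes.
        return [list(sentence) for sentence in index_batch]
    # Look every index up in the (assumed) index-to-tag sequence.
    return [[tag_vocab[index] for index in sentence] for sentence in index_batch]


# Toy vocabulary and predictions, made up for illustration.
tag_vocab = ["PAD", "NN", "NNP", "VBZ"]
index_batch = [[2, 3, 1], [1]]

as_tags = decode_predictions(index_batch, tag_vocab)
as_indexes = decode_predictions(index_batch, tag_vocab, return_indexes=True)
```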

save(outfile)[source]

Saves model weights to a file

Parameters: outfile – the file to save model weights to (other model components should be given in the config)
symbols_number_

Character vocabulary size

tags_number_

Tag vocabulary size

train_on_batch(data: List[Iterable], labels: Iterable[list])[source]

Trains model on a single batch

Parameters:
  • data – a batch of word sequences
  • labels – a batch of correct tag sequences
Returns:

the trained model

deeppavlov.models.morpho_tagger.common.prettify(sent: Union[str, List[str]], tags: List[str], return_string: bool = True, begin: str = '', end: str = '', sep: str = '\n') → Union[List[str], str][source]

Prettifies output of morphological tagger.

Parameters:
  • sent – source sentence (either tokenized or not)
  • tags – list of tags, the output of a tagger
  • return_string – whether to return a list of strings or a single string
  • begin – a string to prepend at the beginning
  • end – a string to append at the end
  • sep – separator between word analyses
Returns:

the prettified output of the tagger.

Examples

>>> sent = ["John", "likes", ",", "really", "likes", "pizza", "."]
>>> tags = ["NNP", "VBZ", "PUNCT", "RB", "VBZ", "NN", "SENT"]
>>> prettify(sent, tags)
1  John    NNP
2  likes   VBZ
3  ,   PUNCT
4  really  RB
5  likes   VBZ
6  pizza   NN
7  .    SENT
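
The numbered layout above can be reproduced with a short stand-alone sketch. It is not the library's implementation: prettify_sketch is a hypothetical function, whitespace tokenization of untokenized input and the tab separator between columns are assumptions, and begin and end are applied only when a single string is returned.

```python
from typing import List, Union


def prettify_sketch(
    sent: Union[str, List[str]],
    tags: List[str],
    return_string: bool = True,
    begin: str = "",
    end: str = "",
    sep: str = "\n",
) -> Union[List[str], str]:
    """Format words and tags as numbered 'index, word, tag' rows."""
    # Assumption: an untokenized sentence is split on whitespace.
    words = sent.split() if isinstance(sent, str) else list(sent)
    if len(words) != len(tags):
        raise ValueError("the number of words and tags must match")
    rows = [
        f"{position}\t{word}\t{tag}"
        for position, (word, tag) in enumerate(zip(words, tags), start=1)
    ]
    if return_string:
        return begin + sep.join(rows) + end
    return rows


rows = prettify_sketch(["John", "likes", "pizza"], ["NNP", "VBZ", "NN"], return_string=False)
text = prettify_sketch("John likes pizza", ["NNP", "VBZ", "NN"], begin="<s>\n", end="\n</s>")
```

With return_string=False the caller gets one row per word and can join or post-process them freely; with the default, the rows are joined by sep and wrapped in begin and end.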
class deeppavlov.models.morpho_tagger.common.TagOutputPrettifier(return_string: bool = True, begin: str = '', end: str = '', sep: str = '\n', **kwargs)[source]

Wrapper for the prettify() function.

Parameters:
  • return_string – whether to return a list of strings or a single string
  • begin – a string to append in the beginning
  • end – a string to append in the end
  • sep – separator between word analyses
__call__(X: List[Union[List[str], str]], Y: List[Union[List[str], str]]) → List[Union[List[str], str]][source]

Calls the prettify function for each input sentence.

Parameters:
  • X – a list of input sentences
  • Y – a list of lists of tags for sentence words
Returns:

a list of prettified morphological analyses
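
The wrapper pattern described here (store formatting options once, then apply a formatter to every sentence and tag list pair) can be sketched as follows. BatchPrettifier and number_words are hypothetical names invented for this example; the real class calls the library's prettify function, which is replaced here by a simplified stand-in so that the snippet runs on its own.

```python
from typing import Callable, List, Union

Sentence = Union[List[str], str]


def number_words(sent: Sentence, tags: List[str], sep: str = "\n") -> str:
    """Simplified stand-in formatter: one 'index, word, tag' row per word."""
    words = sent.split() if isinstance(sent, str) else sent
    return sep.join(
        f"{position}\t{word}\t{tag}"
        for position, (word, tag) in enumerate(zip(words, tags), start=1)
    )


class BatchPrettifier:
    """Stores formatter options and applies the formatter per sentence."""

    def __init__(self, formatter: Callable[..., str], **formatter_kwargs):
        self.formatter = formatter
        self.formatter_kwargs = formatter_kwargs

    def __call__(self, X: List[Sentence], Y: List[List[str]]) -> List[str]:
        if len(X) != len(Y):
            raise ValueError("X and Y must contain the same number of sentences")
        # Pair each sentence with its tags and format them independently.
        return [
            self.formatter(sent, tags, **self.formatter_kwargs)
            for sent, tags in zip(X, Y)
        ]


prettifier = BatchPrettifier(number_words, sep=" | ")
result = prettifier([["Hello", "world"], "Hi"], [["UH", "NN"], ["UH"]])
```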