deeppavlov.models.morpho_tagger
-
class
deeppavlov.models.morpho_tagger.tagger.
MorphoTaggerWrapper
(save_path: str = None, load_path: str = None, mode: str = None, **kwargs)
A wrapper over the morphological tagger implemented in deeppavlov.models.morpho_tagger.network.CharacterTagger. A subclass of NNModel.
Parameters: - save_path – the path where the model is saved
- load_path – the path from which the model is loaded
- mode – usage mode
- **kwargs – a dictionary of model parameters specified in the main part of the JSON config that corresponds to the model
-
__call__
(*x_batch, **kwargs)
Predicts answers on batch elements.
Parameters: x_batch – a batch to predict answers on
-
deeppavlov.models.morpho_tagger.common.
predict_with_model
(config_path: Union[pathlib.Path, str]) → List[List[str]]
Returns predictions of the morphotagging model specified by the config at config_path.
Parameters: config_path – a path to the config
Returns: a list of morphological analyses for each sentence. Each analysis is either a list of tags or a list of full CoNLL-U descriptions.
-
class
deeppavlov.models.morpho_tagger.network.
CharacterTagger
(symbols: deeppavlov.core.data.vocab.DefaultVocabulary, tags: deeppavlov.core.data.vocab.DefaultVocabulary, word_rnn: str = 'cnn', char_embeddings_size: int = 16, char_conv_layers: int = 1, char_window_size: Union[int, List[int]] = 5, char_filters: Union[int, List[int]] = None, char_filter_multiple: int = 25, char_highway_layers: int = 1, conv_dropout: float = 0.0, highway_dropout: float = 0.0, intermediate_dropout: float = 0.0, lstm_dropout: float = 0.0, word_vectorizers: List[Tuple[int, int]] = None, word_lstm_layers: int = 1, word_lstm_units: Union[int, List[int]] = 128, word_dropout: float = 0.0, regularizer: float = None, verbose: int = 1)
A class for a character-based neural morphological tagger.
Parameters: - symbols – character vocabulary
- tags – morphological tags vocabulary
- word_rnn – the type of character-level network (only cnn implemented)
- char_embeddings_size – the size of character embeddings
- char_conv_layers – the number of convolutional layers on character level
- char_window_size – the width of convolutional filter (filters)
- char_filters – the number of convolutional filters for each window width
- char_filter_multiple – the ratio between filters number and window width
- char_highway_layers – the number of highway layers on character level
- conv_dropout – the ratio of dropout between convolutional layers
- highway_dropout – the ratio of dropout between highway layers
- intermediate_dropout – the ratio of dropout between convolutional and highway layers on character level
- lstm_dropout – dropout ratio in word-level LSTM
- word_vectorizers – list of parameters for additional word-level vectorizers, for each vectorizer it stores a pair of vectorizer dimension and the dimension of the corresponding word embedding
- word_lstm_layers – the number of word-level LSTM layers
- word_lstm_units – hidden dimensions of word-level LSTMs
- word_dropout – the ratio of dropout before word level (it is applied to word embeddings)
- regularizer – l2 regularization parameter
- verbose – the level of verbosity
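The relationship between char_window_size, char_filters and char_filter_multiple can be illustrated with a small sketch. It is not the library's code: the helper name resolve_char_filters is hypothetical, and it assumes that when char_filters is left as None, each window width w gets char_filter_multiple * w filters, which is how the "ratio between filters number and window width" description reads.

```python
from typing import List, Optional, Union


def resolve_char_filters(
    char_window_size: Union[int, List[int]] = 5,
    char_filters: Optional[Union[int, List[int]]] = None,
    char_filter_multiple: int = 25,
) -> List[int]:
    """Hypothetical helper: number of convolutional filters per window width."""
    # Normalise a single width to a one-element list.
    if isinstance(char_window_size, int):
        widths = [char_window_size]
    else:
        widths = list(char_window_size)
    if char_filters is None:
        # Assumed default: the filter count scales with the window width.
        return [char_filter_multiple * w for w in widths]
    if isinstance(char_filters, int):
        # A single number is reused for every window width.
        return [char_filters] * len(widths)
    filters = list(char_filters)
    if len(filters) != len(widths):
        raise ValueError("char_filters and char_window_size must have the same length")
    return filters
```

Under these assumptions the defaults (width 5, multiple 25) would give 125 filters, and widths [3, 5, 7] would give [75, 125, 175].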
-
load
(infile)
Loads model weights from a file.
Parameters: infile – file to load model weights from
-
predict_on_batch
(data: Union[list, tuple], return_indexes: bool = False) → List[List[str]]
Makes predictions on a single batch.
Parameters: - data – a batch of word sequences together with additional inputs
- return_indexes – whether to return tag indexes in vocabulary or tags themselves
Returns: a batch of label sequences
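The effect of return_indexes can be pictured with a toy decoding step. The sketch below is only an illustration under stated assumptions: decode_batch and TOY_TAGS are hypothetical names, and a plain list stands in for the model's tag vocabulary.

```python
from typing import List, Sequence, Union

# Toy tag vocabulary; a real model would use its own `tags` vocabulary.
TOY_TAGS = ["PAD", "NOUN", "VERB", "PUNCT"]


def decode_batch(
    index_batch: Sequence[Sequence[int]],
    return_indexes: bool = False,
) -> List[List[Union[int, str]]]:
    """Hypothetical post-processing of predicted tag indexes for one batch."""
    if return_indexes:
        # Keep the vocabulary indexes unchanged.
        return [list(sent) for sent in index_batch]
    # Otherwise map every index to its tag string.
    return [[TOY_TAGS[i] for i in sent] for sent in index_batch]
```

With return_indexes=False the caller gets tag strings per word; with return_indexes=True the same batch comes back as integer indexes into the vocabulary.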
-
save
(outfile)
Saves model weights to a file.
Parameters: outfile – file to save model weights to (other model components should be given in the config)
-
symbols_number_
Character vocabulary size
Tag vocabulary size
-
deeppavlov.models.morpho_tagger.common.
prettify
(sent: Union[str, List[str]], tags: List[str], return_string: bool = True, begin: str = '', end: str = '', sep: str = '\n') → Union[List[str], str]
Prettifies the output of the morphological tagger.
Parameters: - sent – source sentence (either tokenized or not)
- tags – list of tags, the output of a tagger
- return_string – whether to return a list of strings or a single string
- begin – a string to prepend at the beginning
- end – a string to append at the end
- sep – separator between word analyses
Returns: the prettified output of the tagger.
Examples
>>> sent = "John likes, really likes pizza"
>>> tags = ["NNP", "VBZ", "PUNCT", "RB", "VBZ", "NN"]
>>> prettify(sent, tags)
1 John NNP
2 likes VBZ
3 , PUNCT
4 really RB
5 likes VBZ
6 pizza NN
7 . SENT
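To make the roles of return_string, begin, end and sep concrete, here is a simplified stand-in for the formatting step. It is a sketch, not the library's implementation: prettify_sketch is a hypothetical name, plain strings are split on whitespace only (so punctuation is not separated), and the tab-separated "index, word, tag" layout is an assumption.

```python
from typing import List, Union


def prettify_sketch(
    sent: Union[str, List[str]],
    tags: List[str],
    return_string: bool = True,
    begin: str = "",
    end: str = "",
    sep: str = "\n",
) -> Union[List[str], str]:
    """Simplified stand-in for prettify; not the library's code."""
    # Assumption: untokenized input is split on whitespace only.
    words = sent.split() if isinstance(sent, str) else list(sent)
    # One line per (word, tag) pair, numbered from 1.
    # zip stops at the shorter of the two sequences.
    lines = [
        f"{i}\t{word}\t{tag}"
        for i, (word, tag) in enumerate(zip(words, tags), start=1)
    ]
    if return_string:
        # begin and end wrap the joined analyses; sep goes between them.
        return begin + sep.join(lines) + end
    return lines
```

Passing return_string=False returns the per-word lines as a list, which is convenient when the caller wants to apply its own layout.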
-
class
deeppavlov.models.morpho_tagger.common.
TagOutputPrettifier
(return_string: bool = True, begin: str = '', end: str = '', sep: str = '\n', **kwargs)
A wrapper for the prettify() function.
Parameters: - return_string – whether to return a list of strings or a single string
- begin – a string to prepend at the beginning
- end – a string to append at the end
- sep – separator between word analyses
-
__call__
(X: List[Union[List[str], str]], Y: List[Union[List[str], str]]) → List[Union[List[str], str]]
Calls the prettify function for each input sentence.
Parameters: - X – a list of input sentences
- Y – a list of tag lists, one for each sentence in X
Returns: a list of prettified morphological analyses
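The batch-level behaviour described above, one formatter call per sentence, can be sketched as follows. The class BatchPrettifierSketch and the toy formatter join_words_and_tags are hypothetical names used for illustration only; the real wrapper delegates to prettify with the stored options.

```python
from typing import Callable, List, Union

Sentence = Union[List[str], str]


class BatchPrettifierSketch:
    """Hypothetical wrapper that applies a per-sentence formatter to a batch."""

    def __init__(self, formatter: Callable[[Sentence, List[str]], str]) -> None:
        self.formatter = formatter

    def __call__(self, X: List[Sentence], Y: List[List[str]]) -> List[str]:
        if len(X) != len(Y):
            raise ValueError("X and Y must contain the same number of sentences")
        # Pair each sentence with its tag list and format the pairs one by one.
        return [self.formatter(sent, tags) for sent, tags in zip(X, Y)]


def join_words_and_tags(sent: Sentence, tags: List[str]) -> str:
    """Toy formatter: 'word/TAG' pairs separated by spaces."""
    words = sent.split() if isinstance(sent, str) else sent
    return " ".join(f"{w}/{t}" for w, t in zip(words, tags))
```

A call such as BatchPrettifierSketch(join_words_and_tags)(X, Y) then returns one formatted string per input sentence, in the same order as X.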