Morphotagger


1. Introduction to the task

Morphological tagging is the task of assigning morphological tags, such as case, number, gender and aspect, to text tokens.

An example:

Я шёл домой по незнакомой улице.
1   Я   я   PRON    _   Case=Nom|Number=Sing|Person=1   _   _   _   _
2   шёл идти    VERB    _   Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act   _   _   _   _
3   домой   домой   ADV _   Degree=Pos  _   _   _   _
4   по  по  ADP _   _   _   _   _   _
5   незнакомой  незнакомый  ADJ _   Case=Dat|Degree=Pos|Gender=Fem|Number=Sing  _   _   _   _
6   улице   улица   NOUN    _   Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing    _   _   _   _
7   .   .   PUNCT   _   _   _   _   _   _
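The example above follows the CoNLL-U layout. Each token line has ten columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS and MISC. The morphological tags live in two of them:

- UPOS holds the part of speech.
- FEATS holds a |-separated list of Feature=Value pairs, or _ when there are none.

The minimal sketch below splits one token line into named fields. It assumes the columns are tab-separated, as the CoNLL-U specification requires; the example above uses spaces only for display. The function names are illustrative and not part of DeepPavlov.

```python
from typing import Dict

# The ten CoNLL-U columns, in order.
CONLLU_COLUMNS = ["id", "form", "lemma", "upos", "xpos",
                  "feats", "head", "deprel", "deps", "misc"]


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}; '_' means no features."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_token_line(line: str) -> dict:
    """Split one tab-separated CoNLL-U token line into named fields (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(CONLLU_COLUMNS):
        raise ValueError(f"expected {len(CONLLU_COLUMNS)} columns, got {len(fields)}")
    token = dict(zip(CONLLU_COLUMNS, fields))
    token["feats"] = parse_feats(token["feats"])
    return token


line = "\t".join(["6", "улице", "улица", "NOUN", "_",
                  "Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing", "_", "_", "_", "_"])
token = parse_token_line(line)
print(token["form"], token["upos"], token["feats"]["Case"])  # улице NOUN Dat
```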

The model is based on BERT for token classification and is trained on the Universal Dependencies corpora (version 2.3).
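BERT splits words into subword tokens, but morphological tags are assigned per word. A common way to reconcile the two in token classification is to read each word's tag off its first subtoken. This approach is assumed here rather than taken from the DeepPavlov source.

The sketch below shows that step on made-up numbers. The following are all illustrative assumptions, not what the real model produces:

- the subword split,
- the logit values,
- the joint "UPOS,FEATS" label strings.

```python
from typing import List, Sequence


def word_level_tags(subword_logits: Sequence[Sequence[float]],
                    word_start_mask: Sequence[bool],
                    id2tag: Sequence[str]) -> List[str]:
    """Return one tag per word, taken from the argmax over its first subtoken's logits."""
    tags = []
    for logits, is_word_start in zip(subword_logits, word_start_mask):
        if is_word_start:  # continuation subtokens are ignored
            best = max(range(len(logits)), key=lambda i: logits[i])
            tags.append(id2tag[best])
    return tags


# Hypothetical label set and subword split of "незнакомой улице".
ID2TAG = ["ADJ,Case=Dat|Degree=Pos|Gender=Fem|Number=Sing",
          "NOUN,Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing"]
subwords = ["незнаком", "##ой", "улиц", "##е"]
word_start_mask = [not s.startswith("##") for s in subwords]
logits = [[2.1, 0.3], [0.4, 0.9], [0.2, 3.0], [1.5, 0.1]]

print(word_level_tags(logits, word_start_mask, ID2TAG))
```

The continuation subtokens here deliberately carry logits that would pick the other label. This shows that only the first subtoken of each word decides its tag.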

2. Get started with the model

First, make sure you have the DeepPavlov Library installed. See the DeepPavlov installation guide for more information about first-time installation.

!pip install -q deeppavlov

Before using the model, make sure that all required packages are installed by running the command:

!python -m deeppavlov install morpho_ru_syntagrus_bert

3. Models list

The table below compares the morpho_ru_syntagrus_bert config with UDPipe on the UD 2.3 dataset.

Model                       Accuracy, %
UDPipe                      93.5
morpho_ru_syntagrus_bert    97.6
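The table does not spell out how accuracy is computed. The sketch below assumes per-token tag accuracy: a token counts as correct only if its predicted tag matches the gold tag exactly.

It also converts the two reported scores into error rates, which makes the gap easier to read. Under that assumption, 6.5% versus 2.4% of tokens are mistagged, roughly a 63% relative error reduction. Both function names are hypothetical.

```python
from typing import Sequence


def tag_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of tokens whose predicted tag equals the gold tag exactly."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no tokens to score")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's errors removed by the new model (accuracies in %)."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# Scores from the table above: UDPipe vs morpho_ru_syntagrus_bert.
print(round(relative_error_reduction(93.5, 97.6), 3))  # 0.631
```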

4. Use the model for prediction

4.1 Predict using Python

from deeppavlov import build_model

model = build_model("morpho_ru_syntagrus_bert", download=True, install=True)
sentences = ["Я шёл домой по незнакомой улице.", "Девушка пела в церковном хоре о всех уставших в чужом краю."]
for parse in model(sentences):
    print(parse)
1       Я       я       PRON    _       Case=Nom|Number=Sing|Person=1   _       _       _       _
2       шёл     шёл     VERB    _       Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act   _       _       _       _
3       домой   домой   ADV     _       Degree=Pos      _       _       _       _
4       по      по      ADP     _       _       _       _       _       _
5       незнакомой      незнакомый      ADJ     _       Case=Dat|Degree=Pos|Gender=Fem|Number=Sing      _       _       _       _
6       улице   улица   NOUN    _       Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing    _       _       _       _
7       .       .       PUNCT   _       _       _       _       _       _

1       Девушка девушка NOUN    _       Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing    _       _       _       _
2       пела    петь    VERB    _       Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act    _       _       _       _
3       в       в       ADP     _       _       _       _       _       _
4       церковном       церковном       ADJ     _       Case=Loc|Degree=Pos|Gender=Masc|Number=Sing     _       _       _       _
5       хоре    хор     NOUN    _       Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing   _       _       _       _
6       о       о       ADP     _       _       _       _       _       _
7       всех    весь    DET     _       Case=Loc|Number=Plur    _       _       _       _
8       уставших        устать  VERB    _       Aspect=Perf|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act     _       _       _       _
9       в       в       ADP     _       _       _       _       _       _
10      чужом   чужом   ADJ     _       Case=Loc|Degree=Pos|Gender=Masc|Number=Sing     _       _       _       _
11      краю    край    NOUN    _       Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing   _       _       _       _
12      .       .       PUNCT   _       _       _       _       _       _
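Each printed parse is a block of CoNLL-U lines, so it can be post-processed with ordinary string handling. The sketch below assumes that every element returned by model(sentences) is such a string with tab-separated columns, as in the printed output. It collects the words that carry a given UPOS tag together with their case.

The sample string is copied from the output above rather than produced by a live model call. find_tokens is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def find_tokens(parse: str, upos: str) -> List[Tuple[str, Optional[str]]]:
    """Return (word form, Case value or None) for every token with the given UPOS tag."""
    found = []
    for line in parse.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and CoNLL-U comments
        columns = line.split("\t")
        form, tag, feats = columns[1], columns[3], columns[5]
        if tag != upos:
            continue
        features = {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))
        found.append((form, features.get("Case")))
    return found


# Three token lines of the first parse above, with tab-separated columns.
parse = "\n".join([
    "4\tпо\tпо\tADP\t_\t_\t_\t_\t_\t_",
    "5\tнезнакомой\tнезнакомый\tADJ\t_\tCase=Dat|Degree=Pos|Gender=Fem|Number=Sing\t_\t_\t_\t_",
    "6\tулице\tулица\tNOUN\t_\tAnimacy=Inan|Case=Dat|Gender=Fem|Number=Sing\t_\t_\t_\t_",
])
print(find_tokens(parse, "NOUN"))  # [('улице', 'Dat')]
```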

4.2 Predict using CLI

You can also get predictions in interactive mode through the CLI (Command Line Interface).

! python -m deeppavlov interact morpho_ru_syntagrus_bert -d

-d is an optional download flag (the CLI counterpart of download=True in Python code). It downloads the pre-trained model along with the embeddings and all other files needed to run the model.

5. Customize the model

To train the morphotagger on your own data, prepare a dataset in CoNLL-U format. The CoNLL-U format is described in the Universal Dependencies documentation.
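A CoNLL-U file is plain UTF-8 text with the following layout:

- one token per line, with ten tab-separated columns;
- a blank line after each sentence;
- optional comment lines starting with #;
- _ in any column whose value is unspecified.

The sketch below writes already-tagged sentences in that layout. It fills only ID, FORM, LEMMA, UPOS and FEATS and leaves the syntactic columns as _. Whether the morphotagger dataset reader accepts that is an assumption to check against your config. to_conllu and its input tuple layout are hypothetical.

```python
from typing import List, Sequence, Tuple

# (form, lemma, upos, feats) for one word; feats is "Feature=Value|..." or "_".
Word = Tuple[str, str, str, str]


def to_conllu(sentences: Sequence[Sequence[Word]]) -> str:
    """Serialize tagged sentences as CoNLL-U text, leaving syntactic columns as '_'."""
    lines: List[str] = []
    for sentence in sentences:
        for index, (form, lemma, upos, feats) in enumerate(sentence, start=1):
            # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
            lines.append("\t".join([str(index), form, lemma, upos, "_", feats,
                                    "_", "_", "_", "_"]))
        lines.append("")  # blank line closes the sentence
    return "\n".join(lines) + "\n"


sentences = [[("Я", "я", "PRON", "Case=Nom|Number=Sing|Person=1"),
              ("шёл", "идти", "VERB",
               "Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act"),
              (".", ".", "PUNCT", "_")]]
conllu_text = to_conllu(sentences)
print(conllu_text)
```

Write the resulting text to your training, validation and test files with encoding="utf-8".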

Then place the training, validation and test files into the "data_path" directory of morphotagger_dataset_reader, change the file names in the morphotagger_dataset_reader settings to your own file names, and launch training:

from deeppavlov import train_model

train_model("<your_morphotagging_config_name>")

or using the CLI:

! python -m deeppavlov train <your_morphotagging_config_name>
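If you prefer not to edit the JSON config by hand, you can set the data path programmatically. The sketch below assumes the usual DeepPavlov config layout, in which the dataset reader is a top-level "dataset_reader" object with "class_name" and "data_path" keys.

The sketch only redirects data_path. The keys that hold the file names may differ between reader versions, so they are left for you to check. The function name and file paths are hypothetical. The saved file can then be used in place of <your_morphotagging_config_name>.

```python
import json
from pathlib import Path


def redirect_data_path(config_path: str, data_dir: str, out_path: str) -> dict:
    """Copy a DeepPavlov JSON config, pointing its dataset reader at data_dir."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    reader = config["dataset_reader"]
    if reader.get("class_name") != "morphotagger_dataset_reader":
        raise ValueError(f"unexpected dataset reader: {reader.get('class_name')!r}")
    reader["data_path"] = str(Path(data_dir).resolve())
    # File-name keys of the reader are deliberately NOT changed here.
    Path(out_path).write_text(json.dumps(config, ensure_ascii=False, indent=2),
                              encoding="utf-8")
    return config
```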