Neural Morphological Tagging
This is an implementation of a neural morphological tagger. The model consists of a single dense layer on top of a BERT embedder. See the BERT paper for a more complete description, as well as the BERT section of the documentation.
The model is trained on Universal Dependencies corpora (version 2.3).
Language | Code | UDPipe accuracy | UDPipe Future accuracy | Our top accuracy | Model size (MB) |
---|---|---|---|---|---|
Russian (UD2.3) | ru_syntagrus | 93.5 | 96.90 | 97.83 | 661 |
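
The architecture described above, a single dense layer applied to each token embedding, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the tag list, the embedding size, and the random numbers that stand in for BERT embeddings and trained weights. It is not the DeepPavlov implementation.

```python
import math
import random

# Hypothetical tag inventory; the real model predicts UD POS tags plus features.
TAGS = ["NOUN", "VERB", "ADJ", "PUNCT"]
EMBEDDING_DIM = 8  # toy size; BERT-base embeddings have 768 dimensions


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense_layer(embedding, weights, bias):
    """Compute one score per tag: weights @ embedding + bias."""
    return [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def tag_tokens(embeddings, weights, bias):
    """Pick the most probable tag independently for every token."""
    tags = []
    for embedding in embeddings:
        probs = softmax(dense_layer(embedding, weights, bias))
        best = max(range(len(TAGS)), key=probs.__getitem__)
        tags.append(TAGS[best])
    return tags


rng = random.Random(0)
# Random stand-ins for trained parameters and for BERT token embeddings.
weights = [[rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)] for _ in TAGS]
bias = [0.0] * len(TAGS)
embeddings = [[rng.uniform(-1, 1) for _ in range(EMBEDDING_DIM)] for _ in range(3)]

predicted = tag_tokens(embeddings, weights, bias)
print(predicted)  # one tag per token; meaningless here because the weights are random
```

In a trained model the embeddings would come from BERT and the weights would be learned. The sketch only shows that each token is classified from its own contextual embedding, with no further recurrent or CRF layer on top.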
Usage examples
Before using the model, make sure that all required packages are installed by running the command:
python -m deeppavlov install morpho_ru_syntagrus_bert
On Windows, set the KERAS_BACKEND environment variable to tensorflow (this only needs to be done once):
set "KERAS_BACKEND=tensorflow"
Python:
On Windows, if KERAS_BACKEND was not set to tensorflow from the command line, it can be set in Python code as follows:
import os
os.environ["KERAS_BACKEND"] = "tensorflow"

.. code:: python

    from deeppavlov import build_model, configs

    model = build_model('morpho_ru_syntagrus_bert', download=True)
    sentences = ["Я шёл домой по незнакомой улице.", "Девушка пела в церковном хоре о всех уставших в чужом краю."]
    for parse in model(sentences):
        print(parse)


::

    1 Я я PRON _ Case=Nom|Number=Sing|Person=1 _ _ _ _
    2 шёл идти VERB _ Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act _ _ _ _
    3 домой домой ADV _ Degree=Pos _ _ _ _
    4 по по ADP _ _ _ _ _ _
    5 незнакомой незнакомый ADJ _ Case=Dat|Degree=Pos|Gender=Fem|Number=Sing _ _ _ _
    6 улице улица NOUN _ Animacy=Inan|Case=Dat|Gender=Fem|Number=Sing _ _ _ _
    7 . . PUNCT _ _ _ _ _ _

    1 Девушка девушка NOUN _ Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing _ _ _ _
    2 пела петь VERB _ Aspect=Imp|Gender=Fem|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act _ _ _ _
    3 в в ADP _ _ _ _ _ _
    4 церковном церковный ADJ _ Case=Loc|Degree=Pos|Gender=Masc|Number=Sing _ _ _ _
    5 хоре хор NOUN _ Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing _ _ _ _
    6 о о ADP _ _ _ _ _ _
    7 всех весь PRON _ Animacy=Anim|Case=Loc|Number=Plur _ _ _ _
    8 уставших устать VERB _ Aspect=Perf|Case=Loc|Number=Plur|Tense=Past|VerbForm=Part|Voice=Act _ _ _ _
    9 в в ADP _ _ _ _ _ _
    10 чужом чужой ADJ _ Case=Loc|Degree=Pos|Gender=Masc|Number=Sing _ _ _ _
    11 краю край NOUN _ Animacy=Inan|Case=Loc|Gender=Masc|Number=Sing _ _ _ _
    12 . . PUNCT _ _ _ _ _ _

You may also pass tokenized sentences instead of raw ones:

.. code:: python

    sentences = [["Я", "шёл", "домой", "по", "незнакомой", "улице", "."]]
    for parse in model(sentences):
        print(parse)

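
The printed output follows the ten-column CoNLL-U layout, so it can be converted into Python dictionaries for further processing. The sketch below is a minimal illustration and not part of DeepPavlov: `parse_feats` and `parse_sentence` are hypothetical helpers, and the code assumes that each parse is a multi-line string whose columns are separated by whitespace (tabs or spaces) and contain no spaces themselves.

```python
# Column names of the CoNLL-U format used by Universal Dependencies.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_feats(feats):
    """Convert 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    if feats == "_":  # an underscore marks an empty field
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_sentence(parse):
    """Turn one tagged sentence (a multi-line string) into a list of token dicts."""
    tokens = []
    for line in parse.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        # Assumption: whitespace-separated columns, no spaces inside a field.
        token = dict(zip(CONLLU_FIELDS, line.split()))
        token["feats"] = parse_feats(token["feats"])
        tokens.append(token)
    return tokens


# The first three rows of the example output shown above.
example = """1 Я я PRON _ Case=Nom|Number=Sing|Person=1 _ _ _ _
2 шёл идти VERB _ Aspect=Imp|Gender=Masc|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act _ _ _ _
3 домой домой ADV _ Degree=Pos _ _ _ _"""

for token in parse_sentence(example):
    print(token["form"], token["upos"], token["feats"].get("Case"))
```

With a loaded model, the same helper could be applied to each element returned by `model(sentences)`, provided the assumption about the output type holds.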
Task description
Morphological tagging consists of assigning labels that describe word morphology to a pre-tokenized sequence of words. In the simplest case these labels are just part-of-speech (POS) tags, which is why the task was often referred to as POS tagging in the earlier days of NLP. The refined version of the problem that we solve here performs a more fine-grained classification and also detects the values of other morphological features, such as case, gender and number for nouns, or mood and tense for verbs. Morphological tagging is a stage of a common NLP pipeline: it generates useful features for downstream tasks such as syntactic parsing, named entity recognition or machine translation.
Typical output of morphological tagging is shown below. The examples are for Russian and English and use the inventory of tags and features from the Universal Dependencies project.
1 Это PRON Animacy=Inan|Case=Acc|Gender=Neut|Number=Sing
2 чутко ADV Degree=Pos
3 фиксируют VERB Aspect=Imp|Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act
4 энциклопедические ADJ Case=Nom|Degree=Pos|Number=Plur
5 издания NOUN Animacy=Inan|Case=Nom|Gender=Neut|Number=Plur
6 . PUNCT _
1 Four NUM NumType=Card
2 months NOUN Number=Plur
3 later ADV _
4 , PUNCT _
5 we PRON Case=Nom|Number=Plur|Person=1|PronType=Prs
6 were AUX Mood=Ind|Tense=Past|VerbForm=Fin
7 married VERB Tense=Past|VerbForm=Part|Voice=Pass
8 . PUNCT _
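
The difference between plain POS tagging and full morphological tagging can be made concrete with the English example above. The following sketch is illustrative only: `read_rows`, `pos_labels` and `full_labels` are hypothetical helpers, and joining the POS tag and the features with `|` is just one possible way to write a combined label.

```python
# Rows copied from the English example above: index, word, POS tag, features.
ROWS = """1 Four NUM NumType=Card
2 months NOUN Number=Plur
3 later ADV _
4 , PUNCT _
5 we PRON Case=Nom|Number=Plur|Person=1|PronType=Prs
6 were AUX Mood=Ind|Tense=Past|VerbForm=Fin
7 married VERB Tense=Past|VerbForm=Part|Voice=Pass
8 . PUNCT _"""


def read_rows(text):
    """Split each row into (word, pos, feats); assumes four whitespace-separated columns."""
    rows = []
    for line in text.splitlines():
        _, word, pos, feats = line.split()
        rows.append((word, pos, feats))
    return rows


def pos_labels(rows):
    """Coarse view: keep only the part-of-speech tag."""
    return [pos for _, pos, _ in rows]


def full_labels(rows):
    """Fine-grained view: join the POS tag and the features into one label."""
    return [pos if feats == "_" else f"{pos}|{feats}" for _, pos, feats in rows]


rows = read_rows(ROWS)
print(pos_labels(rows))   # the labels a plain POS tagger would predict
print(full_labels(rows))  # the labels a morphological tagger has to predict
```

The coarse view has one label per part of speech, while the fine-grained view distinguishes, for example, a past-tense finite auxiliary from other auxiliaries, which is what makes the label space of morphological tagging much larger.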