Russian SuperGLUE Submission¶
The DeepPavlov library provides a way to train your Russian SuperGLUE models and submit the results to the leaderboard in a couple of easy steps.
Task definition¶
Russian SuperGLUE is a benchmark that contains a set of tasks in Russian developed for evaluating general language understanding.
There are 9 tasks in the Russian SuperGLUE set:
DaNetQA (Yes/no Question Answering Dataset for Russian) is a binary classification task of question answering, in which the model is asked to answer a yes/no question based on a given context fragment.
PARus (Choice of Plausible Alternatives for Russian language) is a causal reasoning task. The model is asked to choose the most plausible alternative that has a causal relation with the given premise.
RCB (Russian Commitment Bank) is a classification task in which the model is asked to define the type of textual entailment (Entailment, Contradiction, Neutral) between two sentences.
In the MuSeRC (Russian Multi-Sentence Reading Comprehension) task the model needs to process information from multiple sentences at once and identify the correct answers to the question from the given list.
In the RuCoS (Russian reading comprehension with Commonsense reasoning) task the model has to choose the answer to each query from a list of text spans extracted from the given text fragment.
RUSSE (Russian Word-in-Context) is a reading comprehension task in which the model has to identify whether a given word is used in the same meaning in two different sentences.
In RWSD (the Russian Winograd Schema Challenge) the data is a set of sentence pairs that differ by only one or two words and contain a syntactic ambiguity that is resolved differently in each sentence. The model is trained to predict whether the ambiguity is resolved correctly.
LiDiRus is a diagnostic task in which the model has to identify whether there is entailment between two sentences.
TERRa (Textual Entailment Recognition for Russian) is a binary classification task of identifying whether there is entailment between two sentences.
For a more detailed description of each task, see the Russian SuperGLUE benchmark documentation.
Train your model¶
Modify the configuration file you need and train your own model for the task (see here for more detailed instructions). The full list of models designed for each task can be found in the table below.
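If you prefer the Python API to the command line, training can also be started programmatically. The snippet below is a minimal sketch that uses the russian_superglue_danetqa_rubert config as an example; substitute the config for your task.

from deeppavlov import train_model

# Train the model described by the config; download=True fetches the
# pretrained weights and the task data before training starts.
model = train_model("russian_superglue_danetqa_rubert", download=True)

The same can be achieved from the command line with python -m deeppavlov train russian_superglue_danetqa_rubert -d.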
Create your submission files¶
To do that, use the submit command with the name of the configuration file that defines the path to your model.
Note that the name of the Russian SuperGLUE task should be defined in the ["metadata"]["variables"]["TASK"] variable of the config file (a way to check this is sketched after the option list below).
python -m deeppavlov.utils.benchmarks.superglue <config_name> [-d] [-o <output_file_name.jsonl>]
-d: download model-specific data before generating the submission files.
-o <output_file_name.jsonl>: set the output file name. By default, the output file names for Russian SuperGLUE models comply with the benchmark requirements.
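If you are not sure which task a config targets, you can inspect the variable directly. The snippet below is a small sketch that assumes the config is available as a local JSON file; the file name is only an illustration.

import json
from pathlib import Path

# Path to your local copy of the config (the name here is illustrative).
config_path = Path("russian_superglue_danetqa_rubert.json")
config = json.loads(config_path.read_text())

# The benchmark task name is expected under ["metadata"]["variables"]["TASK"].
print(config["metadata"]["variables"]["TASK"])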
For example, russian_superglue_danetqa_rubert solves the Yes/no Question Answering Dataset for Russian (DaNetQA) task. The following command will generate DaNetQA.jsonl, ready for submission:
python -m deeppavlov.utils.benchmarks.superglue russian_superglue_danetqa_rubert -d
The prediction results will be saved in the correct format; the file will automatically be given the name required by the system and saved to the current directory. All you have to do next is zip the files you want to submit into one archive and upload it to the leaderboard.
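If you prefer to create the archive from Python rather than with an external tool, the standard library is enough. The sketch below assumes two generated files as an example; include whichever .jsonl files you actually produced.

from zipfile import ZipFile

# Bundle the generated prediction files into a single archive for upload.
# The file list is illustrative.
submission_files = ["DaNetQA.jsonl", "TERRa.jsonl"]

with ZipFile("submission.zip", "w") as archive:
    for name in submission_files:
        archive.write(name)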
Scores¶
The scores for DeepPavlov’s pretrained models on the tasks are presented in the table.
| Model | Metric | Score |
|---|---|---|
| | Accuracy | 0.647 |
| | Accuracy | 0.588 |
| | Accuracy | 0.641 |
| | Matthew’s Corr | 0.251 |
| | F1/Acc | 0.336 / 0.486 |
| | Accuracy | 0.669 |
| | F1a/EM | 0.689 / 0.298 |
| | F1/EM | 0.77 / 0.768 |
| | Accuracy | 0.65 |
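To check one of the listed scores locally, the pretrained models can be evaluated with the Python API. The sketch below uses russian_superglue_danetqa_rubert as an example; the exact numbers you get depend on the released checkpoint and the evaluation split.

from deeppavlov import evaluate_model

# Evaluate the pretrained model; download=True fetches the model files
# and the task data before evaluation.
metrics = evaluate_model("russian_superglue_danetqa_rubert", download=True)
print(metrics)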