Entity Linking

Entity linking is the task of mapping words in a text (e.g., names of persons, locations and organizations) to entities in a target knowledge base (Wikidata in our case).

Entity linking models are available for English and Russian.

The entity linking component performs the following steps:

  • the substring detected by NER (English) or NER (Russian) is fed to a TfidfVectorizer, and the resulting sparse vector is converted to a dense one

  • the Faiss library is used to find the k nearest neighbours of this tf-idf vector in a matrix whose rows are the tf-idf vectors of words from entity titles

  • the candidate entities are ranked by their number of relations in Wikidata (the number of outgoing edges of the corresponding node in the knowledge graph)

  • BERT (English) or BERT (Russian) ranks the entities using the entity description and the sentence that mentions the entity
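The first three steps can be illustrated with a small Python sketch. It is not DeepPavlov's implementation, only an instance of the same idea under several assumptions: the toy knowledge base (IDs `E1`–`E4`, their titles and relation counts) and the helper `find_candidates` are hypothetical, character n-gram tf-idf is an assumed choice, an exact cosine-similarity search with NumPy stands in for the Faiss index, and the BERT re-ranking step is left out.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy knowledge base: entity ID -> (title, number of relations).
# IDs and relation counts are made up for illustration only.
ENTITIES = {
    "E1": ("river thames", 120),
    "E2": ("thames ditton", 15),
    "E3": ("north sea", 300),
    "E4": ("england", 900),
}

# One matrix row per distinct word of the entity titles,
# remembering which entities contain that word.
word_to_entities = {}
for ent_id, (title, _) in ENTITIES.items():
    for word in title.split():
        word_to_entities.setdefault(word, []).append(ent_id)
words = list(word_to_entities)

# Character n-gram tf-idf (an assumption) so that misspelled mentions still match.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
word_matrix = vectorizer.fit_transform(words).toarray()  # sparse -> dense


def find_candidates(mention, k=2):
    """Return candidate entity IDs, most relations first."""
    query = vectorizer.transform([mention]).toarray()[0]  # sparse -> dense
    # Rows are L2-normalised by TfidfVectorizer, so the dot product is cosine similarity.
    scores = word_matrix @ query
    # Brute-force k nearest neighbours, standing in for a Faiss index.
    nearest = np.argsort(-scores)[:k]
    candidates = {e for i in nearest for e in word_to_entities[words[i]]}
    # Rank candidates by their (hypothetical) number of relations.
    return sorted(candidates, key=lambda e: ENTITIES[e][1], reverse=True)


print(find_candidates("the river thames"))
print(find_candidates("englnd", k=1))
```

With this toy data, `find_candidates("the river thames")` returns `['E1', 'E2']`: the nearest title words are "river" and "thames", and `E1` has more relations than `E2`. The misspelled `"englnd"` still reaches `E4` because most of its character n-grams are shared with "england".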

Use the model

The pre-trained model can be used for inference from both the Command Line Interface (CLI) and Python. Before using the model, make sure that all required packages are installed with the following command:

For the English version:

python -m deeppavlov install entity_linking_eng

To use the pre-trained model from the CLI, run the following command:

python -m deeppavlov interact entity_linking_eng -d
>>> The city stands on the River Thames in the south-east of England, at the head of its 50-mile (80 km) estuary leading to the North Sea.
>>> (['the river thames', 'the north sea', 'england'], [[4, 5, 6], [30, 31, 32], [13]], ['Q19686', 'Q1693', 'Q21'])
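Judging by the example above, the output is a tuple of three parallel lists: the detected mentions (lowercased), the indices of their tokens in the input sentence (e.g. "the River Thames" occupies tokens 4, 5 and 6), and the Wikidata IDs they are linked to (Q21 is England). Assuming that layout, the hypothetical helper `to_records` below, which is not part of DeepPavlov, pairs the lists up so each mention can be handled together with its entity.

```python
# Output printed by entity_linking_eng for the example sentence above.
output = (
    ["the river thames", "the north sea", "england"],
    [[4, 5, 6], [30, 31, 32], [13]],
    ["Q19686", "Q1693", "Q21"],
)


def to_records(result):
    """Zip the parallel lists into (mention, token indices, Wikidata ID) triples."""
    mentions, token_positions, entity_ids = result
    return list(zip(mentions, token_positions, entity_ids))


for mention, positions, qid in to_records(output):
    # Build a link to the entity's Wikidata page from its ID.
    print(f"{mention!r} (tokens {positions}) -> https://www.wikidata.org/wiki/{qid}")
```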

For the Russian version:

python -m deeppavlov install entity_linking_rus

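To use the pre-trained model from the CLI, run the following command (the example sentence means "Moscow is the capital of Russia, a federal city, the administrative centre of the Central Federal District and the centre of Moscow Oblast"):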

python -m deeppavlov interact entity_linking_rus -d
>>> Москва — столица России, город федерального значения, административный центр Центрального федерального округа и центр Московской области.
>>> (['москва', 'россии', 'центрального федерального округа', 'московской области'], [[0], [3], [11, 12, 13], [16, 17]], ['Q649', 'Q159', 'Q190778', 'Q1749'])

The entity linking model can also be used from Python:

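from deeppavlov import configs, build_model

# Build the Russian entity linking model; download=True fetches the pre-trained files
el_model = build_model(configs.kbqa.entity_linking_rus, download=True)

# The model takes a batch (a list) of texts
result = el_model(['Москва — столица России, город федерального значения, административный центр Центрального федерального округа и центр Московской области.'])
print(result)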