Discovering and Disambiguating Named Entities in Text

Day - Time: 06 October 2014, h.16:00
Place: Area della Ricerca CNR di Pisa - Room: C-29
  • Johannes Hoffart (Max-Planck-Institut für Informatik)

Diego Ceccarelli


Disambiguating named entities in natural language texts maps ambiguous names to canonical entities registered in a knowledge base such as DBpedia, Freebase, or YAGO. Knowing the specific entity is an important asset for several other tasks, e.g. entity-based information retrieval or higher-level information extraction.

In this talk I will cover three aspects of entity disambiguation:

1. Entity disambiguation to Wikipedia-derived knowledge bases. The approach to this problem combines several ingredients: the prior probability of an entity being mentioned, the similarity between the context of the mention in the text and an entity, and the coherence among the entities. The disambiguation is then solved by joint inference, using a fast graph algorithm.

2. Semantic relatedness for entity disambiguation, or how to go beyond Wikipedia. Extending the disambiguation method, we present a novel and highly efficient measure to compute the semantic coherence between entities based on keyphrases. This measure is especially powerful for long-tail entities in Wikipedia and for knowledge bases that do not interlink their entities the way Wikipedia does.

3. Discovering emerging entities, or how to go to the real world. Wikipedia and knowledge bases can never be complete due to the dynamics of the ever-changing world: new companies are formed every day, new songs are composed every minute. To keep up with the real world's entities, we introduce a method to explicitly model the out-of-knowledge-base entities, enabling a more robust discovery of previously unseen entities.
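
As a rough illustration of how the three ingredients from the first aspect (prior, context similarity, coherence) can interact, here is a minimal toy sketch. It is not the method presented in the talk: the entity names, priors, keyphrases, weights, and the `jaccard` and `disambiguate` helpers are all made up for the example, Jaccard overlap of keyphrases stands in for the real similarity and coherence measures, and exhaustive search over all candidate combinations stands in for the fast graph algorithm.

```python
from itertools import product


def jaccard(a, b):
    """Set overlap in [0, 1]; a crude stand-in for a real similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a and b else 0.0


def disambiguate(context, candidates, w_prior=0.2, w_context=0.4, w_coherence=0.4):
    """Pick one entity per mention by scoring all assignments jointly.

    Exhaustive enumeration is only feasible for toy inputs; it is used here
    in place of a graph algorithm purely for illustration.
    """
    mentions = list(candidates)
    best_score, best_assignment = float("-inf"), None
    for combo in product(*(candidates[m] for m in mentions)):
        # Ingredient 1: how likely each entity is to be mentioned at all.
        prior = sum(e["prior"] for e in combo)
        # Ingredient 2: overlap between the text context and each entity.
        ctx = sum(jaccard(context, e["keyphrases"]) for e in combo)
        # Ingredient 3: pairwise coherence among the chosen entities.
        coh = sum(
            jaccard(x["keyphrases"], y["keyphrases"])
            for i, x in enumerate(combo)
            for y in combo[i + 1:]
        )
        score = w_prior * prior + w_context * ctx + w_coherence * coh
        if score > best_score:
            best_score, best_assignment = score, combo
    return {m: e["name"] for m, e in zip(mentions, best_assignment)}


# Hypothetical toy data: two ambiguous mentions with two candidates each.
context = {"guitar", "song", "concert"}
candidates = {
    "Page": [
        {"name": "Larry_Page", "prior": 0.7,
         "keyphrases": {"google", "search engine", "stanford"}},
        {"name": "Jimmy_Page", "prior": 0.3,
         "keyphrases": {"guitar", "led zeppelin", "rock"}},
    ],
    "Kashmir": [
        {"name": "Kashmir_(region)", "prior": 0.9,
         "keyphrases": {"himalaya", "india", "pakistan"}},
        {"name": "Kashmir_(song)", "prior": 0.1,
         "keyphrases": {"led zeppelin", "rock", "song"}},
    ],
}

result = disambiguate(context, candidates)
print(result)  # {'Page': 'Jimmy_Page', 'Kashmir': 'Kashmir_(song)'}
```

With the weight placed on the prior alone (`w_prior=1, w_context=0, w_coherence=0`), the same toy data resolves to `Larry_Page` and `Kashmir_(region)`, which shows why context and coherence are needed in addition to popularity.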

Speaker bio:
Johannes Hoffart is a PhD student in the Databases and Information Systems group at the Max Planck Institute for Informatics. His current research focuses on linking unstructured text to structured knowledge bases by disambiguating named entities, and on the use of entities and knowledge bases in information retrieval tasks.