Improving Document Understanding through Saliency Detection (ISTI Grants for Young Mobility seminar series)

Day - Time: 07 December 2016, h.11:00
Place: Area della Ricerca CNR di Pisa - Room: C-29

Andrea Esuli


A great deal of research effort has been devoted in recent years to devising effective solutions that allow machines to understand the main topics covered in a document. Several approaches to this problem have been proposed in the literature; the most popular is Entity Linking (EL), the task of automatically identifying the entities mentioned in a text and linking them to their URIs in a given knowledge base, e.g., Wikipedia. Despite its apparent simplicity, EL is very challenging because of the ambiguity of natural language. However, not all the entities mentioned in a document are equally relevant or useful for understanding the topics being discussed. The related problem of identifying the most relevant entities in a document, known as salient entities, is therefore attracting increasing interest and has a large impact on several text analysis and information retrieval tasks. This seminar will focus on a novel supervised technique that addresses entity linking and saliency detection together. We found that blending the two tasks makes it possible both to improve disambiguation accuracy and to exploit complex, computationally expensive features to detect salient entities with high accuracy. In addition, we will outline the strategy adopted to build a novel dataset of news articles manually annotated with entities and their saliency.