Bringing Collections to Life with Large Language Models

Day - Time: 26 April 2023, h.14:30
Place: Area della Ricerca CNR di Pisa - Room: Faedo (C-29)
  • Luca Soldaini (Allen Institute for AI, Seattle, USA)

Franco Maria Nardini


When discussing Large Language Models (LLMs), corpora are often mentioned only for their potential to serve as training data. However, LLMs can play a crucial role in enriching collections to make them more suitable for their users. In this talk, I will examine two scenarios in which LLMs have been employed to augment corpora. First, I will discuss how generative models can be used to improve answers to general-domain user questions. Then, I will give an overview of recent initiatives at Semantic Scholar that leverage LLMs to enrich scholarly documents. Overall, these two research directions show that LLMs can significantly help in exploring, consuming, and augmenting existing corpora.

Bio: Luca Soldaini is an Applied Research Scientist at the Allen Institute for AI in the Semantic Scholar team. Their current research focuses on question answering, document understanding, and information retrieval in scientific literature. Prior to joining AI2 in 2022, Luca was an Applied Scientist at Amazon Alexa, where they worked on Open Domain Question Answering. Luca obtained their PhD from Georgetown University in 2018; during their doctoral studies, Luca investigated approaches to help health professionals and lay people find trustworthy and relevant medical information online. Beyond their academic work, Luca is also a Core Organizer at Queer In AI, a non-profit organization that seeks to promote awareness of queer issues in artificial intelligence and create a supportive and inclusive community for queer researchers.