Giovani in un'ora - Ciclo di seminari - Quinta parte

Day - Time: 02 November 2023, h.11:00
Place: Area della Ricerca CNR di Pisa - Room: C-29

Fabio Carrara


Alberto Veneri - "Explain the prediction of large language models used for ranking"

Abstract: Understanding the behavior of deep neural networks for Information Retrieval is crucial to improving trust in these effective models.

Current popular approaches to diagnose the predictions made by deep neural networks for ranking are mainly based on: i) the adherence of the retrieval model to some axiomatic property of the IR system, ii) the generation of free-text explanations, iii) feature importance attributions, or iv) model analysis.

During my period abroad at the Delft University of Technology, funded by the Grant for Young Mobility (GYM), I focused on explanation methods based on model analysis and on the characterization of the embedding space learned by these new large language models. In this talk I will share the main findings.

Giulio Ermanno Pibiri - "Indexing and compressing pangenomes with meta-colored compacted de Bruijn graphs"

Abstract: The colored compacted de Bruijn graph (c-dBG) has become a fundamental tool used across several areas of genomics and pangenomics. For example, it has been widely adopted by methods that perform read mapping or alignment, abundance estimation, and subsequent downstream analyses. These applications essentially regard the c-dBG as a map from k-mers to the set of references in which they appear. The c-dBG data structure should retrieve this set — the color of the k-mer — efficiently for any given k-mer, while using little memory. To aid retrieval, the colors are stored explicitly in the data structure and take considerable space for large reference collections, even when compressed. Reducing the space of the colors is therefore of utmost importance for large-scale sequence indexing.
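
To make the "map from k-mers to the set of references" view concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions: it uses a plain dictionary rather than a compacted de Bruijn graph, it ignores reverse complements, and the names `build_kmer_colors` and `color_of` are hypothetical, not taken from the talk or from any existing tool.

```python
from collections import defaultdict


def build_kmer_colors(references, k):
    """Map every k-mer to its color: the sorted tuple of reference ids containing it.

    Naive illustration only: a real c-dBG stores k-mers compactly and
    compresses the colors instead of keeping them in a dictionary.
    """
    colors = defaultdict(set)
    for ref_id, sequence in enumerate(references):
        # Slide a window of length k over the reference sequence.
        for start in range(len(sequence) - k + 1):
            colors[sequence[start:start + k]].add(ref_id)
    return {kmer: tuple(sorted(ids)) for kmer, ids in colors.items()}


def color_of(kmer_colors, kmer):
    """Return the color of a k-mer, or an empty tuple if it was never seen."""
    return kmer_colors.get(kmer, ())


# Toy collection of three references, indexed with k = 3.
toy_references = ["ACGTAC", "CGTACG", "TTACGT"]
toy_index = build_kmer_colors(toy_references, k=3)

# Many k-mers share the same color, which is the redundancy that makes
# colors compressible for large collections.
distinct_colors = set(toy_index.values())
```

In this toy index, `color_of(toy_index, "GTA")` returns the ids of the two references that contain `GTA`, while a k-mer that never occurs yields an empty color.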

We describe the meta-colored compacted de Bruijn graph (Mac-dBG) — a new colored de Bruijn graph data structure where colors are represented holistically, i.e., taking into account their redundancy across the whole collection being indexed, rather than individually as atomic integer lists. This allows the factorization and compression of common sub-patterns across colors. While optimizing the space of our data structure is NP-hard, we propose a simple heuristic algorithm that yields practically good solutions. Results show that the Mac-dBG data structure improves substantially over the best previous space/time trade-off, by providing remarkably better compression effectiveness for the same (or better) query efficiency. This improved space/time trade-off is robust across different datasets and query workloads.

Luigi Malomo - "Computational design of rigid molds for casting 3D objects"

Abstract: After the recent developments in the automatic design of soft molds for casting applications, we will venture into the much more complex task of using rigid molds to physically reproduce objects. While soft molds are widely used among hobbyists (e.g., artists and artisans), rigid molds represent the backbone of industrial production and are the default manufacturing process for mass-producing objects. Automating the generation of rigid molds starting from an input shape is challenging. Conventionally, the mold design is performed manually by skilled engineers using CAD/CAM tools, and sometimes it also requires editing the original input 3D model. Instead, we will present a first research attempt to automate the mold design for generic freeform shapes. The proposed method can decompose any input shape into a set of pieces, each of which can be manufactured using a bipartite rigid mold.