Giovani in un'ora - Seminar series - Part two

Day - Time: 28 September 2023, 11:00
Place: Area della Ricerca CNR di Pisa - Room: C-29

Fabio Carrara


Rossana Buongiorno - "Tailoring skip connections for task-specific optimization of the attention mechanism in UNet-based Convolutional Neural Networks for medical image segmentation"

Abstract: In medical imaging, precise disease segmentation is paramount for successful diagnosis and treatment monitoring, both of which rely heavily on accurately quantifying affected versus healthy tissue. A typical challenge arises, however, when the visual characteristics of the disease, particularly its gray-scale values, closely resemble those of the adjacent healthy tissue.

To address this issue, we propose a novel pipeline for optimizing the attention mechanism in UNet-based architectures so as to better exploit the contextual information stored in the images. Specifically, our approach tailors the skip connections so that they prioritize areas exhibiting infection-related features, guiding attention exclusively to the areas of interest within the affected organs. These findings may have significant implications for the development of precision medicine and, in particular, for accurate disease localization and monitoring.

We assessed our method on a dataset of 90 volumetric scans from COVID-19 patients. The results indicate that, compared to the original Attention U-Net, our approach promises to improve the accuracy of COVID-19 diagnosis and disease monitoring in clinical settings.
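
For context, the baseline mentioned above, Attention U-Net, weights each skip connection with an additive attention gate before it reaches the decoder. The sketch below illustrates that standard gate in NumPy; it is not the tailored pipeline presented in the talk. The function name, the weight shapes, and the assumption that the gating signal has already been resampled to the spatial size of the skip features are all illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, W_x, W_g, psi, b_int=0.0, b_psi=0.0):
    """Additive attention gate on a skip connection (illustrative sketch).

    x   : (C_x, H, W) skip-connection features from the encoder
    g   : (C_g, H, W) gating signal from the decoder, assumed to be
          already resampled to the same H x W as x
    W_x : (C_int, C_x) weights of a 1x1 convolution applied to x
    W_g : (C_int, C_g) weights of a 1x1 convolution applied to g
    psi : (C_int,)     weights of the 1x1 convolution producing one logit per pixel
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    theta = np.einsum("ic,chw->ihw", W_x, x)
    phi = np.einsum("ic,chw->ihw", W_g, g)
    # Additive attention: sum the two projections, then apply ReLU.
    q = np.maximum(theta + phi + b_int, 0.0)
    # One attention coefficient in [0, 1] per spatial location.
    alpha = sigmoid(np.einsum("i,ihw->hw", psi, q) + b_psi)
    # Rescale every channel of the skip features by the attention map.
    return x * alpha[None, :, :], alpha
```

The returned map `alpha` has one value per pixel, so locations with low attention contribute little to the features passed on to the decoder.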

Nicola Messina - "Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language"

Abstract: Due to recent advances in pose-estimation methods, human motion can be extracted from a video in the form of 3D skeleton sequences. Despite the significant and increasing interest in this modality, effective and efficient content-based access to large volumes of spatio-temporal skeleton data remains a challenging problem. In this talk, we propose a novel content-based text-to-motion retrieval task, which aims at retrieving relevant motions based on a specified natural-language textual description. To this end, we construct an informative common text-motion latent space in which an efficient k-NN search can be performed. To define baselines for this uncharted task, we employ the BERT and CLIP language representations to encode the text modality and successful spatio-temporal models to encode the motion modality. Besides employing state-of-the-art encoders proposed for motion classification, we introduce our own transformer-based approach, called Motion Transformer (MoT), which employs divided space-time attention to effectively aggregate the different skeleton joints in space and time. Inspired by recent progress in text-to-image/video matching, we experiment with two widely adopted metric-learning loss functions, and we set up a common evaluation protocol that employs ad hoc metrics for assessing the quality of the retrieved motions. We evaluated our framework on two recently introduced datasets, KIT Motion-Language and HumanML3D, showing the effectiveness of the proposed baselines on this novel, challenging retrieval task.
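
As a rough illustration of the retrieval step described above, the sketch below runs a brute-force k-NN search in a shared embedding space. It assumes the text query and the motions have already been encoded into vectors of the same dimension; the use of cosine similarity and the names `l2_normalize` and `knn_search` are assumptions made for this example, not details taken from the talk.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale vectors along the last axis to unit L2 norm."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, eps)

def knn_search(query_emb, motion_embs, k=5):
    """Return indices and cosine similarities of the k motions closest to the query.

    query_emb   : (D,)   embedding of the text description
    motion_embs : (N, D) embeddings of the motion collection
    """
    q = l2_normalize(np.asarray(query_emb, dtype=float))
    m = l2_normalize(np.asarray(motion_embs, dtype=float))
    # On unit vectors, the dot product equals the cosine similarity.
    sims = m @ q
    k = min(k, sims.shape[0])
    # Sort by decreasing similarity and keep the first k entries.
    top = np.argsort(-sims, kind="stable")[:k]
    return top, sims[top]

# Toy usage with hypothetical 2-D embeddings.
motions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query = np.array([2.0, 0.1])
indices, scores = knn_search(query, motions, k=2)
```

A full sort is adequate for small collections; for large motion archives an approximate nearest-neighbor index would typically replace the exhaustive comparison.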