Young Research(ers) Workshop 1/2

Date and time: 21 May 2014, 10:30
Place: Area della Ricerca CNR di Pisa - Room: A-27

Andrea Esuli


In the "Young Research(ers) Workshop 1/2", three of the six winners of the ISTI Young Researcher Award 2014 will present a selection of results from their research activity.

All staff are invited, especially the institute's young researchers.

The Young Research(ers) Workshop 2/2, with the other three winners, will be held on 2 July.


Speaker: Gianpaolo Coro

Title: Providing Statistical Analysis Algorithms as-a-Service by means of a distributed e-Infrastructure


In computational statistics, there is growing interest in modular, pluggable solutions that make experiments repeatable and verifiable and allow statistical algorithms to be exploited in several contexts. Furthermore, such procedures are increasingly expected to be hosted remotely and to "hide" the complexity of the calculations, especially in the case of cloud computing. For these reasons, the usual approach of supplying modular software libraries containing algorithm implementations is giving way to Web Services that host such implementations and are accessible through standard protocols. E-Infrastructures make it possible to provide algorithms as-a-Service, to hide calculation complexity and to enable the sharing of experimental results and parameters. This talk will present the so-called "gCube Statistical Manager" system, a distributed network of web services that supports both distributed processing and Cloud processing. The system is meant to be used, and enriched with new algorithms, by several communities of practice of an e-Infrastructure. It hides the complexity of managing algorithm deployment and adaptation, and facilitates access to and sharing of results. For example, an algorithm developed by a biologist can be imported "as-is" into the platform, which will automatically provide multi-user access, distributed and Cloud processing, a user interface and results sharing. The talk will demonstrate the system's effectiveness through practical examples from Computational Biology.

Speaker: Anna Monreale

Title: Privacy-by-design in Data Analytics and Social Mining


Privacy is an ever-growing concern in our society: the lack of reliable privacy safeguards in many current services and devices is often the reason their adoption is more limited than expected. Moreover, people are reluctant to provide true personal data unless it is absolutely necessary. Thus, privacy is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive information. Unfortunately, it is increasingly hard to transform data in a way that protects sensitive information: we live in the era of big data, characterized by unprecedented opportunities to sense, store and analyze social data describing human activities in great detail and resolution. As a result, privacy cannot be preserved simply by de-identification. In this talk we propose the privacy-by-design paradigm to develop technological frameworks that counter the threats of undesirable, unlawful effects of privacy violation without obstructing the knowledge discovery opportunities offered by social mining and data analytics technologies. Our main idea is to build privacy protection into the knowledge discovery technology by design, so that the analysis incorporates the relevant privacy requirements from the start.

Speaker: Giuseppe Ottaviano

Title: Inverted indexes compression with Elias-Fano


Inverted indexes are the core of every modern information retrieval system; to cope with steadily growing document corpora such as the Web, it is crucial to represent them as efficiently as possible in terms of both space and decompression speed. Recently, the Elias-Fano representation of monotone sequences, a data structure traditionally used in the field of succinct data structures, has been applied to index compression, showing surprisingly good performance together with good compression. This talk will start by briefly reviewing the Elias-Fano representation and its application to inverted indexes. We will then move on to a new representation based on a two-level Elias-Fano data structure, which significantly improves compression efficiency by (almost) optimally partitioning the index, with only a negligible slow-down in decompression.
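As a rough illustration of the classic Elias-Fano representation mentioned above (not the two-level partitioned variant the talk introduces), the Python sketch below encodes a non-decreasing sequence of n integers drawn from [0, u). Each value is split into l low bits, stored verbatim, and a high part, stored in unary in an "upper" bit array: element i sets bit (v_i >> l) + i. The function names, the list-of-bits representation and the textbook choice l = max(0, floor(log2(u/n))) are assumptions of this sketch; production index implementations pack bits into machine words and add select structures for random access, whereas this sketch only decodes sequentially.

```python
def elias_fano_encode(values, universe):
    """Sketch: Elias-Fano encode a non-decreasing list of ints in [0, universe)."""
    n = len(values)
    if n == 0:
        return 0, [], []
    if values[0] < 0 or values[-1] >= universe:
        raise ValueError("values must lie in [0, universe)")
    if any(b < a for a, b in zip(values, values[1:])):
        raise ValueError("values must be non-decreasing")
    # Number of low bits per element: l = max(0, floor(log2(universe / n))).
    l = max(0, (universe // n).bit_length() - 1)
    mask = (1 << l) - 1
    # Low parts are stored verbatim, l bits each.
    lower = [v & mask for v in values]
    # High parts are stored in unary: element i sets bit (v >> l) + i.
    upper = [0] * (n + (universe >> l) + 1)
    for i, v in enumerate(values):
        upper[(v >> l) + i] = 1
    return l, lower, upper


def elias_fano_decode(l, lower, upper):
    """Sketch: sequentially decode the output of elias_fano_encode."""
    out = []
    zeros = 0  # zeros seen so far = high part of the next element
    for bit in upper:
        if bit:
            out.append((zeros << l) | lower[len(out)])
        else:
            zeros += 1
    return out


def size_in_bits(l, lower, upper):
    """Total space of the sketch's encoding: n*l low bits plus the upper array."""
    return l * len(lower) + len(upper)
```

For example, the sequence [3, 4, 7, 13, 14, 15, 21, 43] with universe 44 uses l = 2 low bits per element and a 20-bit upper array, 36 bits in total. The upper array has at most n + (u >> l) + 1 bits, which is what gives Elias-Fano its roughly 2 + log2(u/n) bits per element.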