Large-Scale Data Analytics: Challenges and the Role of Stratified Data Placement

Day - Time: 16 June 2016, 15:00
Place: Area della Ricerca CNR di Pisa - Room: C-29
  • Srinivasan Parthasarathy (The Ohio State University)

Raffaele Perego


With the increasing popularity of XML data stores, social networks, and Web 2.0 and 3.0 applications, complex data formats such as trees and graphs are becoming ubiquitous. Efficiently managing and processing such large, complex data stores on modern computational ecosystems to extract actionable information is a daunting task. In this talk I will begin by discussing some of these challenges. I will then argue that a critical element at the heart of this challenge is the placement, storage, and access of such tera- and petascale data. In this work we develop a novel distributed framework that eases the burden on the programmer, and we propose an agile, intelligent placement service layer as a flexible yet unified means of addressing this challenge. Central to our framework is the notion of stratification, which first groups structurally (or semantically) similar entities into strata. These strata are then partitioned across the ecosystem according to the needs of the application: to maximize locality, balance load, minimize data skew, or even account for energy consumption. Results on several real-world applications validate the efficacy and efficiency of our approach.
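The abstract describes a two-step pipeline: group similar entities into strata, then place those strata on nodes according to an application-specific goal. The following is a hypothetical sketch of that idea, not the framework presented in the talk. It assumes each entity (for example a tree or graph) is summarized as a set of string features, uses MinHash signatures as a stand-in similarity grouping, and offers two illustrative placement policies: "locality" keeps each stratum on one node while balancing load with a largest-first greedy assignment, and "spread" deals each stratum's members round-robin across nodes so that no node is skewed toward one kind of entity. All function names and policy names here are invented for illustration.

```python
import hashlib
import heapq
from collections import defaultdict


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed (hypothetical helper)."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_key(features: set[str], num_hashes: int = 4) -> tuple[int, ...]:
    """MinHash signature of a feature set; similar sets are likely to share it."""
    return tuple(min((_hash(seed, f) for f in features), default=-1)
                 for seed in range(num_hashes))


def stratify(entities: dict[str, set[str]], num_hashes: int = 4) -> list[list[str]]:
    """Assumed stratification step: entities with colliding signatures form a stratum."""
    strata: dict[tuple[int, ...], list[str]] = defaultdict(list)
    for name, feats in entities.items():
        strata[minhash_key(feats, num_hashes)].append(name)
    return list(strata.values())


def place(strata: list[list[str]], num_nodes: int, policy: str = "locality") -> list[list[str]]:
    """Assign strata to nodes under one of two illustrative policies."""
    nodes: list[list[str]] = [[] for _ in range(num_nodes)]
    if policy == "locality":
        # Keep each stratum intact; give the largest remaining stratum
        # to the currently least-loaded node (greedy longest-processing-time rule).
        heap = [(0, i) for i in range(num_nodes)]
        heapq.heapify(heap)
        for stratum in sorted(strata, key=len, reverse=True):
            load, i = heapq.heappop(heap)
            nodes[i].extend(stratum)
            heapq.heappush(heap, (load + len(stratum), i))
    elif policy == "spread":
        # Deal members of every stratum round-robin so each node holds
        # a mix of all strata, which reduces per-node data skew.
        k = 0
        for stratum in strata:
            for name in stratum:
                nodes[k % num_nodes].append(name)
                k += 1
    else:
        raise ValueError(f"unknown policy: {policy}")
    return nodes


if __name__ == "__main__":
    docs = {
        "a": {"x", "y", "z"}, "b": {"x", "y", "z"},
        "c": {"p", "q"}, "d": {"p", "q"},
        "e": {"m"},
    }
    strata = stratify(docs)
    print(strata)
    print(place(strata, 2, "locality"))
    print(place(strata, 2, "spread"))
```

Entities with identical feature sets always share a signature, so "a" and "b" land in one stratum and "c" and "d" in another. Under "locality" each of those strata stays on a single node; under "spread" they are split so both nodes see both kinds of entity. A real placement layer would weigh richer costs (data size, communication, energy), which this toy deliberately omits.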

Bio: Srinivasan Parthasarathy is a Professor of Computer Science at Ohio State. He directs the data mining research lab and co-directs the undergraduate major in data analytics, a first-of-its-kind effort in the US. His work has received eight best paper awards or similar honors from leading conferences in Data Mining, Database Systems, and Network Science. He has received the Ameritech Faculty Fellowship, an NSF CAREER award, a DOE ECPI award, and numerous research awards from industry (e.g., Google, Microsoft, IBM) for his work in these areas. A current area of active interest is the role of network analysis and data mining in modern emergency response systems that couple social (citizen) sensing with physical sensing modalities. He also serves as steering committee chair for the SIAM Data Mining conference series and sits on the editorial boards of several leading journals in the field.