Big Data Clustering using Particle Swarm Optimization Algorithm

Day - Time: 17 May 2018, h.10:00
Place: Area della Ricerca CNR di Pisa - Room: C-29
  • Iman Behravan (Department of Electrical Engineering, University of Birjand, Iran)

Roberto Trasarti


Big data refers to huge datasets with a large number of objects and a high number of dimensions. Mining such datasets and extracting knowledge from them is beyond the capability of conventional data mining tools. Clustering, the process of dividing the data points of a dataset into different groups (clusters), is an important technique in both data mining and big data mining. K-means is an efficient clustering algorithm, but it suffers from some drawbacks: its output depends on the initial values of the cluster centers, and it cannot determine the number of clusters by itself. This research introduces a new clustering method for big datasets based on the Particle Swarm Optimization (PSO) algorithm. PSO is a heuristic algorithm with a strong ability to explore the solution space in search of the global optimum. The proposed method is a two-stage algorithm: it first searches the solution space for a proper number of clusters, and then searches for the positions of the centroids.
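To give a rough idea of how PSO can search for centroid positions (the second stage described above), here is a minimal sketch of generic global-best PSO clustering for a fixed, user-supplied number of clusters k. It is not the method presented in the talk. The fitness function (sum of squared errors, SSE), the particle encoding (k centroids flattened into one vector), the parameter values (w = 0.7298, c1 = c2 = 1.49618, commonly used PSO defaults) and the names `pso_cluster` and `sse` are all assumptions made for illustration. The first stage (choosing k) is not shown.

```python
import random

# Hypothetical sketch: generic PSO clustering with an assumed SSE fitness,
# not the two-stage method presented in the talk.


def sse(centroids, points):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )


def pso_cluster(points, k, n_particles=20, iters=100,
                w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Search for k centroids with a global-best PSO; returns (centroids, sse)."""
    rng = random.Random(seed)
    dim = len(points[0])
    size = k * dim  # one particle encodes k centroids as a single flat vector
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]

    def decode(x):
        # Split the flat particle vector back into k centroid vectors.
        return [x[i * dim:(i + 1) * dim] for i in range(k)]

    # Initialise each particle with k data points drawn at random, so the
    # swarm starts from many different K-means-style initialisations at once.
    pos = [[float(v) for _ in range(k) for v in rng.choice(points)]
           for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [x[:] for x in pos]
    pbest_f = [sse(decode(x), points) for x in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            x, v = pos[i], vel[i]
            for j in range(size):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                v[j] = (w * v[j]
                        + c1 * r1 * (pbest[i][j] - x[j])
                        + c2 * r2 * (gbest[j] - x[j]))
                # Move, then clamp to the bounding box of the data (j % dim = coordinate).
                d = j % dim
                x[j] = min(max(x[j] + v[j], lo[d]), hi[d])
            f = sse(decode(x), points)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = x[:], f
                if f < gbest_f:
                    gbest, gbest_f = x[:], f
    return decode(gbest), gbest_f


# Usage on two small, well-separated groups of 2-D points.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, cost = pso_cluster(data, k=2)
print(sorted(centroids), cost)
```

Because the global best is only replaced by a strictly better solution, the returned SSE is never worse than the best of the random initialisations, which is one way a swarm can reduce the sensitivity to initial centers that single-run K-means exhibits. Scaling this to big datasets would require a cheaper fitness evaluation (for example, parallel or sampled SSE computation), which this sketch does not attempt.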