Visual Analytics and Imaging Laboratory (VAI Lab)
Computer Science Department, Stony Brook University, NY
is an important preparation step in big data processing. It may even be used
to detect redundant data points as well as outliers. Elimination of redundant
data and duplicates can serve as a viable means for data reduction and it
can also aid in sampling. Visual feedback is very valuable here to give users
confidence in this process. Furthermore, big data preprocessing is seldom
interactive, which conflicts with users' desire for immediate answers.
The best one can do is incremental preprocessing in which partial and hopefully
quite accurate results become available relatively quickly and are then refined
over time. We propose a correlation clustering framework that uses multidimensional
scaling (MDS) for layout and GPU acceleration to accomplish these goals. Our domain application
is the correlation clustering of atmospheric mass spectrum data with 8 million
data points of 450 dimensions each.
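
The incremental idea can be illustrated with a minimal sketch. It is not the framework described in the paper: it assumes a simple greedy "leader" scheme, runs on the CPU in plain Python, and omits the MDS layout entirely. The names `pearson`, `incremental_correlation_clusters`, and the `threshold` parameter are hypothetical. Data arrives in batches, each point is absorbed by the most correlated existing cluster representative if the Pearson correlation reaches the threshold, and otherwise starts a new cluster. A snapshot is yielded after every batch, so partial results are available early and are refined as more data is processed.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def incremental_correlation_clusters(batches, threshold=0.95):
    """Yield a list of (representative, count) pairs after each batch.

    Hypothetical greedy scheme: the first point of a cluster serves as
    its representative; later points that correlate with it at or above
    `threshold` are counted as redundant and absorbed.
    """
    reps, counts = [], []
    for batch in batches:
        for point in batch:
            best, best_r = -1, threshold
            for i, rep in enumerate(reps):
                r = pearson(point, rep)
                if r >= best_r:
                    best, best_r = i, r
            if best >= 0:
                counts[best] += 1          # redundant point: absorb it
            else:
                reps.append(list(point))   # uncorrelated point: new cluster
                counts.append(1)
        # Snapshot of the partial result after this batch.
        yield [(list(rep), count) for rep, count in zip(reps, counts)]


if __name__ == "__main__":
    batches = [
        [[1, 2, 3, 4], [2, 4, 6, 8.1], [4, 3, 2, 1]],
        [[1.1, 2, 3.2, 4], [8, 6, 4, 2.2], [1, 5, 1, 5]],
    ]
    for step, snapshot in enumerate(incremental_correlation_clusters(batches)):
        print(step, [count for _, count in snapshot])
```

In this toy input, two clusters already appear after the first batch, and the second batch only adds members to them plus one new cluster. Clusters with a single member after all batches are candidates for outliers, while large counts indicate redundant points that could be dropped or sampled.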
Teaser: As can be seen below, the relevant clusters already emerge relatively early in the iterative redundancy clustering process. The rotations are due only to the repeated MDS layout process. Eliminating these rotations is a subject of future work.
Paper: B. Wang, P. Ruchikachorn, K. Mueller, “GPU-Accelerated Incremental Correlation Clustering of Large Data with Visual Feedback,” The First IEEE Workshop on Big Data Visualization, Santa Clara, CA, October 2013.