Visual Analytics and Imaging Laboratory (VAI Lab)
Computer Science Department, Stony Brook University, NY
Abstract: Organizing multivariate data spaces by their dimensions or attributes can be a rather difficult task. Most of the work in this area focuses on statistical aspects such as correlation clustering, dimension reduction, and the like. These methods typically produce hierarchies in which the leaf nodes are labeled by the attribute names while the inner nodes are often represented by just a statistical measure and criterion, such as a threshold. This makes them difficult to understand for mainstream users. Taxonomies in science, biology, engineering, etc., on the other hand, are easy to comprehend since they provide meaningful labels at the inner nodes as well. Labeling inner nodes of taxonomies automatically requires the identification of hypernyms. Our proposed framework, called Taxonomizer, takes a visual analytics approach to meet this challenge. It appeals to the wisdom of humans to liaise with state-of-the-art data analytics, neural word embeddings, and lexical databases. It consists of a set of visual tools that starts out with an automatically computed hierarchy where the leaf nodes are the original data attributes, and it then allows users to sculpt high-quality taxonomies for any multivariate dataset.
Teaser: This image shows the workflow by which a user would develop a fully labeled taxonomy for the Kings County housing dataset:
The panels are as follows: (a) Semantic space. (b) Taxonomy based on the semantic space. (c) Data space. (d) Taxonomy based on the data space. (e) Cophenetic plot, which allows users to balance the influence of similarities in the data space against those in the semantic space. Here the user decided to weight the two equally. (f-j) Evolution of the taxonomy.
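To make the weighting in panel (e) concrete, here is a minimal sketch of how data-space and semantic-space dissimilarities could be combined before building a hierarchy. It is an illustration only: the linear blend, the average-linkage clustering, the function names, the attribute names, and all distance values are assumptions for this example and are not taken from the Taxonomizer paper or its implementation.

```python
from itertools import combinations


def blend_distances(d_data, d_sem, w=0.5):
    """Linear mix of two distance matrices (assumed scheme, not the paper's).

    w = 1.0 uses only the data space, w = 0.0 only the semantic space.
    """
    n = len(d_data)
    return [[w * d_data[i][j] + (1 - w) * d_sem[i][j] for j in range(n)]
            for i in range(n)]


def average_linkage(dist, labels):
    """Naive agglomerative clustering; returns the hierarchy as nested tuples."""
    # Each cluster stores (member indices, subtree built so far).
    clusters = {i: ([i], labels[i]) for i in range(len(labels))}
    next_id = len(labels)

    def avg(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        ma, mb = clusters[a][0], clusters[b][0]
        return sum(dist[i][j] for i in ma for j in mb) / (len(ma) * len(mb))

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: avg(*p))
        members_a, tree_a = clusters.pop(a)
        members_b, tree_b = clusters.pop(b)
        clusters[next_id] = (members_a + members_b, (tree_a, tree_b))
        next_id += 1
    return next(iter(clusters.values()))[1]


# Hypothetical attributes and toy distances, chosen only for illustration.
labels = ["bedrooms", "bathrooms", "lat", "long"]
d_data = [[0.0, 0.4, 0.9, 0.8],      # e.g. 1 - |correlation|
          [0.4, 0.0, 0.9, 0.85],
          [0.9, 0.9, 0.0, 0.7],
          [0.8, 0.85, 0.7, 0.0]]
d_sem = [[0.0, 0.2, 0.9, 0.9],       # e.g. word-embedding distance
         [0.2, 0.0, 0.95, 0.95],
         [0.9, 0.95, 0.0, 0.1],
         [0.9, 0.95, 0.1, 0.0]]

blended = blend_distances(d_data, d_sem, w=0.5)  # balanced weighting
tree = average_linkage(blended, labels)
print(tree)
```

The resulting inner nodes are still unlabeled. In the framework described above, labeling them is the step that requires hypernym identification and user interaction.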
Video: Watch it to get a quick overview:
Paper: S. Mahmood, K. Mueller, "Taxonomizer: Interactive Construction of Fully Labeled Hierarchical Groupings from Attributes of Multivariate Data," IEEE Trans. on Visualization and Computer Graphics, 26 (9): 2875-2890, 2020. pdf ppt talk-video