Jie Gao

Associate Professor, Computer Science Department, Stony Brook University

Current Projects

Theory and Algorithms for Discrete Curvatures on Network Data

The new developments in technologies of embedded systems, sensors and wireless communications provide great potentials to improve the safety and security of the physical and social environment we live in, when we face unfortunate accidents and emergency events as well as malicious attacks as in crimes and terrorism. Our focus in this project is on developing mathematical tools and algorithms based on discrete curvatures for understanding and detecting community structures and anormalies in networks that can be of crucial value in many applications. We are interested in understanding high level patterns, community structures, and anomalies using geometric tools. The mathematical tools to be developed will be useful for many other networks like social or biological networks (e.g, protein-protein interaction networks).

This is a joint project with Professor Feng Luo from Rutgers University and is currently supported by NSF DMS-1737812.

Social Contagions and Influence

Information, beliefs, diseases, technologies, and behaviors propagate through social interactions as a contagion. Understanding of how these contagions spread is crucial in encouraging beneficial and healthy behaviors and discouraging the ones that are destructive and damaging. Rigorous, mathematical understanding of complex social contagions is not just an abstraction, but will guide applications from healthcare to word-of-mouth advertising. The technical content of this project is inherently interdisciplinary, and its lessons will apply to related fields such as probability, economics, sociology, and statistical physics. The research efforts are integrated with the educational and outreach activities of the PIs, who have strong records of broadly disseminating cutting-edge research to high school, undergraduate, and graduate students through teaching, outreach programs, and personal mentoring.

This project will transform our understanding of social contagions by: 1) Developing a suite of technical tools to enable improved understanding of specific complex processes; 2) Determining how various parameters of cascade and social structure together impact the chances of a cascade's success or failure; and 3) Obtaining empirical evidence to both corroborate the theoretical findings, and uncover the space of realistic setting for certain parameters.

Many existing models of contagion assume that increasing the number of infected (or affected) neighbors marginally decreases the chance of infection. Many contagions, such as adoption of expensive new technology, fail to have this property, but instead have more complex rules for infection. This leads to different spreading behaviors even on the same networks. Motivated by sociology research findings, this project will greatly enhance our understanding of social contagions in three aspects. First this project will provide rigorous study of the spreading behavior of a simplified theoretical model called k-complex contagions and its interactions with structures in the underlying graph such as tie strength, unusually influential nodes, and community structures. Second, this project presents a general model for studying cascades that is both theoretically tractable and practically motivated. The general model generalizes most previous theoretical models of complex and simple contagions and includes homophily and environmental factors on cascades. Finally, this project will use post-hoc analysis as well as real world social experiments to verify the veracity of the model and fit the parameters in different settings.

This is supported by NSF. Check out news coverage here.

Geometric and Topological Analysis for Trajectory Privacy.

The maturing of mobile devices and systems provide an unprecedented opportunity to collect a large amount of data about real world human motion at all scales. The rich knowledge contained in these data sets can have a huge impact in many fields ranging from transportation to health care, from civil engineering to energy management, from e-commerce to social networking. While the applications are paradigm-transforming, recent studies show that the trajectory data can raise serious privacy concerns in revealing personally sensitive information such as frequently visited locations or social ties. These concerns become the major hurdle in utilizing these data sets. This project systematically studies the issue of anonymizing trajectory data, from the bottom layer of trajectory sensing and data collection, to the middle layer of trajectory representation and anonymity, to the application layer of how the anonymized trajectory data can be used.

By the nature of trajectories as being time stamped sequence of points, in this project novel geometric and topological algorithms that directly work on distributed sensors collecting the trajectories are developed for achieving the objective. Queries to such decentralized sensors are made to ensure no sensitive information is released. The intellectual contribution lies in the following aspects. 1) The topological representation of trajectories, i.e., how trajectories pass around obstacles and landmarks in the domain is adopted. The topological representation is compact and descriptive, introducing novel discrete and combinatorial problems to study. 2) A novel framework is developed for distributed sensors to directly learn, classify and compare the topological types of the target trajectories, using harmonic one-forms and Hodge decomposition from algebraic topology. The new framework can substantially reduce the communication cost within the network, while maintaining the requirement of user privacy from the very beginning of sensing and data collection. 3) A family of anonymization algorithms using different ideas are developed, by altering the way to connect the time-stamped points into trajectories, by adjusting the topological resolution to reach a balance between data anonymity and utility, and by sensing and recording randomized hash data to answer popular trajectory queries. 4) The trajectory data sets are often huge, so algorithms for handling large scale trajectory data sets are developed in both centralized and decentralized settings.

This project is supported by NSF.

Geometric Analysis of Computer and Social Networks

This project is to develop theoretic foundations and practical algorithms for geometric analysis of massive, weighted graphs arising from computer networking applications. The main challenge of understanding large scale computer networking is the decentralized management and operations on the network. How the global behaviors (for example, congestion) emerge from decentralized, local operations (routing) is still a mystery. Differential geometry studies the connection of local structures (such as curvatures) and global properties (such as topology and geodesics). Geometric analysis theorems often naturally lead to distributed algorithms that achieve global objectives, which is ideal in networking applications.

This project generalizes classical geometric analysis methods to massive graphs, the theoretic exploration focuses on the curvatures on graphs and the heat kernel estimates, relation between optimal transportation on graphs and curvatures, Ricci flow on graphs, and homology/cohomology/homotopy groups on directed graphs. The theoretic results will be applied for studying fundamental problems in computer networks, including: 1) Geodesics and network congestion: which aims to understand the connection of network congestion (e.g., on the Internet) with network curvature, and try to apply Ricci flow to alleviate network congestion by modifying local curvature; 2) Graph embedding and efficient routing: which investigates how to find an embedding of the network in geometric space in order to support greedy routing; and 3) Resource allocation in wireless networks: which applies optimal transport theory to the problem of capacitated base station allocation. The research results will be useful for applications in a broad range of fields, from pure mathematics research to theoretic physics, from telecommunication in engineering to brain imaging in medicine.

This is currently supported by NSF and AFSOR.

Past Projects

Non-Isotropic Networked Sensor Deployment for Smart Buildings.

The objective of this proposal is to thoroughly investigate the deployment of non-isotropic sensors for smart building applications. Various sensing and control applications in smart building require accurate and efficient localization and tracking of human and activities. To achieve this goal, the non-isotropic networked sensor deployment is essential.

This proposal addresses a new set of problems arising from the deployment of four types of sensors: cameras, tripwire, motion sensors and microphones. The set of deployment problems would strive for full sensor coverage and wireless connectivity with a complex floor plan, and involve one or more of the following constraints: (i) Non-isotropic model using visibility (cameras and tripwires), (ii) Non-overlapping sensing range, (iii) Robust 2-coverage for accurate localization. The sensor deployment problem will heavily involve the geometric properties of the environment (building layout), as well as the geometric properties of the sensors (non-isotropic sensing ranges). Ideas from computational geometry will be applied to tackle these problems and approximation algorithms with worst-case guarantee for theoretical rigor as well as practical algorithms suitable for implementations will be developed.

A clear understanding on how to deploy sensors for smart building systems will allows for efficient planning and effective practice of networked sensor deployment to achieve the high quality sensing desired by various indoor applications. For developers, this work will significantly reduce the design and development costs for designing and building smart building systems. For end users, the resultant testbed will lead to exciting applications that will significantly improve the quality of life. It also enriches engineering curriculum to integrate the theory of computational geometry with the system building practice. The PIs are committed in improving female presence in computer science and exposure of research for high school students.

This project was carried out from 2012-2016, supported by NSF.

Algorithmic Aspects of Geometry for Using LIDAR and Wireless Sensor Networks for Combating Chemical Terror Attacks.

The project considers using a variety of sensors to monitor an unknown signal field over a spatial domain with unknown geometry, which may bear elevational variations. The main objective of this project is to develop mathematical algorithms that can integrate sensor readings from a large number of distributed devices and reconstruct both the signal field and the terrain. Specifically, the sensors involved in the project include the Light Detection And Ranging technology (LIDAR) and networked wireless sensors, for their complementary capabilities. LIDAR uses a constant number of powerful sensors, each takes an image from a distance. On the other hand, distributed wireless sensors use a large number of inexpensive sensors (possibly piggybacking on cellular phones), each takes a density reading at its position. The approach in this project is to use conformal and hyperbolic geometry, discrete curvature flows, topology and numerical analysis to develop novel algorithms to generate accurate density map. From the networking perspective, this project addresses the problems of LIDAR image fusion, network localization, sensor deployment, and integration of LIDAR and sensor data.

The project is motivated by the potentially devastating chemical terror attacks. Biological and chemical agents, used by adversaries, could potentially spread to a large region in a short time. This project focuses on development of algorithmic tools and techniques for gathering timely, accurate and useful information about the chemical or biological spread in case of such an attack. By exploiting the geometric properties of LIDAR data and sensor network data, the project provides critical observations on fundamental questions of how to organize such a large scale network; how to manage sensor data; and how to use the network for acting on the environment and the users.

This project was carried out from 2012-2015, supported by NSF.

Algorithmic Foundations for Joint Information Processing and Optimization in a Hybrid Mobile Sensor Network.

The objective of this project is to develop efficient distributed algorithms for joint information processing, optimization and coordination of static, pervasive sensor nodes and mobile, autonomous agents that altogether monitor and act on the environment. The application scenarios include tracking, searching and planning mobile agents with the underlying sensor network as a supporting communication infrastructure, and online resource management and allocation guided by real-time sensor detections.

This project focuses on three research problems: distributed min-cost bipartite matching, kinetic minimum Steiner tree, and facility location for mobile nodes, by using two non-trivial technical approaches, namely, embedding of the network metric into tree metrics, and distributed primal dual framework. There are two central intellectual questions investigated in this project. 1) How to make use of the sensor data to best serve user requests? This involves developing communication efficient schemes for sensors detecting interesting local events and the mobile users seeking such information to find each other. 2) How to best make use of the continuity and coherence in mobility, either in the presence of mobility enabled sensor data (e.g., detections of continuously moving targets), as well as the locations of mobile users? The project provides solutions that adapt to the system configuration with low update cost, avoiding drastic sudden changes or any level of reconstruction.

This project helps to extend the current Internet to the physical world, encouraging seamless integrations of sensing and control of the physical environment. The PI integrates the research agenda with new and existing curriculum development for both undergraduate and graduate education, and continues her efforts in improving female presence in computer science and exposure of research for high school students.

This project was carried out from 2011-2014, supported by NSF. See this page for publications from this project.

Analyzing Dynamics in Online Social Networks.

Social groups often exhibit a high degree of dynamism. Some groups thrive, while many others die over time. Modeling group dynamics and understanding whether/when a group will remain stable or shrink over time can be important in a number of social domains. Specifically, we look at the following problems in this area. In the first problem, we build models to predict if and when an individual is going to quit his/her group, and whether this quitting event will inflict substantial damage to the group. We take the World of Warcraft game as an exemplar platform for studying this problem. Our proposed solution starts from in-game census data and extracts features from multiple perspectives such as individual-level, guild-level, game activity, and social interaction features. These features are then used to build predictors. Our study shows that destructive group dynamics can often be predicted with modest to high accuracy, and feature diversity is critical to prediction performance. A related problem that we tackle looks at churn/attrition within an organization and tries to model this churn. Our approach provides us with valuable insights into understanding attrition within an organization. Lastly, we extend our analysis to be able to predict group stability. We build models to predict if a group is going to remain stable or is likely to shrink over a period of time. Results indicate that both the level of member diversity and social activities are critical in maintaining the stability of groups. We also find that certain ’prolific’ members play a more important role in maintaining group stability.

Many eCommerce websites and consumer review websites provide both reviews of products and a social network structure among the reviewers. Users can review products they purchased as well as the reviews others wrote. Users can also rate each other as trusted or untrusted relationships. By studying a data set from Epinions, we examine and quantify how the trust/distrust relationships among the users influence their ratings of the reviews. We discover that the opinions of one’s friends has an influence on his/her ratings, while the opinions of one’s foes seem to have no influence. We also find that a user’s friends play a significant role in guiding his/her future relationships.

This is part of the DARPA project titled GLAD-PC: Graph Learning for Anomaly Detection using Psychological Context, ADAMS program under contract W911NF-11-C-0216, 06/2011-06/2013.

Large Scale Sensor Network Routing using Conformal Geometry.

This project develops efficient routing schemes for large scale multi-hop wireless sensor networks with a complex shape. As sensor networks grow large in size, terrain features or non-uniform energy usage may leave the network with holes. Efficient routing becomes difficult as some knowledge of how to "get around" the holes is needed.

The novelty of this project is to use conformal geometry to compute a proper embedding of the network such that simple, distributed greedy routing schemes achieve desirable properties. Conformal geometry shows that any surface can be deformed to three canonical shapes: the sphere, the plane and the disk. Thus, one can "regulate" any sensor field shape to be of a canonical, simple form. The complexity of the routing problem and the domain specifics are encapsulated in the embedding such that routing decision becomes trivial. The conformal mapping of a sensor network is computed using Ricci curvature flow, an intrinsically distributed computation routine that can easily incorporate the dynamic changes of wireless links. This project focuses on conformal mapping and the companion greedy routing solutions that guarantee delivery, achieve good load balancing, facilitate in-network storage and data-centric routing, are resilient to network failures, and generalize to both 2D and 3D networks.

This project explores the unique cross-disciplinary area of wireless networking and differential geometry. Although the main thrust of this proposal is theoretical, it is expected that the project can also provide practical routing solutions for large scale sensor networks.

This project was carried out from 2010-2013, supported by NSF. See this page for publications from this project.

CAREER: Geometric Algorithms for Wireless Sensor Networks.

Embedded networked sensing devices are becoming ubiquitous across many activities that are important to our economy and life. The physical locations of the sensor nodes greatly impact the system design in all aspects from low-level networking and organization to high-level information processing and applications.

This project takes a geometric approach to study algorithms in sensor networks and investigates a number of fundamental ideas about 1) how the geometric embedding of sensor networks influences how the network can operate? 2) how to exploit the geometric characteristics for efficient and scalable network design?

The project addresses a number of important architectural components where geometry has a fundamental influence, including localization, topology discovery, naming and routing, information discovery and brokerage, and mobility, in a harsh environment in which sensor networks operate:inaccuracy or unavailability in location information, limited resources such as bandwidth and energy, noisy or incomplete data, preferable distributed computation and high link or node failure rate. Expected results include the exploration of new models and abstractions, and novel geometric algorithms with provable performance guarantee in a probabilistic, dynamics and information incomplete world.

We expect that the exploration of sensor networks' rich geometric properties will reveal key insights that enable the creation of large-scale sensor networks. In addition, this interdisciplinary project provides a bridge between communities of computational geometry and wireless networking - identifying important geometric problems for the computational geometry community as well as supplying efficient algorithmic solutions for the networking community.

This project was carried out from 2007-2010, supported by NSF.