
My research covers Spatial Big Data Management, IoT Data Management, GIS, Medical Imaging Informatics, AI in Healthcare, Population Health, and Blockchain.

Supported applications include geospatial big data, Internet of Things, digital pathology (2D and 3D), radiology imaging, opioid epidemic research, and healthcare data integration.

The work is sponsored by NSF, NIH, NCI, CDC, Pitney Bowes, Amazon, and Google.

Scalable Big Data Management Systems
My research goal in big data management and analytics is to deliver effective, scalable, high-performance software systems for managing, querying, and mining complex big data across multiple dimensions, including 2D and 3D spatial and imaging data, temporal data, spatio-temporal data, and sequencing data.


Computational Digital Pathology (2D and 3D)
Systematic analysis of large-scale whole slide image data can involve many interrelated analyses over large numbers of images. These analyses generate vast numbers of quantitative features such as shape and texture, classifications of those features, and spatial queries to discover spatial relationships and patterns. Our research focuses on spatial and image analytics at extreme scale.


Big Data Driven GIS and Population Health
GIS-oriented public health research focuses on the locations of patients and the agents of disease. It studies community-level and region-level patterns and variations, and the impact of demographic, socioeconomic, and environmental factors on disease and human health. We take a big data driven approach, integrating Electronic Health Records (EHR), social media, and other sources to discover geospatial and temporal patterns at high resolution. In particular, we have been working on GIS and AI driven opioid epidemic research: we develop geospatial and AI methods to identify high-risk communities for interventions, and to predict patients' opioid overdose risk from EHR data for better clinical decision support.



Blockchain for Electronic Health Records
Electronic health records (EHRs) are critical information for diagnosis and treatment in healthcare, and they need to be frequently distributed and shared among parties such as healthcare providers, insurance companies, pharmacies, researchers, and patient families. However, EHR data are highly private and sensitive, which poses major challenges for current healthcare data sharing infrastructures. Sharing EHR data today is often a tedious manual process, which can lead to significant turnaround time.
A patient-centric, blockchain-based EHR sharing system, built on a permissioned blockchain framework with an immutable and transparent ledger, will help manage authentication, confidentiality, and accountability for EHR data sharing, achieving a secure and trustworthy EHR management infrastructure.
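
To make the idea of an immutable, transparent ledger concrete, the following is a minimal single-process sketch, not the system described above. It assumes a hypothetical `AccessLedger` class in which every consent grant and access request is appended as a hash-chained block, so that any later modification of a recorded event is detectable. A real permissioned blockchain would add distributed consensus, identity management, and encryption, none of which is modeled here.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    """Hash a block body deterministically (sorted keys give a stable encoding)."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class AccessLedger:
    """Hypothetical append-only, hash-chained log of consent and access events."""

    GENESIS_HASH = "0" * 64

    def __init__(self):
        self.blocks = []
        self.grants = set()  # (patient_id, requester_id) pairs the patient has approved

    def _append(self, event: dict) -> None:
        # Each block stores the hash of its predecessor, forming the chain.
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS_HASH
        body = {"index": len(self.blocks), "prev_hash": prev_hash, "event": event}
        self.blocks.append(dict(body, hash=_digest(body)))

    def grant(self, patient_id: str, requester_id: str) -> None:
        """Record that the patient allows the requester to read their records."""
        self.grants.add((patient_id, requester_id))
        self._append({"type": "grant", "patient": patient_id, "requester": requester_id})

    def request_access(self, patient_id: str, requester_id: str) -> bool:
        """Check consent and log the attempt, whether or not it is allowed."""
        allowed = (patient_id, requester_id) in self.grants
        self._append({"type": "access", "patient": patient_id,
                      "requester": requester_id, "allowed": allowed})
        return allowed

    def verify(self) -> bool:
        """Recompute every hash; return False if any block was altered or reordered."""
        prev_hash = self.GENESIS_HASH
        for i, block in enumerate(self.blocks):
            body = {"index": block["index"], "prev_hash": block["prev_hash"],
                    "event": block["event"]}
            if (block["index"] != i or block["prev_hash"] != prev_hash
                    or block["hash"] != _digest(body)):
                return False
            prev_hash = block["hash"]
        return True
```

In this sketch, denied requests are logged as well as permitted ones, which is one simple way to support accountability: the log shows who attempted to access which patient's records, and `verify()` reports whether that history has been changed after the fact.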



Clinical Natural Language Processing
While electronic medical record (EMR) systems employ increasingly rich data models that offer a wide variety of options for structured data entry, a large amount of medical data is still in free-form, narrative text reports. Our research goal in clinical natural language processing is to provide convenient and intelligent information extraction and classification from medical reports by taking advantage of both individual human intervention and collective human intelligence, to ultimately improve diagnosis, reduce errors, and inform medical practice and decision making. One ongoing project is IDEAL-X, an interactive, incremental-learning-based information extraction system that facilitates information extraction and classification from narrative medical reports and transforms extracted data into normalized structured forms. The system learns quickly from user feedback on a small set of reports, and achieves high accuracy on data extraction with minimal effort from users. Extracted data can be further normalized through controlled vocabularies. IDEAL-X requires no special configuration or training sets, and is not constrained to specific domains, so it is easy to use and highly portable. IDEAL-X is being used for cohort identification from tens of thousands of patients, and for automated classification of a massive number of radiology reports from CDC.
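
The interactive, incremental learning loop described above can be illustrated with a small sketch. This is not the IDEAL-X implementation: it assumes a hypothetical `IncrementalTextClassifier` based on an online multinomial Naive Bayes model, chosen only because it can be updated one report at a time without a pre-built training set. The class name, the tokenizer, and the smoothing scheme are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


class IncrementalTextClassifier:
    """Hypothetical online Naive Bayes classifier updated from user feedback."""

    def __init__(self):
        self.label_counts = Counter()            # number of reports seen per label
        self.word_counts = defaultdict(Counter)  # per-label token frequencies
        self.vocab = set()

    @staticmethod
    def tokenize(text: str) -> list:
        return re.findall(r"[a-z]+", text.lower())

    def predict(self, text: str):
        """Return the most probable label, or None before any feedback is given."""
        if not self.label_counts:
            return None
        tokens = self.tokenize(text)
        total_reports = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_reports in self.label_counts.items():
            counts = self.word_counts[label]
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n_reports / total_reports)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def feedback(self, text: str, correct_label: str) -> None:
        """Fold one user-confirmed or user-corrected report into the model."""
        tokens = self.tokenize(text)
        self.label_counts[correct_label] += 1
        self.word_counts[correct_label].update(tokens)
        self.vocab.update(tokens)
```

A typical session under these assumptions would call `predict` on each new report, show the result to the user, and pass the user's confirmed or corrected label to `feedback`, so that the model improves as reports are reviewed.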