Information and Data Extraction using Adaptive Online Learning (IDEAL-X)
IDEAL-X is a generic information extraction framework that supports diverse clinical reports, enables prompt interaction between humans and machines, and produces highly accurate results with minimal human effort. IDEAL-X has been adopted by the Centers for Disease Control and Prevention (CDC) for automated classification of venous thromboembolism from radiology reports, and for cohort identification for cardiology research at Emory University.
Hadoop-GIS: A High Performance Spatial Data Warehousing System Over MapReduce
Hadoop-GIS is a scalable, high-performance spatial data warehousing system for running large-scale spatial queries on Hadoop. It supports multiple types of spatial queries on MapReduce through space partitioning, a customizable spatial query engine (RESQUE), implicit parallel spatial query execution, and effective methods for amending query results by handling objects that cross partition boundaries. Hadoop-GIS combines global partition indexing with customizable, on-demand local spatial indexing to achieve efficient query processing. It is integrated into Hive to support declarative spatial queries within a single architecture.
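The partition, query, and amend steps described above can be illustrated with a small in-memory sketch. This is not Hadoop-GIS or RESQUE code: the fixed square grid, the bounding-box representation, and the names overlaps, tiles_for, and partitioned_join are all assumptions made for illustration. Each object is copied to every tile it touches (the map side), each tile is joined independently (the reduce side), and duplicate pairs produced by boundary-crossing objects are removed at the end.

```python
from collections import defaultdict
from itertools import product


def overlaps(a, b):
    """True if two axis-aligned boxes (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def tiles_for(box, tile_size):
    """Return the (col, row) ids of every grid tile the box touches."""
    xmin, ymin, xmax, ymax = box
    cols = range(int(xmin // tile_size), int(xmax // tile_size) + 1)
    rows = range(int(ymin // tile_size), int(ymax // tile_size) + 1)
    return product(cols, rows)


def partitioned_join(left, right, tile_size):
    """Intersection join of two {id: box} datasets over a fixed grid."""
    # "Map" side: replicate each object into every tile it touches.
    buckets = defaultdict(lambda: ([], []))
    for side, dataset in enumerate((left, right)):
        for obj_id, box in dataset.items():
            for tile in tiles_for(box, tile_size):
                buckets[tile][side].append((obj_id, box))

    # "Reduce" side: join each tile on its own. A set drops the duplicate
    # pairs that arise when both objects cross the same tile boundary.
    results = set()
    for left_objs, right_objs in buckets.values():
        for (lid, lbox), (rid, rbox) in product(left_objs, right_objs):
            if overlaps(lbox, rbox):
                results.add((lid, rid))
    return sorted(results)
```

In this sketch a pair whose objects both span two neighbouring tiles is found once per shared tile, which is why the final de-duplication is needed; a real system would also need a partitioning scheme that copes with skewed data.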
Pathology Analytical Imaging Data Management System
Systematic analysis of large-scale whole slide image data can involve many interrelated analyses on large numbers of images, generating very large volumes of quantifications, such as shape and texture, as well as classifications of the quantified features. These pathology analytical results are spatially oriented, and most queries over them resemble GIS queries. Pathology Analytical Imaging Standards (PAIS) (http://www.openpais.org/) is a large-scale imaging GIS system that provides data models and a data management architecture for image data products, feature sets, and results from image analysis and machine learning. PAIS employs a shared-nothing parallel spatial database architecture to support efficient queries over large-scale datasets. The infrastructure provides efficient data management and spatial analytics for analytical pathology imaging in support of biomedical research.
Pathology Image Database System (PIDB)
High-resolution pathology images provide rich information about the morphological and functional characteristics of biological systems, and are bringing the field of pathology into a new era. Modeling, managing, and querying whole slide images pose unique requirements, including compatibility with standards, scalability, support for image queries at multiple granularities, and support for integrated queries across images and the results derived from them. Pathology Image Database System (PIDB) is a standards-oriented image database that supports retrieval of images, tiles, regions, and analytical results, as well as image visualization and experiment management, through a unified interface and architecture.
Enabling Ontology Based Semantic Queries for Biomedical Databases (DBOntoLink)
DBOntoLink provides a bridge between ontology repositories and databases, supporting semantic operations directly inside a database through standard database query languages. With DBOntoLink, semantically annotated biomedical databases can be easily extended with powerful and expressive semantics-enabled queries that use the major ontologies hosted at NCBO BioPortal, with high query efficiency achieved through cache management.
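As a rough illustration of a semantic operation exposed inside a database, the sketch below registers a user-defined SQL function in SQLite and caches ontology lookups in memory. It is not DBOntoLink's implementation: the PARENT dictionary is a hypothetical toy ontology standing in for a BioPortal-hosted one, and the names ancestors, is_a, open_demo_db, patients_with, and the diagnosis table are assumptions made for this example.

```python
import sqlite3
from functools import lru_cache

# Hypothetical toy ontology (child -> parent); a real system would fetch
# these relationships from an ontology repository instead.
PARENT = {
    "glioblastoma": "glioma",
    "astrocytoma": "glioma",
    "glioma": "brain tumor",
    "meningioma": "brain tumor",
}


@lru_cache(maxsize=1024)
def ancestors(term):
    """Return all ancestors of a term; results are cached per term."""
    found = []
    parent = PARENT.get(term)
    while parent is not None:
        found.append(parent)
        parent = PARENT.get(parent)
    return frozenset(found)


def is_a(term, concept):
    """SQL-callable predicate: 1 if term equals or descends from concept."""
    return int(term == concept or concept in ancestors(term))


def open_demo_db():
    """Create an in-memory database with the semantic function registered."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("is_a", 2, is_a)
    conn.execute("CREATE TABLE diagnosis (patient TEXT, term TEXT)")
    conn.executemany(
        "INSERT INTO diagnosis VALUES (?, ?)",
        [("p1", "glioblastoma"), ("p2", "astrocytoma"), ("p3", "meningioma")],
    )
    return conn


def patients_with(conn, concept):
    """Find patients annotated with the concept or any of its descendants."""
    rows = conn.execute(
        "SELECT patient FROM diagnosis WHERE is_a(term, ?) ORDER BY patient",
        (concept,),
    )
    return [row[0] for row in rows]
```

Calling patients_with(open_demo_db(), "glioma") matches rows annotated with the more specific terms, and repeated lookups of the same term are served from the cache rather than recomputed.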
YSmart: Correlation Aware SQL-to-MapReduce Translator
YSmart is a correlation-aware SQL-to-MapReduce translator built on top of the Hadoop platform. Given a SQL query and the related table schemas, YSmart automatically translates the query into a series of Hadoop MapReduce programs written in Java. Compared to other SQL-to-MapReduce translators, YSmart has been shown to offer high performance, high extensibility, and high flexibility.
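To make the idea of SQL-to-MapReduce translation concrete, the sketch below hand-translates one query, SELECT dept, COUNT(*), SUM(salary) FROM emp GROUP BY dept, into a map step, a shuffle, and a reduce step, simulated in memory. It is not YSmart output (YSmart generates Java programs for Hadoop); the table, column names, and function names are assumptions. Computing both aggregates in a single job because they share the same grouping key is offered here as one simple reading of "correlation aware", not as a description of YSmart's actual rules.

```python
from collections import defaultdict


def map_phase(rows):
    """Emit (group key, value) pairs: one pair per input row."""
    for row in rows:
        yield row["dept"], row["salary"]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Compute COUNT(*) and SUM(salary) together in one pass per key."""
    return {
        key: {"count": len(values), "total": sum(values)}
        for key, values in groups.items()
    }


def run_query(rows):
    """Run the whole single-job pipeline over an in-memory table."""
    return reduce_phase(shuffle(map_phase(rows)))
```

A naive translation might run one job per aggregate; sharing the scan and the shuffle across both is the kind of saving that merging related operations is meant to capture.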
A Platform for Collaborative Scientific Research (SciPort)
SciPort is a new-generation biomedical data management and integration platform. With a universal data model and a distributed, metadata-based architecture, SciPort enables secure, powerful, and lightweight scientific data management, integration, and sharing with a smooth user experience. SciPort provides a unified solution for scientific data collection, modeling, storage, indexing, searching, browsing, reporting, visualization, and sharing. SciPort is a Siemens product.