Data Science Lab

Location: CEWIT 243
Mission: Big Data makes for big and interesting problems! Our lab focuses on analyzing large-scale text streams such as news, blogs, and social media to identify cultural trends around the world's people, places, and things.
Hardware

We have a dozen or so modern PC-based workstations for software development, plus a variety of machines for larger-scale computation. Our largest-memory machines (Newton, Galileo, and Einstein) are Dell PowerEdge T710s. Each has 16 TB of hard-drive storage, 144 GB of RAM running at 800 MHz, and two 6-core, hyperthreaded 2.8 GHz Intel Xeon X5660 processors, for a total of 12 cores per machine.

These are supported by a 45-disk, 135-terabyte high-speed Backblaze file server, a terabyte backup machine, and several laboratory web servers.

For larger computing applications, we maintain a 28-node rack-mounted cluster made up of 24 dual-processor Dell PowerEdge 1750 Xeon nodes plus 4 larger database nodes.
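The hardware figures above can be cross-checked with a little arithmetic. The Python sketch below encodes only numbers stated in the text, plus one assumption that is labeled in the comments: Hyper-Threading exposes two logical threads per physical core, which is Intel's standard configuration for the Xeon X5660. The per-disk figure assumes that all 45 disks are the same size.

```python
# Back-of-the-envelope capacity figures for the lab hardware described above.
# Input constants come from the text; everything else is derived arithmetic.

SOCKETS_PER_MACHINE = 2      # two Intel Xeon X5660 processors per PowerEdge T710
CORES_PER_SOCKET = 6         # the X5660 is a 6-core part
THREADS_PER_CORE = 2         # assumption: Hyper-Threading gives 2 logical threads per core
RAM_GB = 144                 # RAM per large-memory machine
LARGE_MEMORY_MACHINES = 3    # Newton, Galileo, and Einstein

FILE_SERVER_DISKS = 45
FILE_SERVER_TB = 135

CLUSTER_COMPUTE_NODES = 24   # dual-processor PowerEdge 1750 nodes
PROCESSORS_PER_COMPUTE_NODE = 2
CLUSTER_DB_NODES = 4


def machine_summary() -> dict:
    """Per-machine and aggregate figures for the three large-memory machines."""
    cores = SOCKETS_PER_MACHINE * CORES_PER_SOCKET
    return {
        "cores_per_machine": cores,
        "threads_per_machine": cores * THREADS_PER_CORE,
        "ram_gb_per_core": RAM_GB / cores,
        "cores_total": cores * LARGE_MEMORY_MACHINES,
        "ram_gb_total": RAM_GB * LARGE_MEMORY_MACHINES,
    }


def storage_summary() -> dict:
    """Average capacity per disk, assuming all disks are the same size."""
    return {"tb_per_disk": FILE_SERVER_TB / FILE_SERVER_DISKS}


def cluster_summary() -> dict:
    """Node and processor counts for the rack-mounted cluster."""
    return {
        "total_nodes": CLUSTER_COMPUTE_NODES + CLUSTER_DB_NODES,
        "compute_processors": CLUSTER_COMPUTE_NODES * PROCESSORS_PER_COMPUTE_NODE,
    }


if __name__ == "__main__":
    summary = {**machine_summary(), **storage_summary(), **cluster_summary()}
    for name, value in summary.items():
        print(f"{name}: {value}")
```

Run as a script, it reports 12 cores and 24 hardware threads per large-memory machine, 12 GB of RAM per physical core, and 3 TB per file-server disk. It also confirms that 24 compute nodes plus 4 database nodes account for the 28 cluster nodes.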

Operating System: Linux
Software: Hadoop, Java, and Python
Details: Our research covers a range of topics in natural language processing. A current focus is using deep learning techniques to build concise representations of the meanings of words in all significant languages, and using these powerful features to recognize entities and measure the sentiment and other properties of texts. Another focus is analyzing Wikipedia to identify the fame and significance of historical figures, as reported in our book Who's Bigger? and its associated website. Our Lydia technology has been licensed by General Sentiment, a social media analysis startup.
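Learned word representations of this kind are usually dense vectors in which words with related meanings lie close together, and closeness is commonly measured by the cosine of the angle between two vectors. The sketch below shows only that similarity computation. The three-dimensional vectors in the hypothetical `EMBEDDINGS` table are invented for illustration. Real embeddings are learned from large corpora and have many more dimensions, and nothing here reflects the lab's actual models.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy embeddings: values are made up for illustration only.
EMBEDDINGS: Dict[str, List[float]] = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.2],
    "banana": [0.1, 0.0, 0.9],
}


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v: 1.0 for parallel vectors, 0.0 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def most_similar(word: str, embeddings: Dict[str, List[float]]) -> str:
    """Return the other vocabulary word whose vector has the highest cosine similarity."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))


if __name__ == "__main__":
    print(most_similar("king", EMBEDDINGS))
```

With these toy values, `most_similar("king", EMBEDDINGS)` returns `"queen"`. Downstream tasks such as entity recognition or sentiment scoring can then use vectors like these as input features, instead of treating each word as an unrelated symbol.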
Funding: NSF and industrial sponsors
Coordinator: Steve Skiena
Lab Web page: Data Science Lab