Data Science Lab
|Mission||Big Data makes for big and interesting problems! Our lab focuses on analyzing large-scale text streams such as news, blogs, and social media to identify cultural trends around the world's people, places, and things.|
We have a dozen or so modern PC-based workstations used for software development, plus a variety of machines for larger-scale computation. Our largest-memory machines (Newton, Galileo, and Einstein) are Dell PowerEdge T710s, each with 16TB of hard drive storage, 144GB of RAM running at 800MHz, and two 6-core hyperthreaded Intel Xeon X5660 2.8GHz processors, for a total of 12 cores per machine.
These are supported by a 45-disk, 135-terabyte high-speed BackBlaze file server, along with a terabyte backup machine and several laboratory web servers.
For larger computing applications, we maintain a 28-node rack-mounted cluster computer consisting of 24 dual-processor PowerEdge 1750 Xeon nodes plus 4 larger database nodes.
|Software||Hadoop, Java, and Python|
|Details||Our research covers a range of topics in natural language processing. A current focus is using deep learning techniques to build concise representations of the meanings of words in all significant languages, and using these powerful features to recognize entities and measure sentiment and other properties of texts. Another focus involves analyzing Wikipedia to identify the fame and significance of historical figures, as reported in our book Who's Bigger? and its associated website. Our Lydia technology has been licensed by General Sentiment, a social media analysis startup.|
|Funding||NSF and industrial funding|
|Lab Web page||Data Science Lab|