Jun Seok Kang
I am a PhD candidate in the Department of Computer Science at Stony Brook University.

The main focus of my research is social media analytics. To analyze the subtle and nuanced sentiment of words that are prevalent in social media, I work on constructing a connotation lexicon by formulating a graph model of words using large-scale data. I have also worked on deception detection by analyzing the characteristics of keystroke patterns. The key insight is that writing style, together with keystroke patterns, can help detect the intent of authors, even when that intent is deceptive. I have also worked on predicting restaurant hygiene inspection results from online reviews, in the hope of helping local governments dispatch their inspectors. This project is an example of applying NLP techniques for social good. More broadly, my work connects to writing style analytics, and I am currently investigating semantic composition patterns of sentences and documents in order to build automatic revision algorithms.

My adviser is Yejin Choi.

- Email:
- CV
Learning Connotation of Words

A considerable number of sentiment analysis studies have focused on learning the explicit sentiment of words and documents. To capture subtler shades of sentiment, this project aims to learn the nuanced connotation of words, including seemingly objective words such as "intelligence", "human", and "cheesecake".

We construct a graph of words that encodes diverse linguistic insights (such as semantic prosody, distributional similarity, and semantic parallelism of coordination) and use it to build a connotation lexicon.
[Paper@ACL 2013] [Project Page]
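To make the idea of inducing a lexicon from a word graph more concrete, here is a minimal sketch of one generic technique, iterative label propagation from a few seed words. It is not the algorithm from the ACL 2013 paper; the function name `propagate_connotation`, the toy edges, their weights, and the seed words are all hypothetical, and edge weights are assumed to be positive.

```python
from collections import defaultdict


def propagate_connotation(edges, seeds, iterations=20):
    """Spread seed polarities over an undirected weighted word graph.

    edges: iterable of (word_a, word_b, weight) tuples, weight > 0
    seeds: dict word -> +1.0 (positive) or -1.0 (negative); kept fixed
    Returns a dict word -> score in [-1, 1] for every word in the graph.
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    # Non-seed words start neutral (0.0).
    scores = {word: seeds.get(word, 0.0) for word in neighbors}
    for _ in range(iterations):
        updated = {}
        for word, nbrs in neighbors.items():
            if word in seeds:
                updated[word] = seeds[word]  # seed labels are clamped
                continue
            total = sum(w for _, w in nbrs)
            # Weighted average of the neighbors' current scores.
            updated[word] = sum(w * scores[n] for n, w in nbrs) / total if total > 0 else 0.0
        scores = updated
    return scores


# Hypothetical toy graph: edges stand in for evidence such as co-occurrence.
edges = [
    ("good", "cheesecake", 2.0),
    ("cheesecake", "dessert", 1.0),
    ("bad", "disease", 2.0),
    ("disease", "dessert", 0.5),
]
scores = propagate_connotation(edges, {"good": 1.0, "bad": -1.0})
```

Seed words keep their fixed polarity, and every other word moves toward the weighted average of its neighbors, so "cheesecake" picks up a positive score through its strong link to "good" while "disease" ends up negative.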

We extend our graph of words by adding word senses, which increases its coverage and lets it encode lexical relations as additional information. From this extended graph, we construct a connotation lexicon, ConnotationWordNet, using loopy belief propagation as the lexicon induction algorithm.
[Paper@ACL 2014] [Project Page]
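For readers unfamiliar with loopy belief propagation, the sketch below runs textbook sum-product message passing on a pairwise Markov random field with two labels (negative, positive), where word and sense nodes are connected by edges that favor agreeing labels. It only illustrates the general algorithm; the node potentials, the `same` compatibility value, the node names (including the sense-style node `cheesecake.n.01`), and the function name `loopy_bp` are assumptions, not the ConnotationWordNet setup.

```python
from collections import defaultdict


def loopy_bp(priors, edges, same=0.8, iterations=30):
    """Sum-product loopy belief propagation with labels 0 = negative, 1 = positive.

    priors: dict node -> [p_neg, p_pos] node potentials (must cover every node)
    edges: list of (u, v) undirected edges
    same: compatibility when neighbors share a label (1 - same otherwise)
    Returns dict node -> normalized belief [p_neg, p_pos].
    """
    psi = [[same, 1 - same], [1 - same, same]]
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # msgs[(u, v)] is the message node u sends to node v, initialized uniform.
    msgs = {(u, v): [0.5, 0.5] for u in nbrs for v in nbrs[u]}

    for _ in range(iterations):
        new_msgs = {}
        for (u, v) in msgs:
            out = []
            for xv in (0, 1):
                total = 0.0
                for xu in (0, 1):
                    prod = priors[u][xu] * psi[xu][xv]
                    # Combine messages from all of u's neighbors except v.
                    for w in nbrs[u]:
                        if w != v:
                            prod *= msgs[(w, u)][xu]
                    total += prod
                out.append(total)
            z = sum(out)
            new_msgs[(u, v)] = [m / z for m in out]
        msgs = new_msgs

    # Belief = node potential times all incoming messages, normalized.
    beliefs = {}
    for node, prior in priors.items():
        b = [prior[0], prior[1]]
        for w in nbrs[node]:
            for x in (0, 1):
                b[x] *= msgs[(w, node)][x]
        z = sum(b)
        beliefs[node] = [x / z for x in b]
    return beliefs


# Hypothetical toy graph mixing word nodes and one sense node.
priors = {
    "excellent": [0.1, 0.9],
    "terrible": [0.9, 0.1],
    "cheesecake": [0.5, 0.5],
    "cheesecake.n.01": [0.5, 0.5],
    "disease": [0.5, 0.5],
}
edges = [("excellent", "cheesecake"), ("cheesecake", "cheesecake.n.01"), ("terrible", "disease")]
beliefs = loopy_bp(priors, edges)
```

On a graph without cycles, as in this toy example, the messages converge to exact marginals; on the loopy graphs that arise in practice the same updates are simply iterated as an approximation.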

Predicting Hygiene Status of Restaurants using Online Reviews

Many counties and cities, such as NYC and LA, require restaurants to post their inspection grades, which helps people decide where to eat.

However, health departments often have limited resources for dispatching inspectors. (Among the Seattle restaurants listed on Yelp.com from 2006 to 2013, more than 50% had no inspection record!)

In this project, we predict the hygiene status of restaurants from online reviews, in the hope of helping local governments dispatch their inspectors more efficiently.
[Paper@EMNLP 2013] [Project Page]
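As a rough illustration of framing hygiene prediction as text classification, the sketch below trains a multinomial Naive Bayes classifier from scratch on review text, with each restaurant's reviews concatenated into one document. This is a generic baseline, not the model from the EMNLP 2013 paper; the labels "pass" and "fail", the whitespace tokenizer, the function names, and the toy reviews are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    vocab = set()
    word_counts = {label: Counter() for label in set(labels)}
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {}
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        log_prior = math.log(doc_counts[label] / len(labels))
        log_lik = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
        model[label] = (log_prior, log_lik)
    return model


def predict(model, doc):
    """Return the label with the highest log posterior; unknown words are ignored."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[t] for t in tokenize(doc) if t in log_lik)
    return max(model, key=score)


# Hypothetical toy data: one document per restaurant (its reviews concatenated).
docs = [
    "fresh food clean tables friendly staff",
    "spotless kitchen great service",
    "dirty floor saw a cockroach",
    "sticky tables and a bad smell dirty restroom",
]
labels = ["pass", "pass", "fail", "fail"]
model = train_nb(docs, labels)
```

Words outside the training vocabulary are skipped at prediction time, so "the restroom was dirty" is scored only on "restroom" and "dirty", both of which appear only in the "fail" documents.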

* This project is a collaboration with Mike Luca at Harvard Business School.

Detecting Deception Using Keystroke Patterns

Many studies on deception detection focus on the insights that can be learned from texts that have already been written.

In this project, by contrast, we turn our attention to the keystroke patterns of truthful and deceptive writers and explore ways to use them to detect the deceptive intent of writers.
[Paper@EMNLP 2014] [Project Page]
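To show what "keystroke patterns" can mean as classifier input, here is a small sketch that turns a logged typing session into a few timing and editing features, which could then sit alongside text features in a classifier. The event format (`KeyEvent` with a press time and key name), the 2-second pause threshold, and the feature names are assumptions for illustration, not the feature set used in the EMNLP 2014 paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class KeyEvent:
    time_ms: int  # time the key was pressed, in milliseconds (hypothetical log format)
    key: str      # e.g. "a", " ", "Backspace"


def keystroke_features(events, pause_ms=2000):
    """Summarize one typing session with simple timing and editing features."""
    if len(events) < 2:
        raise ValueError("need at least two key events")
    # Intervals between consecutive key presses.
    gaps = [b.time_ms - a.time_ms for a, b in zip(events, events[1:])]
    deletions = sum(1 for e in events if e.key in ("Backspace", "Delete"))
    return {
        "duration_ms": events[-1].time_ms - events[0].time_ms,
        "mean_interval_ms": mean(gaps),
        "max_interval_ms": max(gaps),
        "long_pauses": sum(1 for g in gaps if g >= pause_ms),
        "deletion_rate": deletions / len(events),
    }


# Hypothetical session: the writer pauses, deletes a character, and retypes it.
events = [
    KeyEvent(0, "T"),
    KeyEvent(150, "h"),
    KeyEvent(300, "e"),
    KeyEvent(2600, "Backspace"),
    KeyEvent(2800, "e"),
]
features = keystroke_features(events)
```

Pauses and revisions are natural candidates here because they capture how text was produced, which is exactly the information that is lost once only the finished text is available.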

Learning Semantic Composition Patterns of Sentences; Revising Sentences

In this ongoing project, we first learn semantic composition patterns of sentences from documents in the same domain using mixed-integer programming, and then revise sentences automatically using the learned semantic patterns and their statistics.
[In Progress]