I earned my PhD in Computer Science, with a main focus on Natural Language Processing (NLP) and Machine Learning (ML), from Stony Brook University in December 2020.
I am very grateful to have had H. Andrew Schwartz as my advisor.
My research focuses on NLP for social media analysis, language modeling, information extraction, and data analysis. I collaborate with psychologists and computational linguists on human-centered language modeling to improve accuracy on NLP tasks, ranging from traditional tasks (e.g., sentiment analysis) to novel tasks such as discourse style analysis for psychological assessment and well-being measurement. I especially focus on discourse relation parsing to extract key information for targeted tasks, such as the opinions or reasons behind the sentiment of a review or a political stance, and on finding correlations between discourse styles and human variables such as personality.
Large-scale weakly supervised multitask learning for parsing discourse relation embeddings from a new domain (social media) without labels. Bidirectional hierarchical LSTMs with word-level attention.
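To illustrate what word-level attention does in a model like this, here is a minimal NumPy sketch, not the actual model. It assumes the per-word hidden states (for example, BiLSTM outputs) are already computed. The names `attention_pool`, `hidden_states`, and `context_vector` are hypothetical, and the context vector is random here rather than learned.

```python
import numpy as np

def attention_pool(hidden_states, context_vector):
    """Collapse per-word hidden states (n_words, dim) into one vector (dim,)."""
    scores = hidden_states @ context_vector            # one relevance score per word
    scores = scores - scores.max()                     # stabilise the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()    # attention weights, summing to 1
    pooled = weights @ hidden_states                   # weighted average of the word states
    return pooled, weights

# Stand-in data: 5 words, 8-dimensional hidden states (assumed, not real model output).
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 8))
context = rng.normal(size=8)
pooled, weights = attention_pool(states, context)
```

Words with larger weights contribute more to the pooled representation; in a hierarchical model, one such pooled vector per discourse argument would feed the next level.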
This study leveraged machine learning to evaluate the contribution of information from multiple developmental stages to prospective prediction of depression and anxiety in mid-adolescence. We used canonical correlation analysis (CCA). The feature set included several important risk factors spanning psychopathology, temperament/personality, family environment, life stress, interpersonal relationships, neurocognitive, hormonal, and neural functioning, and parental psychopathology and personality.
Ranked No. 1 for predicting Reddit users’ suicide risk level using their SuicideWatch and non-SuicideWatch posts (Task B). Developed user-factor-adapted RNN models with post-level attention using BERT and psychology language model representations of Reddit posts.
LDA topic modeling to capture momentary emotions from language (validated by a replication in the second year). Exploration of Linguistic Inquiry and Word Count (LIWC) categories and open-vocabulary models for correlation analysis between language and momentary emotion.
An NLP pipeline that joins a causality classifier (Linear SVM) with a causal explanation identifier (Bidirectional LSTM), and its application to downstream tasks (Facebook demographic analysis and Yelp review sentiment cause detection).
Feature adaptation of NLP models using human variables (age, gender, and personality) for downstream tasks (POS tagging, PP-attachment, sentiment, sarcasm, stance).
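One common way to realise this kind of adaptation is to append, for every user factor, a copy of the feature vector scaled by that factor, so a downstream linear model can learn factor-specific weights. The sketch below assumes that formulation; the function name `adapt_features` and the toy numbers are illustrative only and are not taken from the original work.

```python
import numpy as np

def adapt_features(X, user_factors):
    """Return [X, f_1 * X, ..., f_k * X] stacked column-wise.

    X:            (n_samples, n_features) text features
    user_factors: (n_samples, k) continuous user scores, e.g. age or a personality trait
    """
    X = np.asarray(X, dtype=float)
    F = np.asarray(user_factors, dtype=float)
    # F[:, [k]] keeps a column shape (n_samples, 1) so it broadcasts across features.
    adapted = [X] + [X * F[:, [k]] for k in range(F.shape[1])]
    return np.hstack(adapted)

# Toy example: 2 documents, 2 features, 2 user factors (all values made up).
X = np.array([[1.0, 2.0], [3.0, 4.0]])
factors = np.array([[0.5, -1.0], [2.0, 0.0]])
Z = adapt_features(X, factors)  # shape (2, 6): original block plus one block per factor
```

The adapted matrix `Z` can then be passed to any standard classifier in place of `X`.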
An NLP pipeline that joins a rule-based model (regular expressions capable of capturing social-media-specific variations of discourse connectives with Tweet Brown Clusters) with a statistical model (Linear SVM).
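As a rough sketch of the rule-based side, the snippet below matches informal spellings of discourse connectives with regular expressions. The variant lists are hand-written stand-ins chosen for illustration; in the pipeline described above the variations come from Tweet Brown Clusters, which are not reproduced here. The names `CONNECTIVE_VARIANTS`, `build_pattern`, and `find_connectives` are hypothetical.

```python
import re

# Assumed, hand-picked variants; a real system would derive these from word clusters.
CONNECTIVE_VARIANTS = {
    "because": ["because", "becuz", "bcuz", "cuz", "bc", "b/c"],
    "although": ["although", "altho", "tho"],
}

def build_pattern(variants):
    """Compile one case-insensitive pattern that matches any variant as a whole token."""
    alternatives = sorted(variants, key=len, reverse=True)  # try longer spellings first
    body = "|".join(re.escape(v) for v in alternatives)
    # Lookarounds instead of \b so variants containing "/" still match cleanly.
    return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)", re.IGNORECASE)

PATTERNS = {canon: build_pattern(v) for canon, v in CONNECTIVE_VARIANTS.items()}

def find_connectives(text):
    """Return (offset, canonical connective, surface form) tuples in text order."""
    hits = []
    for canonical, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), canonical, match.group(0)))
    return sorted(hits)

hits = find_connectives("Stayed home b/c it rained, tho I wanted to go out")
```

Candidates found this way could then be passed to the statistical model, which decides whether each match is actually used as a discourse connective.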