We explore the use of keyboard strokes as a means to access the real-time writing process of online authors. Our approach draws motivation from prosody-based speech analysis and offers insight into tangible connections between typing patterns and authors' intent. In particular, we study cues of deceptive intent. Typing pattern variations such as different pause durations and diverse revisional patterns (i.e., differences in how authors edit their writing) can help distinguish between truthful and deceptive writings. Empirical results show that incorporating features based on keystroke patterns leads to improved identification of deceptive writings across two different domains: business reviews and essays.
Unlike text, which is generated language, typing (or, for that matter, writing) is itself a generation process, which facilitates comparisons to cognitive processes typically associated with speech.
Simple analogies that we have exploited are typing speed, pause durations and editing maneuvers, which are analogous to speech rate, disfluency, and discourse coherence.
Recent research has shown that lying generally imposes a cognitive burden, which only increases in real-time scenarios. Cognitive burden, in turn, has been known to produce differences in speech, and similarly, in keystroke features. Pauses in typing, especially, have been linked to increased cognitive burden.
Keystroke logs can be utilized as an extended linguistic signal capturing the real-time writing process. Motivated by analogies to prosody, we extract features from keystroke logs to
An analysis of how much time writers spend on different types of words reveals more about the relation between deception and cognitive processes. The top 20 words with the greatest time-span difference between truthful and deceptive reviews are shown here:
[Figure, two panels: words on which more time is spent in truthful reviews, and words on which more time is spent in deceptive reviews.]
[Figure: Classification accuracy. Adding keystroke features leads to improvement over a strong baseline.]
The dataset contains truthful and deceptive writings from two domains: business reviews and essays on two topics of social interest: (a) gun control and (b) gay marriage. The data is available for download as compressed tar.bz2 files:
The uncompressed dataset consists of files with tab-separated values. The key log data is found in the last column, titled ReviewMeta. This field has a list of KeyUp, KeyDown and MouseUp event logs. Note that the first event timestamp is not always zero. The event logs have the following formats:
[timestamp] KeyUp/KeyDown [javascript keycode]
[timestamp] MouseUp [begin-index] [end-index]
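
As a rough illustration of how these event logs might be read, the Python sketch below parses individual events and computes the gaps between consecutive KeyDown events, a simple stand-in for pause durations. It rests on several assumptions that are not confirmed by the dataset description: that the fields within an event are separated by whitespace, that the square brackets in the formats above are placeholders rather than literal characters, that timestamps are numeric, and that the ReviewMeta field has already been split into one string per event. The names `Event`, `parse_event`, and `keydown_pauses` are hypothetical, and the sample events are made up.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    timestamp: float
    kind: str                 # "KeyUp", "KeyDown", or "MouseUp"
    keycode: Optional[int]    # set for key events only
    begin: Optional[int]      # set for MouseUp events only
    end: Optional[int]        # set for MouseUp events only


def parse_event(line: str) -> Event:
    """Parse one event string, assuming whitespace-separated fields."""
    parts = line.split()
    timestamp, kind = float(parts[0]), parts[1]
    if kind in ("KeyUp", "KeyDown"):
        return Event(timestamp, kind, int(parts[2]), None, None)
    if kind == "MouseUp":
        return Event(timestamp, kind, None, int(parts[2]), int(parts[3]))
    raise ValueError(f"unknown event type: {kind!r}")


def keydown_pauses(events: List[Event]) -> List[float]:
    """Gaps between consecutive KeyDown timestamps, in the log's own time unit."""
    downs = [e.timestamp for e in events if e.kind == "KeyDown"]
    return [later - earlier for earlier, later in zip(downs, downs[1:])]


# Made-up sample events (not from the dataset); the first timestamp is
# deliberately non-zero, as the note above warns can happen.
sample = [
    "120 KeyDown 72",
    "200 KeyUp 72",
    "450 KeyDown 73",
    "520 KeyUp 73",
    "900 MouseUp 0 2",
]
events = [parse_event(s) for s in sample]
pauses = keydown_pauses(events)
```

Because the pauses are differences between timestamps, a non-zero first timestamp does not affect them. If the actual files delimit events or fields differently, only `parse_event` and the splitting step before it would need to change.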
Keystroke Patterns as Prosody in Digital Writings: A Case Study with Deceptive Reviews and Essays
Ritwik Banerjee, Song Feng, Jun S. Kang, Yejin Choi
Empirical Methods in Natural Language Processing (EMNLP). 2014.
If you use the provided data, please cite:
@InProceedings{banerjee2014_emnlp,
author = {Banerjee, Ritwik and Feng, Song and Kang, Jun Seok and Choi, Yejin},
title = {Keystroke Patterns as Prosody in Digital Writings: A Case Study with Deceptive Reviews and Essays},
booktitle = {Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {October},
year = {2014},
address = {Doha, Qatar},
publisher = {Association for Computational Linguistics},
pages = {1469--1473},
url = {http://www.aclweb.org/anthology/D14-1155}
}