Keystroke Patterns as Prosody in Digital Writings
We explore the use of keystrokes as a means to access the real-time writing process of online authors. Our approach draws motivation from prosody-based speech analysis and offers insight into tangible connections between typing patterns and authors' intent. In particular, we study cues of deceptive intent. Typing pattern variations such as different pause durations and diverse revisional patterns (i.e., differences in how authors edit their writing) can help distinguish between truthful and deceptive writings. Empirical results show that incorporating features based on keystroke patterns leads to improved identification of deceptive writings across two different domains: business reviews and essays.

Analogous behaviors in typing and speaking

Unlike text, which is the finished product of language generation, typing (like handwriting) is itself a generation process. This invites comparison with the cognitive processes typically associated with speech.

The simple analogies we exploit are typing speed, pause durations and editing maneuvers, which correspond to speech rate, disfluency and discourse coherence, respectively.

Figure: analogies between prosody and keystroke patterns; disfluency and coherence correspond to pause durations and edit maneuvers, respectively.
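
To make the three keystroke-side quantities concrete, here is a minimal Python sketch that derives a typing rate, pause statistics and a deletion count from a list of key events. It is an illustration only, not the feature set used in the paper. The function name keystroke_features, the tuple layout (timestamp, event type, keycode) and the long_pause threshold of 2000 are assumptions made for this example, and timestamps are assumed to be in milliseconds. Keycodes 8 and 46 are the JavaScript keycodes for Backspace and Delete.

```python
from typing import Dict, List, Tuple

# JavaScript keycodes for the two deletion keys.
BACKSPACE, DELETE = 8, 46


def keystroke_features(
    events: List[Tuple[float, str, int]],
    long_pause: float = 2000.0,  # hypothetical threshold, same unit as timestamps
) -> Dict[str, float]:
    """Summarize typing rate, pauses and deletions for one writing session.

    `events` holds (timestamp, event_type, keycode) tuples in chronological
    order. Only KeyDown events are used, so each key press is counted once.
    """
    downs = [(t, code) for t, kind, code in events if kind == "KeyDown"]
    times = [t for t, _ in downs]
    # Gaps between consecutive key presses stand in for pause durations.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    duration = times[-1] - times[0] if len(times) > 1 else 0.0
    return {
        "n_keystrokes": len(downs),
        "keys_per_time_unit": len(downs) / duration if duration > 0 else 0.0,
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        "n_long_pauses": sum(1 for g in gaps if g >= long_pause),
        "n_deletions": sum(1 for _, code in downs if code in (BACKSPACE, DELETE)),
    }


if __name__ == "__main__":
    sample = [
        (0, "KeyDown", 72), (80, "KeyUp", 72),
        (500, "KeyDown", 73), (3500, "KeyDown", 8), (3600, "KeyUp", 8),
    ]
    print(keystroke_features(sample))
```

The typing rate plays the role of speech rate, the gap statistics approximate disfluency, and the deletion count is a crude proxy for editing maneuvers; richer revision patterns would require reconstructing the text being edited.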
Deception, cognitive load and keystrokes

Recent research has shown that lying generally imposes a cognitive burden, which only increases in real-time scenarios. Cognitive burden, in turn, has been known to produce differences in speech, and similarly, in keystroke features. Pauses in typing, especially, have been linked to increased cognitive burden.

Keystroke logs can be utilized as an extended linguistic signal capturing the real-time writing process. Motivated by analogies to prosody, we extract features from keystroke logs to

  1. understand the deceptive intent of authors as manifested in typing behavior, and
  2. empirically improve deception detection in digital writings.
Figure: deception increases cognitive load, which in turn has a tangible effect on keystroke patterns.

An analysis of how much time writers spend on different types of words reveals more about the relation between deception and cognitive processes. The top 20 words with the greatest time-span difference between truthful and deceptive reviews are shown here:

More time spent in truthful reviews: [word cloud of words taking more time in truthful reviews]
More time spent in deceptive reviews: [word cloud of words taking more time in deceptive reviews]
Classification Accuracy: adding keystroke features leads to improvement over a strong baseline.
[Figure: classification accuracy bar chart]

The dataset contains truthful and deceptive writings from two domains: business reviews, and essays on two topics of social interest, (a) gun control and (b) gay marriage. The data is available for download as compressed tar.bz2 files.

The uncompressed dataset consists of files with tab-separated values. The key log data is found in the last column, titled ReviewMeta. This field has a list of KeyUp, KeyDown and MouseUp event logs. Note that the first event timestamp is not always zero. The event logs have the following formats:
[timestamp] KeyUp/KeyDown [javascript keycode]
[timestamp] MouseUp [begin-index] [end-index]
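
The event formats above can be read with a short parser. The following Python sketch is one possible way to do it, under stated assumptions: the square brackets in the format lines are taken to be placeholders rather than literal characters, fields within an event are assumed to be whitespace-separated, and the separator between events is not specified here, so the regular expression simply scans for events wherever they occur. The names KeyEvent, parse_review_meta and rebase are hypothetical. Since the first timestamp is not always zero, rebase shifts timestamps so that the first event starts at 0.

```python
import re
from typing import List, NamedTuple, Optional


class KeyEvent(NamedTuple):
    timestamp: float
    kind: str                      # "KeyUp", "KeyDown" or "MouseUp"
    keycode: Optional[int] = None  # set for KeyUp/KeyDown events
    begin: Optional[int] = None    # set for MouseUp events
    end: Optional[int] = None      # set for MouseUp events


# Assumed layout: "<timestamp> KeyUp|KeyDown <keycode>" or
# "<timestamp> MouseUp <begin-index> <end-index>".
_EVENT_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s+(?:(KeyUp|KeyDown)\s+(\d+)|MouseUp\s+(\d+)\s+(\d+))"
)


def parse_review_meta(field: str) -> List[KeyEvent]:
    """Extract all events found in one ReviewMeta field, in order."""
    events = []
    for m in _EVENT_RE.finditer(field):
        ts = float(m.group(1))
        if m.group(2):  # KeyUp or KeyDown
            events.append(KeyEvent(ts, m.group(2), keycode=int(m.group(3))))
        else:           # MouseUp
            events.append(
                KeyEvent(ts, "MouseUp", begin=int(m.group(4)), end=int(m.group(5)))
            )
    return events


def rebase(events: List[KeyEvent]) -> List[KeyEvent]:
    """Shift timestamps so the first event is at time zero."""
    if not events:
        return []
    t0 = events[0].timestamp
    return [e._replace(timestamp=e.timestamp - t0) for e in events]


if __name__ == "__main__":
    # Made-up example string; the ";" separator is an assumption.
    raw = "1500 KeyDown 72;1580 KeyUp 72;2100 MouseUp 3 7"
    for event in rebase(parse_review_meta(raw)):
        print(event)
```

To read the field from the files, the last tab-separated column of each row can be passed to parse_review_meta, for example via Python's csv module with delimiter="\t".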


Keystroke Patterns as Prosody in Digital Writings: A Case Study with Deceptive Reviews and Essays
Ritwik Banerjee, Song Feng, Jun S. Kang, Yejin Choi
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.

If you use the provided data, please cite the paper above.

