Studying Relationships Between
Human Gaze, Description, and Computer Vision

Presented at Computer Vision and Pattern Recognition (CVPR) 2013
Kiwon Yun, Yifan Peng, Dimitris Samaras, Gregory J. Zelinsky, Tamara L. Berg
Stony Brook University

Abstract

We posit that user behavior during natural viewing of images contains an abundance of information about the content of images as well as information related to user intent and user defined content importance. In this paper, we conduct experiments to better understand the relationship between images, the eye movements people make while viewing images, and how people construct natural language to describe images. We explore these relationships in the context of two commonly used computer vision datasets. We then further relate human cues with outputs of current visual recognition systems and demonstrate prototype applications for gaze-enabled detection and annotation.

Introduction

User behavior while freely viewing images contains an abundance of information about user intent and depicted scene content.

Humans can provide:

  • Passive indications of content through gaze patterns. These cues provide estimates about “where” important things are, but not “what” they are.
  • Active indications of content through descriptions. These cues can directly inform questions of “what” is in an image as well as indicating which parts of the content are important to the viewer.

Computer vision recognition algorithms can provide:

  • Automatic indications of content from recognition algorithms. These algorithms can inform estimates of “what” might be “where” in visual imagery, but will always be noisy predictions and have no knowledge of relative content importance.

We conduct several experiments to better understand the relationship between gaze, description, and image content. From these exploratory analyses, we build prototype applications for gaze-enabled object detection and annotation.
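To make the idea of gaze-enabled detection concrete, the sketch below re-scores candidate detections by how much fixation support falls inside each bounding box. This is purely illustrative: the function names, the linear combination rule, and the weight are assumptions for exposition, not the method used in the paper.

```python
# Illustrative sketch: combining detector confidence with gaze fixations.
# The combination rule and all names here are assumptions, not the
# paper's actual algorithm.

def fixations_in_box(fixations, box):
    """Count fixations (x, y) that fall inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return sum(1 for (x, y) in fixations if x1 <= x <= x2 and y1 <= y <= y2)

def rescore_detections(detections, fixations, weight=0.5):
    """Blend detector scores with normalized fixation counts.

    detections: list of (box, score) pairs from any object detector.
    fixations: list of (x, y) gaze points recorded during free viewing.
    Returns detections sorted by the combined score, highest first.
    """
    total = max(len(fixations), 1)  # avoid division by zero
    rescored = []
    for box, score in detections:
        gaze_support = fixations_in_box(fixations, box) / total
        combined = (1 - weight) * score + weight * gaze_support
        rescored.append((box, combined))
    return sorted(rescored, key=lambda d: d[1], reverse=True)

# Example: a weak detection that attracts most fixations can outrank
# a stronger detection that viewers ignored.
detections = [((0, 0, 50, 50), 0.9), ((60, 60, 100, 100), 0.4)]
fixations = [(70, 70), (80, 80), (90, 90), (65, 95)]  # all in the second box
print(rescore_detections(detections, fixations)[0][0])  # (60, 60, 100, 100)
```

The intuition matches the bullets above: the detector supplies "what" and "where" candidates, while gaze supplies a passive importance signal that reweights them.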

Gaze-Enabled Computer Vision

Publications

  • Kiwon Yun, Yifan Peng, Dimitris Samaras, Gregory J. Zelinsky, and Tamara L. Berg, Studying Relationships Between Human Gaze, Description, and Computer Vision, Computer Vision and Pattern Recognition (CVPR) 2013 (Oregon/USA)
  • Kiwon Yun, Yifan Peng, Dimitris Samaras, Gregory J. Zelinsky, and Tamara L. Berg, Exploring the Role of Gaze Behavior and Object Detection in Scene Understanding, Frontiers in Psychology, December 2013, 4(917): 1-14
  • Kiwon Yun, Yifan Peng, Hossein Adeli, Tamara L. Berg, Dimitris Samaras, and Gregory J. Zelinsky, Specifying the Relationships Between Objects, Gaze, and Descriptions for Scene Understanding, Vision Sciences Society (VSS) 2013 (Florida/USA)

Download

SBU Gaze-Detection-Description Dataset

Acknowledgements

This work was supported in part by NSF Awards IIS-1161876, IIS-1054133, IIS-1111047, IIS-0959979 and the SUBSAMPLE Project of the DIGITEO Institute, France. We thank J. Maxfield, Hossein Adeli and J. Weiss for data pre-processing and useful discussions.