Pulling Actions Out of Context


This project investigates the problem of human action recognition in video. A human action does not occur in isolation, and it is not the only thing recorded in a video sequence. A video clip of a human action also contains many other components, including the background scene, interacting objects, camera motion, and the activity of other people. Some of these components are contextual elements that frequently co-occur with the action category in question. The project develops technologies that separate human actions from co-occurring factors for large-scale recognition and fine-grained visual interpretation of human actions. These technologies have many practical applications in a wide range of fields, from human-computer interaction and robotics to security and healthcare.

This research develops an approach to human action recognition that explicitly factorizes human actions from context. The key idea is to exploit the information provided by conjugate samples of human actions. A conjugate sample is defined as a video clip that is contextually similar to an action sample but does not contain the action. For instance, a conjugate sample of a handshake clip can be the video sequence showing two people approaching each other just before the handshake. The handshake clip and the sequence preceding it share many similar or even identical contextual elements, including the people, the background scene, the camera angle, and the lighting conditions. The only thing that sets the two clips apart is the human action itself. A conjugate sample therefore provides information complementary to the action sample; it can be used to suppress contextual irrelevance and amplify the action signal. The specific research objectives of this project are: (1) collecting human action samples for many action classes; (2) developing algorithms to mine and extract conjugate human action samples; and (3) developing a framework that exploits conjugate samples to separate actions from context, learning classifiers for large-scale recognition and fine-grained understanding of human actions.
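The intuition behind conjugate samples can be illustrated with a toy NumPy sketch. This is purely illustrative and is not the model from the papers: it assumes a clip's feature vector decomposes additively into a context component and an action component, so that subtracting a conjugate sample's features cancels the shared context and leaves the action signal.

```python
import numpy as np

# Hypothetical additive feature model (an assumption for illustration only):
# an action clip's features = shared context + action-specific signal,
# while its conjugate sample carries the same context but no action.
rng = np.random.default_rng(0)

context = rng.normal(size=8)  # shared elements: people, scene, camera, lighting
action = rng.normal(size=8)   # action-specific signal (e.g., the handshake)

action_clip = context + action  # clip that contains the action
conjugate_clip = context        # preceding clip: same context, no action

# Subtracting the conjugate sample suppresses the context and
# amplifies the relative contribution of the action signal.
residual = action_clip - conjugate_clip
assert np.allclose(residual, action)  # context cancels; only the action remains
```

In practice the separation is learned rather than computed by literal subtraction, but the sketch captures why a contextually matched clip without the action is such a useful complement to the action sample.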



Publications

  • Pulling Actions out of Context: Explicit Separation for Effective Combination. Wang, Y., & Hoai, M. (2018) Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). BibTeX.

  • Eigen-Evolution Dense Trajectory Descriptors. Wang, Y., Tran, V., & Hoai, M. (2018)
    Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition. Paper BibTeX.

  • Improving Human Action Recognition by Non-action Classification. Wang, Y., & Hoai, M. (2016)
    Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Paper BibTeX.

Funding Sources

  • National Science Foundation Award No. 1566248. CRII: RI: Towards Large-Scale Recognition and Fine-Grain Analysis of Human Actions: Pulling Actions Out of Context

  • Google Research Awards

Copyright notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.