Learning discriminative spatio-temporal patterns

Spot the differences between the left and right classes 


Visual categorization problems, such as object classification or action recognition, are increasingly approached using a detection strategy: a classifier function is first applied to candidate subwindows of the image or the video, and the maximum classifier score is then used for the class decision. Traditionally, the subwindow classifiers are trained on a large collection of examples manually annotated with masks or bounding boxes. The reliance on time-consuming human labeling effectively limits the application of these methods to problems involving very few categories. Furthermore, the human selection of the masks introduces arbitrary biases (e.g., in window size and location) that may be suboptimal for classification.
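The detection strategy described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the classifier score of a subwindow is the sum of precomputed per-pixel scores (as happens with a linear classifier over additive features). The names `best_subwindow`, `classify`, `score_map`, and `bias` are hypothetical, and the exhaustive search stands in for whatever search procedure a real system would use.

```python
import numpy as np

def best_subwindow(score_map, min_size=1):
    """Score every axis-aligned subwindow of a per-pixel score map.

    Returns (best_score, (top, left, bottom, right)), where bottom and
    right are exclusive bounds. Assumes the window score is additive.
    """
    h, w = score_map.shape
    # Integral image with a zero border: any window sum is four lookups.
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = score_map.cumsum(axis=0).cumsum(axis=1)

    best_score, best_box = -np.inf, None
    for top in range(h):
        for bottom in range(top + min_size, h + 1):
            for left in range(w):
                for right in range(left + min_size, w + 1):
                    s = (integral[bottom, right] - integral[top, right]
                         - integral[bottom, left] + integral[top, left])
                    if s > best_score:
                        best_score, best_box = s, (top, left, bottom, right)
    return best_score, best_box

def classify(score_map, bias=0.0):
    """Class decision from the maximum subwindow score (hypothetical bias term)."""
    score, box = best_subwindow(score_map)
    return bool(score + bias > 0), box
```

Calling `classify` on a score map returns both the class decision and the subwindow that produced the maximum score. The brute-force loop is quartic in the image side length, so it is only practical for small maps.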

In this paper we propose a novel method for learning a discriminative subwindow classifier from examples annotated only with binary labels that indicate the presence of an object or action of interest, but not its location. During training, our approach simultaneously localizes the instances of the positive class and learns a subwindow SVM to recognize them. We extend our method to the classification of time series by presenting an algorithm that localizes the most discriminative set of temporal segments in the signal. We evaluate our approach on several datasets for object and action recognition and show that it achieves results similar to, and in many cases superior to, those obtained with full supervision.
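For intuition about temporal localization, the following sketch handles a simplified case: finding one contiguous segment whose summed per-frame scores are maximal. This is the classic maximum-sum subarray scan, not necessarily the algorithm used in the paper, which localizes a set of segments. The function name `best_segment` and the per-frame scores are hypothetical.

```python
def best_segment(frame_scores):
    """Return (best_sum, start, end) for the non-empty contiguous segment
    with the largest total score; end is exclusive.

    Simplifying assumption: a single segment with additive per-frame scores.
    """
    best_sum = float("-inf")
    best_start = best_end = 0
    run_sum, run_start = 0.0, 0
    for i, s in enumerate(frame_scores):
        if run_sum <= 0:
            # A non-positive running total cannot help; restart at frame i.
            run_sum, run_start = s, i
        else:
            run_sum += s
        if run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_sum, best_start, best_end
```

The scan is linear in the length of the signal. Extending it to several disjoint segments requires a different formulation, such as dynamic programming over the number of segments.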


A unified framework for image categorization and time series classification from weakly labeled data. Our method simultaneously localizes the regions of interest in the examples and learns a region-based classifier, thus building robustness to background and uninformative signal.


Minh Hoai Nguyen, Lorenzo Torresani, Fernando de la Torre, Carsten Rother


  • Learning Discriminative Localization from Weakly Labeled Data.
    Hoai, M., Torresani, L., De la Torre, F., & Rother, C. (2014)
    Pattern Recognition, 47(3), 1523–1534. Paper BibTex.

  • Weakly supervised discriminative localization and classification: a joint learning process. Nguyen, M.H., Torresani, L., De la Torre, F., & Rother, C. (2009) Proceedings of International Conference on Computer Vision. Paper Poster BibTex.

  • Weakly supervised discriminative localization and classification: a joint learning process. Nguyen, M.H., Torresani, L., De la Torre, F., & Rother, C. (2009) Tech. report CMU-RI-TR-09-29, Robotics Institute, Carnegie Mellon University. Paper.

Acknowledgments and funding

Portions of this work were performed while Minh Hoai Nguyen and Lorenzo Torresani were at Microsoft Research Cambridge.

Copyright notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.