Kumara Kahatapitiya

I am a PhD candidate in Computer Science at Stony Brook University, working with Prof. Michael S. Ryoo. My primary research focus is video understanding. More recently, I have also started working on video-language embeddings and token-based models in vision.

During my PhD, I was a student researcher at Google Brain and an intern at Wormpex AI Research. Prior to this, I was a Research Assistant at the University of Moratuwa, Sri Lanka, advised by Dr. Ranga Rodrigo, where I also received my Bachelor's degree in Electronic & Telecommunication Engineering.

[Google Scholar]    [GitHub]    [Twitter]

Recent News
[Feb 2023] Token Turing Machines for long-term memory in Transformers was accepted at CVPR 2023.
[Dec 2022] Weakly-guided Self-supervised detection pretraining was accepted at AAAI 2023.
[Nov 2022] Grafting Vision Transformers, an efficient add-on for multi-scale and global information sharing, is now on arXiv.
[Sep 2022] StARformer extended to real-world robot environments was accepted at T-PAMI.
[Jul 2022] StARformer with an MDP-like inductive bias for RL was accepted at ECCV 2022.
[Mar 2022] MS-TCT for temporal action detection with CNN+Transformer embeddings was accepted at CVPR 2022.
[Feb 2022] I joined Robotics at Google as a Student Researcher.
[Dec 2021] I was a finalist (1/30) for the Adobe Research Fellowship 2022. Congratulations to all the winners!
[Dec 2021] Swift for real-time neural video decoding was accepted at NSDI 2022.
[Nov 2021] SWAT as a structure-aware, token-based family of models is now on arXiv.
[Sep 2021] I am officially a PhD candidate now!
[Mar 2021] Coarse-Fine Networks for efficient temporal activity detection was accepted at CVPR 2021.
[Jan 2021] Exploiting Redundancy in CNNs for parameter reduction was accepted at WACV 2021.
Selected Publications
Token Turing Machines
Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
CVPR 2023
[paper] [code]

Weakly-guided Self-supervised Pretraining for Temporal Activity Detection
Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua
AAAI 2023
[paper] [code] [talk] [poster]

Grafting Vision Transformers
Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo
arXiv 2022

StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo
ECCV 2022, T-PAMI
[paper] [journal] [code] [talk] [poster]

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond
CVPR 2022
[paper] [code] [poster]

Swift: Adaptive Video Streaming with Layered Neural Codecs
Mallesham Dasari, Kumara Kahatapitiya, Samir Das, Aruna Balasubramanian, Dimitris Samaras
NSDI 2022
[paper] [code] [slides]

SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya, Michael S. Ryoo
arXiv 2021

Coarse-Fine Networks for Temporal Activity Detection in Videos
Kumara Kahatapitiya, Michael S. Ryoo
CVPR 2021
[paper] [code] [talk] [poster]

Exploiting the Redundancy in Convolutional Filters for Parameter Reduction
Kumara Kahatapitiya, Ranga Rodrigo
WACV 2021
[paper] [code] [talk]

Feature-dependent Cross-Connections in Multi-Path Neural Networks
Dumindu Tissera, Kasun Vithanage, Rukshan Wijesinghe, Kumara Kahatapitiya, Subha Fernando, Ranga Rodrigo
ICPR 2020

Context-Aware Automatic Occlusion Removal
Kumara Kahatapitiya, Dumindu Tissera, Ranga Rodrigo
ICIP 2019
[paper] [code]

Open-source Projects

  • X3D-Multigrid [code]
    A PyTorch implementation of "X3D: Expanding Architectures for Efficient Video Recognition" [CVPR2020] with "A Multigrid Method for Efficiently Training Video Models" [CVPR2020]. In contrast to the original repository by FAIR, this repository provides a simpler, less modular, and more familiar implementation structure for faster and easier adoption.
  • Optimal Transport in NumPy [code]
    This repository contains a few Optimal Transport algorithms implemented using NumPy, including "A Direct O(1/epsilon) Iteration Parallel Algorithm for Optimal Transport" [NeurIPS2019], "Computational Optimal Transport: Complexity by Accelerated Gradient Descent is better than by Sinkhorn's Algorithm" [PMLR2018], and "Lightspeed Computation of Optimal Transport" [NeurIPS2013].

Website template is here. Thanks!