Kumara Kahatapitiya

I am a PhD candidate at Stony Brook University, working with Prof. Michael S. Ryoo. My primary research focus is efficient video representations for both video understanding and generation. I have worked on fine-grained activity localization, long-video reasoning with large multimodal models, and diffusion-based video editing and generation.

During my PhD, I was an intern at Meta GenAI, Qualcomm AI Research and Google DeepMind. Prior to this, I was a Research Assistant at the University of Moratuwa, Sri Lanka, advised by Dr. Ranga Rodrigo, where I also received my Bachelor's degree in Electronic & Telecommunication Engineering.

[Google Scholar] [CV] [GitHub] [Twitter]
kkahatapitiy [at] cs.stonybrook.edu

I am currently on the job market for Research Scientist roles.

Recent News

[Jul 2024]	Object-Centric Diffusion for Efficient Video Editing was accepted at ECCV 2024.
[Jun 2024]	LLaRA for VLM-based policy and LVNet for Long Video Understanding are now on arXiv.
[Jun 2024]	I joined Meta GenAI as a research scientist intern.
[Mar 2024]	Language Repository and MVU for Long Video Understanding are now on arXiv.
[Feb 2024]	Video-conditioned Text Representations for activity recognition was accepted at CVPR 2024.
[Oct 2023]	Grafting Vision Transformers for multi-scale and global information sharing was accepted at WACV 2024.
[Jul 2023]	I joined Qualcomm AI Research, Amsterdam as a research intern.
[Apr 2023]	SWAT, a structure-aware family of token-based models, was accepted at IJCAI 2023.
[Feb 2023]	Token Turing Machines for long-term memory in Transformers was accepted at CVPR 2023.
[Dec 2022]	SSDet for weakly-guided self-supervised detection pretraining was accepted at AAAI 2023.
[Jul 2022]	StARformer with an MDP-like inductive bias for RL was accepted at ECCV 2022 and TPAMI.
[Mar 2022]	MS-TCT for temporal action detection with CNN+Transformer embeddings was accepted at CVPR 2022.
[Feb 2022]	I joined Google DeepMind as a student researcher.
[Dec 2021]	I was a finalist (1/30) for the Adobe Research Fellowship 2022. Congratulations to all the winners!
[Dec 2021]	Swift for real-time neural video decoding was accepted at NSDI 2022.
[Sep 2021]	I am officially a PhD candidate now!
[Mar 2021]	Coarse-Fine Networks for efficient temporal activity detection was accepted at CVPR 2021.
[Jan 2021]	Exploiting Redundancy in CNNs for parameter reduction was accepted at WACV 2021.

Pre-prints

	LLaRA: Supercharging Robot Learning Data for Vision-Language Policy Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo arXiv 2024 [arxiv] [code]
	Too many frames, not all useful: Efficient Strategies for Long-form Video QA Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim, Michael S. Ryoo arXiv 2024 [arxiv]
	Language Repository for Long Video Understanding Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo arXiv 2024 [arxiv] [code]
	Understanding Long Videos in One Multimodal Language Model Pass Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo arXiv 2024 [project page] [arxiv] [code]

Publications

	Object-Centric Diffusion for Efficient Video Editing Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian ECCV 2024 [project page] [paper]
	VicTR: Video-conditioned Text Representations for Activity Recognition Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo CVPR 2024 [paper] [poster] [talk]
	Grafting Vision Transformers Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo WACV 2024 [paper] [poster]
	SWAT: Spatial Structure Within and Among Tokens Kumara Kahatapitiya, Michael S. Ryoo IJCAI 2023 [paper] [code] [slides]
	Token Turing Machines Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab CVPR 2023 [paper] [code] [teaser]
	Weakly-guided Self-supervised Pretraining for Temporal Activity Detection Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua AAAI 2023 [paper] [code] [talk] [poster]
	StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo ECCV 2022, TPAMI [paper] [journal] [code] [talk] [poster]
	MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond CVPR 2022 [paper] [code] [poster]
	Swift: Adaptive Video Streaming with Layered Neural Codecs Mallesham Dasari, Kumara Kahatapitiya, Samir Das, Aruna Balasubramanian, Dimitris Samaras NSDI 2022 [paper] [code] [slides]
	Coarse-Fine Networks for Temporal Activity Detection in Videos Kumara Kahatapitiya, Michael S. Ryoo CVPR 2021 [paper] [code] [talk] [poster]
	Exploiting the Redundancy in Convolutional Filters for Parameter Reduction Kumara Kahatapitiya, Ranga Rodrigo WACV 2021 [paper] [code] [talk]
	Feature-dependent Cross-Connections in Multi-Path Neural Networks Dumindu Tissera, Kasun Vithanage, Rukshan Wijesinghe, Kumara Kahatapitiya, Subha Fernando, Ranga Rodrigo ICPR 2020 [paper]
	Context-Aware Automatic Occlusion Removal Kumara Kahatapitiya, Dumindu Tissera, Ranga Rodrigo ICIP 2019 [paper] [code]

Other Projects

X3D-Multigrid [code]
A PyTorch implementation of "X3D: Expanding Architectures for Efficient Video Recognition models" [CVPR2020] with "A Multigrid Method for Efficiently Training Video Models" [CVPR2020]. In contrast to the original repository by FAIR, this repository provides a simpler, less modular and more familiar implementation structure for faster and easier adoption.
Optimal Transport in NumPy [code]
This repository contains a few Optimal Transport algorithms implemented using NumPy, including "A Direct O(1/epsilon) Iteration Parallel Algorithm for Optimal Transport" [NeurIPS2019], "Computational Optimal Transport: Complexity by Accelerated Gradient Descent is better than by Sinkhorn's Algorithm" [PMLR2018] and "Lightspeed Computation of Optimal Transport" [NeurIPS2013].

Teaching

CSE327: Computer Vision - TA (Spring 2020)
CSE215: Foundations of Computer Science - TA (Fall 2019)

Thanks to Jon Barron for the template.