Kumara Kahatapitiya

I am a Research Scientist at Meta GenAI. I completed my PhD at Stony Brook University, working on efficient representations for video understanding and generation, advised by Prof. Michael S. Ryoo.

During my PhD, I interned at Google DeepMind, Qualcomm AI Research, and Meta GenAI. Prior to this, I was a Research Assistant at the University of Moratuwa, Sri Lanka, advised by Dr. Ranga Rodrigo, where I also received my Bachelor's in Electronic & Telecommunication Engineering.

[Google Scholar] [GitHub] [Twitter]
kkahatapitiy [at] cs.stonybrook.edu

Recent News

[May 2025]	MarDini was accepted at TMLR and LangRepo was accepted at ACL 2025.
[Jan 2025]	LLaRA and MVU were accepted at ICLR 2025.
[Nov 2024]	AdaCache for accelerating Video Generation was released on arXiv.
[Oct 2024]	Early versions of LangRepo and LVNet were accepted at the NeurIPS 2024 Workshop on Video-Language Models.
[Jul 2024]	Object-Centric Diffusion for Efficient Video Editing was accepted at ECCV 2024.
[Feb 2024]	Video-conditioned Text Representations for activity recognition was accepted at CVPR 2024.
[Oct 2023]	Grafting Vision Transformers for multi-scale and global information sharing was accepted at WACV 2024.
[Apr 2023]	SWAT, a structure-aware family of token-based models, was accepted at IJCAI 2023.
[Feb 2023]	Token Turing Machines for long-term memory in Transformers was accepted at CVPR 2023.
[Dec 2022]	SSDet for weakly-guided self-supervised detection pretraining was accepted at AAAI 2023.
[Jul 2022]	StARformer with an MDP-like inductive bias for RL was accepted at ECCV 2022 and T-PAMI.
[Mar 2022]	MS-TCT for temporal action detection with CNN+Transformer embeddings was accepted at CVPR 2022.

Preprints

Adaptive Caching for Faster Video Generation with Diffusion Transformers
Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie
arXiv 2024
[project page] [preprint] [code]

Selected Publications

MarDini: Masked Auto-Regressive Diffusion for Video Generation at Scale
Haozhe Liu, Shikun Liu, Zijian Zhou, Mengmeng Xu, Yanping Xie, Xiao Han, Juan C. Pérez, Ding Liu, Kumara Kahatapitiya, Menglin Jia, Jui-Chieh Wu, Sen He, Tao Xiang, Jürgen Schmidhuber, Juan-Manuel Pérez-Rúa
TMLR
[project page] [paper]

Language Repository for Long Video Understanding
Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo
ACL 2025 Findings
[paper] [code] [webinar]

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo
ICLR 2025
[paper] [code]

Understanding Long Videos with Multimodal Language Models
Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo
ICLR 2025
[project page] [paper] [code] [webinar]

Too many frames, not all useful: Efficient Strategies for Long-form Video QA
Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim, Michael S. Ryoo
NeurIPS 2024 Workshops
[paper] [code] [webinar]

Object-Centric Diffusion for Efficient Video Editing
Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian
ECCV 2024
[project page] [paper] [poster] [talk]

VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo
CVPR 2024
[paper] [poster] [talk]

Grafting Vision Transformers
Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo
WACV 2024
[paper] [poster]

SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya, Michael S. Ryoo
IJCAI 2023
[paper] [code] [slides]

Token Turing Machines
Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
CVPR 2023
[paper] [code] [teaser]

Weakly-guided Self-supervised Pretraining for Temporal Activity Detection
Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua
AAAI 2023
[paper] [code] [talk] [poster]

StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo
ECCV 2022, TPAMI
[paper] [journal] [code] [talk] [poster]

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond
CVPR 2022
[paper] [code] [poster]

Swift: Adaptive Video Streaming with Layered Neural Codecs
Mallesham Dasari, Kumara Kahatapitiya, Samir Das, Aruna Balasubramanian, Dimitris Samaras
NSDI 2022
[paper] [code] [slides]

Coarse-Fine Networks for Temporal Activity Detection in Videos
Kumara Kahatapitiya, Michael S. Ryoo
CVPR 2021
[paper] [code] [talk] [poster]

Exploiting the Redundancy in Convolutional Filters for Parameter Reduction
Kumara Kahatapitiya, Ranga Rodrigo
WACV 2021
[paper] [code] [talk]

Other Projects

X3D-Multigrid [code]
A PyTorch implementation of "X3D: Expanding Architectures for Efficient Video Recognition" [CVPR2020] with "A Multigrid Method for Efficiently Training Video Models" [CVPR2020]. In contrast to the original repository by FAIR, this repository provides a simpler, less modular, and more familiar code structure for faster and easier adoption.
Optimal Transport in NumPy [code]
This repository contains a few Optimal Transport algorithms implemented using NumPy, including "A Direct O(1/epsilon) Iteration Parallel Algorithm for Optimal Transport" [NeurIPS2019], "Computational Optimal Transport: Complexity by Accelerated Gradient Descent is better than by Sinkhorn's Algorithm" [PMLR2018], and "Lightspeed Computation of Optimal Transport" [NeurIPS2013].

Teaching

CSE327: Computer Vision - TA (Spring 2020)
CSE215: Foundations of Computer Science - TA (Fall 2019)

Thanks to Jon Barron for the template.