Kumara Kahatapitiya
I am a PhD candidate at Stony Brook University, working with
Prof. Michael S. Ryoo.
My primary research focus is on efficient video representations, for both video understanding and generation. I have worked on
fine-grained activity localization, long-video reasoning with large multimodal models, and diffusion-based video editing/generation.
During my PhD, I was an intern at Meta GenAI, Qualcomm AI Research and
Google DeepMind.
Prior to this, I was a Research Assistant at the University of Moratuwa, Sri Lanka, advised
by Dr. Ranga Rodrigo, where I also received my Bachelor's degree in Electronic & Telecommunication Engineering.
[Google Scholar]   
[CV]   
[GitHub]   
[Twitter]
kkahatapitiy [at] cs.stonybrook.edu
I am currently on the job market for Research Scientist roles.
[Nov 2024] AdaCache for speeding up video DiTs and MarDini for video generation with AR-Diffusion are now on arXiv.
[Oct 2024] Early versions of LangRepo and LVNet will appear at the NeurIPS 2024 workshop on Video-Language Models.
[Oct 2024] An early version of LLaRA will appear at the CoRL 2024 workshop on Language and Robot Learning.
[Jul 2024] Object-Centric Diffusion for Efficient Video Editing was accepted at ECCV 2024.
[Jun 2024] I joined Meta GenAI as a research scientist intern.
[Mar 2024] MVU for Long Video Understanding is now on arXiv.
[Feb 2024] Video-conditioned Text Representations for activity recognition was accepted at CVPR 2024.
[Oct 2023] Grafting Vision Transformers for multi-scale and global information sharing was accepted at WACV 2024.
[Jul 2023] I joined Qualcomm AI Research, Amsterdam as a research intern.
[Apr 2023] SWAT, a structure-aware family of token-based models, was accepted at IJCAI 2023.
[Feb 2023] Token Turing Machines for long-term memory in Transformers was accepted at CVPR 2023.
[Dec 2022] SSDet for weakly-guided self-supervised detection pretraining was accepted at AAAI 2023.
[Jul 2022] StARformer with an MDP-like inductive bias for RL was accepted at ECCV 2022 and TPAMI.
[Mar 2022] MS-TCT for temporal action detection with CNN+Transformer embeddings was accepted at CVPR 2022.
[Feb 2022] I joined Google DeepMind as a student researcher.
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Kumara Kahatapitiya, Haozhe Liu, Sen He, Ding Liu, Menglin Jia, Chenyang Zhang, Michael S. Ryoo, Tian Xie
arXiv 2024
[project page]
[preprint]
[code]
MarDini: Masked Auto-Regressive Diffusion for Video Generation at Scale
Haozhe Liu, Shikun Liu, Zijian Zhou, Mengmeng Xu, Yanping Xie, Xiao Han, Juan C. Pérez, Ding Liu, Kumara Kahatapitiya, Menglin Jia, Jui-Chieh Wu, Sen He, Tao Xiang, Jürgen Schmidhuber, Juan-Manuel Pérez-Rúa
arXiv 2024
[project page]
[preprint]
Understanding Long Videos in One Multimodal Language Model Pass
Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya, Michael S. Ryoo
arXiv 2024
[project page]
[preprint]
[code]
[webinar]
Language Repository for Long Video Understanding
Kumara Kahatapitiya, Kanchana Ranasinghe, Jongwoo Park, Michael S. Ryoo
NeurIPS 2024 workshops
[paper]
[code]
[webinar]
Too many frames, not all useful: Efficient Strategies for Long-form Video QA
Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryoo, Donghyun Kim, Michael S. Ryoo
NeurIPS 2024 workshops
[paper]
[code]
[webinar]
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo
CoRL 2024 workshops
[paper]
[code]
Object-Centric Diffusion for Efficient Video Editing
Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian
ECCV 2024
[project page]
[paper]
[poster]
[talk]
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo
CVPR 2024
[paper]
[poster]
[talk]
Grafting Vision Transformers
Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo
WACV 2024
[paper]
[poster]
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya, Michael S. Ryoo
IJCAI 2023
[paper]
[code]
[slides]
Token Turing Machines
Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab
CVPR 2023
[paper]
[code]
[teaser]
Weakly-guided Self-supervised Pretraining for Temporal Activity Detection
Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, Gang Hua
AAAI 2023
[paper]
[code]
[talk]
[poster]
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
Jinghuan Shang, Kumara Kahatapitiya, Xiang Li, Michael S. Ryoo
ECCV 2022, TPAMI
[paper]
[journal]
[code]
[talk]
[poster]
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond
CVPR 2022
[paper]
[code]
[poster]
Swift: Adaptive Video Streaming with Layered Neural Codecs
Mallesham Dasari, Kumara Kahatapitiya, Samir Das, Aruna Balasubramanian, Dimitris Samaras
NSDI 2022
[paper]
[code]
[slides]
Coarse-Fine Networks for Temporal Activity Detection in Videos
Kumara Kahatapitiya, Michael S. Ryoo
CVPR 2021
[paper]
[code]
[talk]
[poster]
Exploiting the Redundancy in Convolutional Filters for Parameter Reduction
Kumara Kahatapitiya, Ranga Rodrigo
WACV 2021
[paper]
[code]
[talk]
Other Projects
- X3D-Multigrid [code]
A PyTorch implementation of X3D models from "X3D: Expanding Architectures for Efficient Video Recognition" [CVPR2020], trained with
"A Multigrid Method for Efficiently Training Video Models" [CVPR2020]. In contrast to the original repository by FAIR,
this repository provides a simpler, less modular and more familiar implementation structure for faster and easier adoption.
- Optimal Transport in NumPy [code]
This repository contains a few Optimal Transport algorithms implemented using NumPy, including
"A Direct Õ(1/ε) Iteration Parallel Algorithm for Optimal Transport" [NeurIPS2019],
"Computational Optimal Transport: Complexity by Accelerated Gradient Descent is better than by Sinkhorn's Algorithm" [PMLR2018] and
"Sinkhorn Distances: Lightspeed Computation of Optimal Transport" [NeurIPS2013].