Zhibo Yang
I am a software engineer at Waymo (formerly known as the Google Self-Driving Car Project). I obtained my PhD from the Department of Computer Science, Stony Brook University, under the supervision of Dimitris Samaras. I work closely with Minh Hoai Nguyen and Gregory Zelinsky. I received my MPhil degree from the Department of Information Engineering at The Chinese University of Hong Kong under the supervision of Wing Cheong Lau and Chen Change Loy. Before that, I obtained my bachelor's degree from Harbin Institute of Technology.
My research interests broadly include human vision understanding, metric learning, imitation learning, and object detection. Currently, my work primarily focuses on building artificial neural network models of the human visual system.
Email / Google Scholar / GitHub
|
|
News
- [03/15/24] One paper accepted to ECCV 2024.
- [03/15/24] One paper accepted to CVPR 2024.
- [09/18/23] One paper is accepted to Nature Communications.
- [03/01/23] One paper is accepted to CVPR 2023.
- [09/28/22] A new preprint on ML for drug discovery is online.
- [07/03/22] One paper is accepted to ECCV 2022.
- [10/04/21] One paper is accepted to WACV 2022.
- [09/18/21] One paper is accepted to Briefings in Bioinformatics (IF=11.6)!
- [05/05/21] I was named an Outstanding Reviewer for CVPR 2021!
|
Research
|
|
Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, Gregory Zelinsky, Minh Hoai
ECCV, 2024
We developed the Attention in Referral Transformer (ART) model, which predicts the human fixations spurred by each word in a referring expression. To train ART, we created RefCOCO-Gaze, a large-scale dataset of 19,738 human gaze scanpaths, corresponding to 2,094 unique image-expression pairs, from 220 participants performing our referral task.
|
|
Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers
Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
CVPR, 2024
code / talk
We propose HAT, a Human Attention Transformer that can predict scanpaths driven by both bottom-up and top-down attention control.
|
|
A Systematic Study of Key Elements Underlying Molecular Property Prediction
Jianyuan Deng, Zhibo Yang, Hehe Wang, Iwao Ojima, Dimitris Samaras, Fusheng Wang
Nature Communications, 2023
supplement
We conduct extensive experiments on several representative models for molecular property prediction and reflect on the key elements underlying it.
|
|
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Sounak Mondal, Zhibo Yang, Seoyoung Ahn, Gregory Zelinsky, Dimitris Samaras, Minh Hoai
CVPR, 2023
We pose a new task called ZeroGaze, a new variant of zero-shot learning where gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem.
|
|
Target-absent Human Attention
Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras
ECCV, 2022
supplement / code / talk
We propose a data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in images.
|
|
Characterizing Target-absent Human Attention
Yupei Chen, Zhibo Yang, Souradeep Chakraborty, Sounak Mondal, Seoyoung Ahn, Dimitris Samaras, Minh Hoai, Gregory Zelinsky
CVPR Workshops, 2022
supplement
We present COCO-FreeView, which complements the COCO-Search18 dataset with free-viewing fixations for the same images, enabling joint analysis of search fixations and free-viewing fixations.
|
|
Hierarchical Proxy-based Loss for Deep Metric Learning
Zhibo Yang, Muhammet Bastan, Xinliang Zhu, Doug Gray, Dimitris Samaras
WACV, 2022
supplement / video / blog
We present a framework that leverages the implicit hierarchy among classes by imposing a hierarchical structure on the proxies, and that can be used with any existing proxy-based loss.
|
|
Artificial Intelligence in Drug Discovery: Applications and Techniques
Jianyuan Deng, Zhibo Yang, Iwao Ojima, Dimitris Samaras, Fusheng Wang
Briefings in Bioinformatics, 2021
We conduct a comprehensive survey of AI-driven drug discovery. We also released a GitHub repository (link) that collects related papers on AI-driven drug discovery.
|
|
COCO-Search18 fixation dataset for predicting goal-directed attention control
Yupei Chen, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Minh Hoai, Gregory Zelinsky
Scientific Reports, 2021
dataset
We introduce COCO-Search18, the first dataset of laboratory-quality goal-directed behavior large enough to train deep-network models.
|
|
Mosaic: Advancing User Quality of Experience in 360-Degree Video Streaming with Machine Learning
Sohee Park, Arani Bhattacharya, Zhibo Yang, Samir R. Das and Dimitris Samaras
IEEE Transactions on Network and Service Management, 2021
We develop a comprehensive approach called Mosaic that combines neural network-based viewport prediction with a rate control mechanism, so that the quality of experience for 360-degree video is optimized subject to a given network capacity.
|
|
Towards Better Opioid Antagonists Using Deep Reinforcement Learning
Jianyuan Deng*, Zhibo Yang*, Yao Li, Dimitris Samaras, Fusheng Wang
arXiv, 2020
We develop a deep reinforcement learning framework to discover potential lead compounds as better opioid antagonists with enhanced brain retention ability.
|
|
Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning
Zhibo Yang, Lihan Huang, Yupei Chen, Seoyoung Ahn, Zijun Wei, Gregory Zelinsky, Dimitris Samaras and Minh Hoai
CVPR (Oral), 2020, Best Paper Nomination
supplement / code / dataset / talk
We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search.
|
|
Benchmarking Gaze Prediction for Categorical Visual Search
Gregory Zelinsky, Zhibo Yang, Lihan Huang, Yupei Chen, Seoyoung Ahn, Zijun Wei, Hossein Adeli, Dimitris Samaras and Minh Hoai
CVPR Workshops (Oral), 2019
code / dataset
We present a carefully created dataset of search fixations for two target categories, microwaves and clocks, curated from the COCO2014 dataset.
|
|
HiQ: Robust and Fast Decoding of High-Capacity QR Codes
Zhibo Yang, Huanle Xu, Jianyuan Deng, Chen Change Loy and Wing Cheong Lau
IEEE Transactions on Image Processing, 2018
code / dataset / video
We propose and implement HiQ, a framework for high-capacity color QR codes that incorporates our methods. We also present a fast color QR code decoding algorithm.
|
Services
Reviewer for CVPR, ICCV, ECCV, AAAI, WACV, BMVC, TIP, TPAMI, TNNLS; student volunteer for ISIT 2015 and INFOCOM 2015
|
|