Zhibo Yang

I am a software engineer at Waymo (formerly known as the Google Self-Driving Car Project). I obtained my PhD from the Department of Computer Science, Stony Brook University, under the supervision of Dimitris Samaras. I work closely with Minh Hoai Nguyen and Gregory Zelinsky. I received my MPhil degree from the Department of Information Engineering at The Chinese University of Hong Kong under the supervision of Wing Cheong Lau and Chen Change Loy. Before that, I obtained my bachelor's degree from Harbin Institute of Technology.

My research interests broadly include human vision understanding, metric learning, imitation learning, and object detection. Currently, my work primarily focuses on building artificial neural network models of the human visual system.

Email / Google Scholar / GitHub

News

[03/15/24] One paper is accepted to ECCV 2024.
[03/15/24] One paper is accepted to CVPR 2024.
[09/18/23] One paper is accepted to Nature Communications.
[03/01/23] One paper is accepted to CVPR 2023.
[09/28/22] A new preprint on ML for drug discovery is online.
[07/03/22] One paper is accepted to ECCV 2022.
[10/04/21] One paper is accepted to WACV 2022.
[09/18/21] One paper is accepted to Briefings in Bioinformatics (IF=11.6)!
[05/05/21] I was named an Outstanding Reviewer for CVPR 2021!

Research
	Look Hear: Gaze Prediction for Speech-directed Human Attention Sounak Mondal, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, Gregory Zelinsky, Minh Hoai ECCV, 2024 We developed the Attention in Referral Transformer (ART) model, which predicts the human fixations spurred by each word in a referring expression. To train ART, we created RefCOCO-Gaze, a large-scale dataset of 19,738 human gaze scanpaths, corresponding to 2,094 unique image-expression pairs, from 220 participants performing our referral task.
	Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras CVPR, 2024 code / talk Here we propose HAT, a Human Attention Transformer, that can predict both bottom-up and top-down attention control.
	A Systematic Study of Key Elements Underlying Molecular Property Prediction Jianyuan Deng, Zhibo Yang, Hehe Wang, Iwao Ojima, Dimitris Samaras, Fusheng Wang Nature Communications, 2023 supplement We conduct extensive experiments on several representative models in molecular property prediction and reflect on the key aspects underlying molecular property prediction.
	Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention Sounak Mondal, Zhibo Yang, Seoyoung Ahn, Gregory Zelinsky, Dimitris Samaras, Minh Hoai CVPR, 2023 We pose a new task called ZeroGaze, a new variant of zero-shot learning where gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem.
	Target-absent Human Attention Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras ECCV, 2022 supplement / code / talk We propose a data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in images.
	Characterizing Target-absent Human Attention Yupei Chen, Zhibo Yang, Souradeep Chakraborty, Sounak Mondal, Seoyoung Ahn, Dimitris Samaras, Minh Hoai, Gregory Zelinsky CVPR Workshops, 2022 supplement We present COCO-FreeView, which complements the COCO-Search18 dataset with free-viewing fixations for the same images, enabling joint analysis of search fixations and free-viewing fixations.
	Hierarchical Proxy-based Loss for Deep Metric Learning Zhibo Yang, Muhammet Bastan, Xinliang Zhu, Doug Gray, Dimitris Samaras WACV, 2022 supplement / video / blog We present a framework that leverages the implicit hierarchy of categories in real-world datasets by imposing a hierarchical structure on the proxies; it can be used with any existing proxy-based loss.
	Artificial Intelligence in Drug Discovery: Applications and Techniques Jianyuan Deng, Zhibo Yang, Iwao Ojima, Dimitris Samaras, Fusheng Wang Briefings in Bioinformatics, 2021 We conduct a comprehensive survey on AI-driven drug discovery. We also released a GitHub repository (link) that collects related papers on AI-driven drug discovery.
	COCO-Search18 fixation dataset for predicting goal-directed attention control Yupei Chen, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Minh Hoai, Gregory Zelinsky Scientific Reports, 2021 dataset We introduce COCO-Search18, the first dataset of laboratory-quality goal-directed behavior large enough to train deep-network models.
	Mosaic: Advancing User Quality of Experience in 360-Degree Video Streaming with Machine Learning Sohee Park, Arani Bhattacharya, Zhibo Yang, Samir R. Das and Dimitris Samaras IEEE Transactions on Network and Service Management, 2021 We develop a comprehensive approach called Mosaic that combines powerful neural network-based viewport prediction with a rate control mechanism, so that the 360-degree video quality of experience is optimized subject to a given network capacity.
	Towards Better Opioid Antagonists Using Deep Reinforcement Learning Jianyuan Deng, Zhibo Yang*, Yao Li, Dimitris Samaras, Fusheng Wang arXiv, 2020 We develop a deep reinforcement learning framework to discover potential lead compounds as better opioid antagonists with enhanced brain retention ability.
	Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning Zhibo Yang, Lihan Huang, Yupei Chen, Seoyoung Ahn, Zijun Wei, Gregory Zelinsky, Dimitris Samaras and Minh Hoai CVPR (Oral), 2020, Best Paper Nomination supplement / code / dataset / talk We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search.
	Benchmarking Gaze Prediction for Categorical Visual Search Gregory Zelinsky, Zhibo Yang, Lihan Huang, Yupei Chen, Seoyoung Ahn, Zijun Wei, Hossein Adeli, Dimitris Samaras and Minh Hoai CVPR Workshops (Oral), 2019 code / dataset We present a carefully created dataset of search fixations for two target categories, microwaves and clocks, curated from the COCO2014 dataset.
	HiQ: Robust and Fast Decoding of High-Capacity QR Codes Zhibo Yang, Huanle Xu, Jianyuan Deng, Chen Change Loy and Wing Cheong Lau IEEE Transactions on Image Processing, 2018 code / dataset / video We put forth and implement a framework for high-capacity color QR codes equipped with our methods, called HiQ. A fast color QR code decoding algorithm is also presented.

Services

Reviewer for CVPR, ICCV, ECCV, AAAI, WACV, BMVC, TIP, TPAMI, TNNLS; student volunteer for ISIT 2015 and INFOCOM 2015

Design and code stolen from Jon Barron.