Keynote Speakers
Cornelia Caragea, NSF and UIC, USA
Alexander “Sasha” Rush, Cornell Tech, USA
Haixun Wang, Instacart, USA
Improving Semi-Supervised Learning with Pseudo-Margins
Cornelia Caragea, NSF and UIC, USA
Abstract
In this talk, I will discuss a new semi-supervised learning approach that combines consistency regularization and
pseudo-labeling, with its main novelty arising from the use of unlabeled data training dynamics to measure pseudo-label
quality. Instead of relying only on the model's confidence on an unlabeled example at an arbitrary iteration to decide
whether the example should be included in training, our approach also analyzes the behavior of the model on the
pseudo-labeled examples as training progresses, so that low-quality predictions are masked out. I will show that
our approach brings substantial improvements on diverse text and vision benchmarks, emphasizing the importance of
enforcing high-quality pseudo-labels.
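As a rough illustration of the masking idea described in the abstract, the sketch below keeps a pseudo-label only if the model is confident now and its margin for that label has been positive on average over training. The margin definition (score of a class minus the best competing score), the exponential averaging, the thresholds, and all names such as PseudoLabelFilter are assumptions made for this example; they are not taken from the talk.

```python
from typing import Dict, List


def pseudo_margin(probs: List[float], label: int) -> float:
    """Score of `label` minus the best competing score (assumed definition).

    Requires at least two classes.
    """
    others = [p for i, p in enumerate(probs) if i != label]
    return probs[label] - max(others)


class PseudoLabelFilter:
    """Hypothetical helper: tracks running per-class margins for each unlabeled example."""

    def __init__(self, conf_threshold: float = 0.9,
                 margin_threshold: float = 0.0,
                 smoothing: float = 0.9) -> None:
        self.conf_threshold = conf_threshold      # assumed value
        self.margin_threshold = margin_threshold  # assumed value
        self.smoothing = smoothing                # assumed averaging factor
        self.avg_margins: Dict[int, List[float]] = {}

    def update(self, example_id: int, probs: List[float]) -> bool:
        """Record this iteration's prediction; return True if the pseudo-label is kept."""
        margins = [pseudo_margin(probs, c) for c in range(len(probs))]
        prev = self.avg_margins.get(example_id)
        if prev is None:
            current = margins
        else:
            # Exponential moving average of each class's margin across iterations.
            current = [self.smoothing * p + (1 - self.smoothing) * m
                       for p, m in zip(prev, margins)]
        self.avg_margins[example_id] = current

        # The pseudo-label is the class with the highest current probability.
        label = max(range(len(probs)), key=probs.__getitem__)
        confident = probs[label] >= self.conf_threshold
        stable = current[label] > self.margin_threshold
        return confident and stable


if __name__ == "__main__":
    flt = PseudoLabelFilter()
    print(flt.update(0, [0.95, 0.05]))  # confident, no history yet
    print(flt.update(0, [0.05, 0.95]))  # confident, but the prediction flipped
```

In this sketch an example whose predicted class flips between iterations is masked out even when the current confidence is high, because the averaged margin for the new class is still negative.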
Cornelia Caragea is a Professor of Computer Science and the Director of the Information Retrieval Research Laboratory at the University of Illinois Chicago (UIC). Caragea currently serves as Program Director at the National Science Foundation (NSF). Her research interests are in natural language processing, artificial intelligence, deep learning, machine learning, and information retrieval. Caragea's work has been recognized with several NSF research awards, including the prestigious NSF CAREER award. She has published many research papers in top venues such as ACL, EMNLP, NAACL, ICML, AAAI, and IJCAI and has served as a program committee member for many such conferences. She has reviewed for many journals, including Nature, ACM TIST, JAIR, and TACL, served on many NSF review panels, and organized several workshops on scholarly big data. In 2020-21, she received the College of Engineering (COE) Research Award, which is awarded to faculty in the College of Engineering at UIC for excellent research contributions. Caragea was included on an Elsevier list of the top 2% of scientists in their fields for her single-year impact in 2020.
Generative Information Retrieval and E-commerce
Haixun Wang, Instacart, USA
Abstract
Information Retrieval and E-Commerce are seeing a great opportunity in the generative AI age. The challenge of
maintaining relevance across a vast array of queries, especially new and obscure ones, has persisted. Despite
significant investments by leaders like Amazon, e-commerce search still lags behind web search advancements by Google
and Bing. Recent breakthroughs in Large Language Models (LLMs) and generative AI have renewed my optimism. These
technologies decode user intent more accurately and handle nuanced queries, transforming e-commerce search into an
intuitive, natural language-like experience. The advent of multimodal foundation models promises richer, personalized
experiences by integrating text, images, videos, and voice. This shift moves e-commerce from static solutions to
dynamic, responsive environments tailored to individual preferences. Additionally, the rise of intelligent agents and
the fusion of physical and digital shopping through innovations like drone delivery and AR/VR will profoundly impact the
industry. As we stand on the brink of this transformation, the key question is who will lead this revolution: tech
giants, specialized vertical vendors, or existing e-commerce companies? In this talk, I will explore the current
landscape of e-commerce search and outline a future where it exceeds customer expectations.
Haixun Wang is an IEEE Fellow, Editor-in-Chief of the IEEE Data Engineering Bulletin, and a VP of Engineering and Distinguished Scientist at Instacart. Before Instacart, he was a VP of Engineering and Distinguished Scientist at WeWork, a Director of Natural Language Processing at Amazon, and the lead of the NLP team working on Query and Document Understanding at Facebook. From 2013 to 2015, he was with Google Research working on natural language processing. From 2009 to 2013, he led research in semantic search, graph data processing systems, and distributed query processing at Microsoft Research Asia. He was a research staff member at IBM T. J. Watson Research Center from 2000 to 2009. He received his Ph.D. degree in Computer Science from the University of California, Los Angeles in 2000. He has published more than 200 research papers in international journals and conference proceedings. He serves as a trustee of the VLDB Endowment and has held roles such as PC Chair for conferences like SIGKDD and CIKM, as well as editorial board member for journals like IEEE Transactions on Knowledge and Data Engineering (TKDE). He won the ICDE 10-year influential paper award in 2024, the ICDE best paper award in 2015, the ICDM 10-year best paper award in 2013, and the best paper award of ER 2009.