Keynote Speakers


Cornelia Caragea, NSF and UIC, USA

Alexander "Sasha" Rush, Cornell Tech, USA

Haixun Wang, Instacart, USA

Improving Semi-Supervised Learning with Pseudo-Margins

Cornelia Caragea, NSF and UIC, USA

Abstract
In this talk, I will discuss a new semi-supervised learning approach that combines consistency regularization and pseudo-labeling. Its main novelty is the use of training dynamics on unlabeled data to measure pseudo-label quality. Instead of relying only on the model's confidence on an unlabeled example at an arbitrary iteration to decide whether the example should be included in training, our approach also analyzes the model's behavior on the pseudo-labeled examples as training progresses, so that low-quality predictions are masked out. I will show that our approach brings substantial improvements on diverse text and vision benchmarks, emphasizing the importance of enforcing high-quality pseudo-labels.

Cornelia Caragea is a Professor of Computer Science and the Director of the Information Retrieval Research Laboratory at the University of Illinois Chicago (UIC). Caragea currently serves as Program Director at the National Science Foundation (NSF). Her research interests are in natural language processing, artificial intelligence, deep learning, machine learning, and information retrieval. Caragea's work has been recognized with several NSF research awards, including the prestigious NSF CAREER award. She has published many research papers in top venues such as ACL, EMNLP, NAACL, ICML, AAAI, and IJCAI and has served as a program committee member for many such conferences. She has reviewed for many journals, including Nature, ACM TIST, JAIR, and TACL, served on many NSF review panels, and organized several workshops on scholarly big data. In 2020-21, she received the College of Engineering (COE) Research Award, which is awarded to faculty in the College of Engineering at UIC for excellent research contributions. Caragea was included on an Elsevier list of the top 2% of scientists in their fields for her single-year impact in 2020.

Designing Text Embeddings for the Future

Alexander "Sasha" Rush, Cornell Tech, USA

Abstract
Embeddings are now a standard method by which institutions manage, retrieve, and transform their unstructured text data. Although they are clearly useful in their current form, the management capability of embeddings is quite limited. Unlike standard data systems, we do not have methods for inspecting their contents, contextualizing them with additional information, or migrating them between disparate systems. If embeddings are to be first-class citizens of the modern stack, e.g. the "files" of AI data systems, they will need more flexible capabilities. In this talk, we present recent AI research into improved text embeddings that target real-world usage, and discuss future challenges for embeddings in big data systems.

Alexander "Sasha" Rush is an Associate Professor at Cornell Tech and a researcher at Hugging Face. His research interest is in the study of language models with applications in controllable text generation, efficient inference, and novel ML architectures. In addition to research, he has written several popular open-source software projects supporting NLP research, programming for deep learning, and virtual academic conferences. He is a co-founder of COLM, the Conference on Language Modeling. His projects have received paper and demo awards at major NLP, ML, visualization, and hardware conferences, an NSF CAREER Award, and a Sloan Fellowship.

Generative Information Retrieval and E-commerce

Haixun Wang, Instacart, USA

Abstract
Information Retrieval and E-Commerce are seeing a great opportunity in the generative AI age. The challenge of maintaining relevance across a vast array of queries, especially new and obscure ones, has persisted. Despite significant investments by leaders like Amazon, e-commerce search still lags behind web search advancements by Google and Bing. Recent breakthroughs in Large Language Models (LLMs) and generative AI have renewed my optimism. These technologies decode user intent more accurately and handle nuanced queries, transforming e-commerce search into an intuitive, natural language-like experience. The advent of multimodal foundation models promises richer, personalized experiences by integrating text, images, videos, and voice. This shift moves e-commerce from static solutions to dynamic, responsive environments tailored to individual preferences. Additionally, the rise of intelligent agents and the fusion of physical and digital shopping through innovations like drone delivery and AR/VR will profoundly impact the industry. As we stand on the brink of this transformation, the key question is who will lead this revolution: tech giants, specialized vertical vendors, or existing e-commerce companies? In this talk, I will explore the current landscape of e-commerce search and outline a future where it exceeds customer expectations.

Haixun Wang is currently an IEEE Fellow, editor-in-chief of the IEEE Data Engineering Bulletin, and a VP of Engineering and Distinguished Scientist at Instacart. Before Instacart, he was a VP of Engineering and Distinguished Scientist at WeWork, a Director of Natural Language Processing at Amazon, and the lead of the NLP team working on Query and Document Understanding at Facebook. From 2013 to 2015, he was with Google Research working on natural language processing. From 2009 to 2013, he led research in semantic search, graph data processing systems, and distributed query processing at Microsoft Research Asia. He was a research staff member at IBM T. J. Watson Research Center from 2000 to 2009. He received his Ph.D. degree in Computer Science from the University of California, Los Angeles in 2000. He has published more than 200 research papers in international journals and conference proceedings. He serves as a trustee of the VLDB Endowment and has held roles such as PC Chair for conferences like SIGKDD and CIKM, as well as editorial board member for journals like IEEE Transactions on Knowledge and Data Engineering (TKDE). He won the 10-year ICDE influential paper award in 2024, the ICDE best paper award in 2015, the ICDM 10-year best paper award in 2013, and the best paper award of ER 2009.