Recently, Transformer-based deep learning models like GPT-3 have been getting a lot of attention in the machine learning world. These models excel at understanding semantic relationships, and they have contributed to large improvements in Microsoft Bing's search experience as well as to surpassing human performance on the SuperGLUE academic benchmark. However, they can fail to capture more nuanced relationships between query and document terms that go beyond pure semantics. In this blog post, we are introducing "Make Every feature Binary" (MEB), a large-scale sparse model that complements our production Transformer models to improve search relevance for Microsoft customers using AI at Scale. To make search more accurate and dynamic, MEB better harnesses the power of large data and allows for an input feature space with over 200 billion binary features that reflect the subtle relationships between search queries and documents.
MEB uses three years of search logs from Bing as training data.