Haystack US 2024

Talks from the Search & Relevance Community at the Haystack Conference!

The conference sessions will be held at the Violet Crown movie theater in central Charlottesville for in-person attendees and streamed live via Zoom for online attendees - get your tickets for both options here.

Please read our Event Safety Guide and Code of Conduct.

Day 1, Tuesday, April 23rd, 2024

Two tracks run in parallel: Track 1 in Theater 5 and Track 2 in Theater 7. All times are EDT.
8:00-9:00am EDT Registration

Location: Entrance of the Violet Crown

9:00-9:30am EDT Welcome to Haystack!

Charlie welcomes you to Haystack 2024! Hear about what we have planned for this year's conference, with a special focus on the people of the Search & AI community.

Charlie Hull
Location: Theater 5

9:45-10:30am EDT All Vector Search is Hybrid Search

This talk starts with a strongly-stated central premise: all vector search applications benefit from some form of hybrid search. This can range from a traditional keyword-based search to metadata filtering and more. Inevitably, vector search applications become search applications, and so need to leverage and reconcile other kinds of search scoring functions as well as more sophisticated search modalities like two-stage retrieval and reranking. Further, hybrid search systems must now contend with the very difficult problems of real-time updates to both vectors and vector metadata, and the difficulties of updating these vector indices to keep data fresh. Join this talk and learn about the questions you should be asking yourself as you embark on your hybrid search infra journey.
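
As one concrete example of reconciling keyword and vector scoring functions (the talk doesn't prescribe a specific method), Reciprocal Rank Fusion blends ranked lists using only ranks, sidestepping incomparable score scales. A minimal sketch:

```python
def rrf_merge(keyword_hits, vector_hits, k=60):
    """Blend two ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each list is ordered best-first; k dampens the influence of top
    ranks (60 is the constant from the original RRF paper).
    """
    scores = {}
    for hits in (keyword_hits, vector_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks high in both lists, so it tops the fused ranking.
print(rrf_merge(["a", "b", "c"], ["b", "d", "a"]))  # ['b', 'a', 'd', 'c']
```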

John Solitario
Location: Theater 5

Chat With Your Data - A Practical Guide to Production RAG Applications

At Moody’s, we went from a laptop prototype to a premium-subscription RAG product in under 6 months. Our users’ workflows have been accelerated by the Research Assistant’s ability to synthesize responses to freeform questions without hallucination. In this talk we will share lessons that we learned the hard way, so you can avoid similar pitfalls when setting up your chunk embedding infrastructure, using AI to measure response relevance, blending relevance with recency, managing costs, and organizing your teams. By the end, you will have the tools to build a strategy for RAG applications with real business impact.
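
As one small illustration of the "blending relevance with recency" lesson (the formula and weights below are generic assumptions, not Moody's actual approach), an exponential decay on document age is a common starting point:

```python
import math
import time

def blended_score(relevance: float, doc_timestamp: float,
                  half_life_days: float = 30.0,
                  recency_weight: float = 0.3) -> float:
    """Linearly blend a relevance score with an exponential recency decay.

    half_life_days controls how quickly older documents fade; both knobs
    are illustrative defaults.
    """
    age_days = (time.time() - doc_timestamp) / 86_400
    recency = math.exp(-math.log(2) * age_days / half_life_days)
    return (1 - recency_weight) * relevance + recency_weight * recency

# A week-old document keeps most of its recency credit at a 30-day half-life.
print(blended_score(0.8, time.time() - 7 * 86_400))
```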

Jeff Capobianco
Location: Theater 7

10:45-11:30am EDT Generative AI Search and Summarization Testing using Human SME Techniques

An internally developed Human Relevance Testing Framework was successfully modified to support Subject Matter Expert testing of generative AI search results and summaries. These extensions to the existing Search Testing Framework have resulted in fast and frequent evaluation and testing of large, diverse corpora for search and summarization functions. This talk will discuss the modification of the framework from traditional search Human Relevance Testing methods to generative AI, the testing process, the metrics created in support of generative AI, and the outputs of the methodology. Generative search and summarization methods will be discussed. In addition, various other potential testing use cases based on this methodology will be covered, including comparison and regression methods and alternate product extensions, issues, and metrics.

Douglas Rosenoff
Location: Theater 5

Learning to Rank at Reddit: A Project Retro

In today's AI-based world, Reddit stands out as a deep catalog of human, subjective information. Whether product reviews or the deeply personal - Reddit searchers want to connect with other humans, not generic AI-based answers. We at Reddit would like the site-search experience to be better, so you don't need to add "Reddit" to your Google search. That's what we're trying to do with Learning to Rank: turning relevance into a repeatable, data-driven solution. The journey hasn't been an easy one. We want to share our painful lessons learned working with training data, developing features, the Solr Learning to Rank plugin, scaling Learning to Rank to 1000s of QPS, and more. Hopefully, you can learn from the egg we constantly found on our faces! See how our scrappy team has been slowly turning LTR from a science project into a repeatable process of constant, data-informed improvement. From a lab to an assembly line, come and learn from our painful lessons big and small.
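
For a flavor of what the Solr Learning to Rank plugin looks like in practice (the endpoint, core, model name, and query below are hypothetical; the talk doesn't publish Reddit's configuration), here is a minimal sketch of the query side, where BM25 retrieves candidates and an uploaded LTR model reranks the top documents:

```python
import requests

SOLR_SELECT = "http://localhost:8983/solr/posts/select"  # hypothetical core

user_query = "mechanical keyboard recommendations"
params = {
    "q": user_query,            # first pass: plain BM25 retrieval
    "fl": "id,title,score",
    # Second pass: rerank the top 100 docs with an LTR model previously
    # uploaded to Solr's model store, passing the query as an external
    # feature input (efi) for query-dependent features.
    "rq": f'{{!ltr model=myLtrModel reRankDocs=100 efi.user_query="{user_query}"}}',
}
resp = requests.get(SOLR_SELECT, params=params, timeout=10)
for doc in resp.json()["response"]["docs"]:
    print(doc["id"], doc["score"])
```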

Doug Turnbull & Chris Fournier & Cliff Chen
Location: Theater 7

11:45am-12:30pm EDT Apache Lucene: From Text Indexing to Artificial Intelligence

Apache Lucene celebrated its twenty-second anniversary last September, a journey that continues to profoundly impact the world of Search and Data technologies. Lucene is the engine behind giants such as Elasticsearch, OpenSearch, Apache Solr, and the recent Atlas Search from MongoDB. Its integration into numerous other Open Source projects, such as Apache Nutch (the pioneering web crawler and precursor to Hadoop) and Apache Cassandra (the highly scalable NoSQL database), attests to its widespread influence. Used in thousands of enterprise projects, including by leaders like LinkedIn and Twitter, Lucene enjoys a solid and diverse user base. This talk will dive into Lucene's evolution, from its essential inverted indexing for text processing to recent innovations that reflect continuous technological advancement. To conclude, we will discuss Lucene's latest features: vector indexing and vector search, which create a powerful synergy with generative artificial intelligence, opening new horizons for the future of search.

Lucian Precup
Location: Theater 5

Retro Relevance: Lessons Learned Balancing Keyword and Semantic Search

Semantic search has revolutionized how we approach information retrieval, offering a nuanced understanding of human language that surpasses traditional keyword-based search methodologies for natural language queries. However, this technology is not without its limitations. This presentation critically examines scenarios where semantic search falls short of expectations, failing to consistently deliver the most relevant results. Its challenges include handling niche queries and domain-specific jargon. While semantic search excels in understanding context and semantics, it sometimes overlooks the importance of exact matches and user expectations, leading to a gap between delivered results and user satisfaction.

Kathleen DeRusso
Location: Theater 7

12:30pm-2:00pm EDT Lunch

Find lunch at one of the many options available on Charlottesville’s Downtown Mall

Location: Your choice!

2:00-2:45pm EDT Vector (Hybrid) Search Live A/B Test Results in E-commerce (DIY) Sector

We have statistically significant A/B test results for our Vector (Hybrid) Search on the e-commerce websites of two of our large DIY retail clients. This was a 50-50 user-split A/B test with two variants: 1. Prefixbox's existing, manually-optimized search engine, and 2. Prefixbox's Vector (Hybrid) Search. Results from one of the tests were the following:

- 21% revenue increase for search users
- 12% increase in transactions for search users
- 17% revenue increase for overall site visitors
- 11% increase in transactions for overall site visitors
- 5.7% increase in SERP clicks

We are using Elasticsearch for both keyword and vector search, and we use OpenAI's embeddings. In this talk, we are going to explain how we create the embeddings for the products and the search query, how we blend the keyword and vector search results, and the results of the live A/B tests, with live search examples.
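
The abstract doesn't publish Prefixbox's blending configuration; purely as an illustration of the pattern, here is a minimal sketch of a hybrid query in Elasticsearch 8.x, where a BM25 `match` clause and an approximate-kNN clause are combined and weighted via `boost` (the index name, field names, and the stubbed embedding call are assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding call (e.g. OpenAI embeddings);
    # returns a dummy vector so the sketch is self-contained.
    return [0.0] * 1536

query_text = "cordless drill"
resp = es.search(
    index="products",  # hypothetical index with a dense_vector field
    query={"match": {"title": {"query": query_text, "boost": 0.5}}},
    knn={
        "field": "title_embedding",
        "query_vector": embed(query_text),
        "k": 50,
        "num_candidates": 500,
        "boost": 0.5,  # keyword and vector scores are summed, weighted by boost
    },
    size=20,
)
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```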

Istvan Simon
Location: Theater 5

Revisiting the Basics: How Round Robin Improved Search Relevancy

In the rapidly evolving landscape of search technologies, the allure of novel, complex algorithms often overshadows the potential of simple, traditional solutions. This talk aims to shift the focus back to the basics, demonstrating how a standard Round Robin algorithm improved search relevance in an enterprise-level application. Our journey begins with a dual challenge: aligning the comprehensive view of our data with individual verticals and improving its underperforming search relevance metrics. We experimented with three methods to merge our vertical datasets into the comprehensive view, finding a standard Round Robin algorithm to be the most effective. We also tailored this algorithm to prioritize certain datasets, aligning with business requirements and customer expectations. The results were striking. All our internal relevance metrics, for both implicit and explicit human judgements, improved. Notably, our explicit judgement ERR score jumped from 0.49 to 0.61, underscoring the effectiveness of our approach. Join us as we explore how revisiting the basics led to gains, and why the Round Robin algorithm, a simple yet powerful tool, can still hold surprising value in today's advanced search technology landscape.
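
A minimal sketch of the core idea: interleave the ranked lists of each vertical one hit per cycle, deduplicate, and control priority by the order in which verticals are passed (the data below is made up):

```python
from itertools import chain, zip_longest

def round_robin(*verticals):
    """Interleave ranked result lists: one hit from each vertical per cycle.

    Lists may have different lengths; exhausted lists are skipped. The
    argument order sets each vertical's slot within a cycle, which is a
    simple way to prioritize certain datasets.
    """
    _SKIP = object()
    interleaved = chain.from_iterable(zip_longest(*verticals, fillvalue=_SKIP))
    seen, merged = set(), []
    for doc in interleaved:
        if doc is not _SKIP and doc not in seen:  # dedupe across verticals
            seen.add(doc)
            merged.append(doc)
    return merged

# The first vertical gets slot 1 of every cycle.
print(round_robin(["n1", "n2"], ["v1", "v2", "v3"], ["d1"]))
# -> ['n1', 'v1', 'd1', 'n2', 'v2', 'v3']
```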

Joelle Robinson
Location: Theater 7

3:00-3:45pm EDT CLAP With Me: Step by Step Semantic Search on Audio Sources

Everyone has seen a demo, or maybe even implemented semantic search using vector embeddings. But these implementations are predicated on the idea of a text query finding similarities in text data. What about other forms of data? Maybe you've heard of CLIP, a Machine Learning approach to connecting images to text used by companies such as OpenAI. Introducing… CLAP (Contrastive Language-Audio Pre-training), an approach to bring audio and text data into a single multimodal space, unlocking semantic search across audio data. In this talk, we'll discuss the basics of CLAP - what it is and what it does. Then, we'll build a small application that generates CLAP vector embeddings from audio files, indexes them to OpenSearch, and implements a semantic search query over the audio data. Let's get CLAP-ing!
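
As a sketch of the embedding step (the checkpoint is an assumption; the talk doesn't name one, but LAION's public CLAP checkpoints on Hugging Face are a common starting point), the snippet below projects audio and text into the shared CLAP space; the audio vectors would then be indexed into an OpenSearch knn_vector field and retrieved with text query vectors:

```python
import numpy as np
import torch
from transformers import ClapModel, ClapProcessor

CHECKPOINT = "laion/clap-htsat-unfused"  # assumed checkpoint
processor = ClapProcessor.from_pretrained(CHECKPOINT)
model = ClapModel.from_pretrained(CHECKPOINT)

def embed_audio(waveform: np.ndarray, sampling_rate: int = 48_000) -> list[float]:
    """Project a mono waveform into the shared CLAP text/audio space."""
    inputs = processor(audios=waveform, sampling_rate=sampling_rate,
                       return_tensors="pt")
    with torch.no_grad():
        vec = model.get_audio_features(**inputs)
    return vec[0].tolist()

def embed_text(query: str) -> list[float]:
    """Project a text query into the same space, so text can retrieve audio."""
    inputs = processor(text=query, return_tensors="pt")
    with torch.no_grad():
        vec = model.get_text_features(**inputs)
    return vec[0].tolist()

# e.g. index embed_audio(clip) for each file, then run an OpenSearch kNN
# query with embed_text("dog barking") to find matching audio.
```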

AJ Wallace
Location: Theater 5

Exploring dense vector search at scale

Vector search is no longer confined to theoretical applications or specialized use cases. Its arrival in the mainstream search world unlocks a range of new experiences. Finally, there is a way to truly search for meaning rather than words; it could turn traditional search engines into answer engines, opening up a plethora of new avenues to improve the overall user experience of how people look for meaningful information. Early prototypes of RAG systems demonstrate the potential vector search has to cause an information retrieval evolution. Vector search is the new shiny object that will be part of our future one way or another.

Tom Burgmans & Mohit Sidana
Location: Theater 7

4:00-5:15pm EDT Lightning Talks

Quick discussions about anything around search relevance! We'll collect talks during the day.


Location: Theater 5

5:30-8:00pm EDT Haystack Reception & Dinner (included with registration)

All attendees are welcome. The location is Kardinal Hall, about a 10-minute walk from the conference venue.

Location: Kardinal Hall

Day 2, Wednesday, April 24th, 2024

Two tracks run in parallel: Track 1 in Theater 5 and Track 2 in Theater 7. All times are EDT.
8:00-9:00am EDT Coffee

Location: Entrance of the Violet Crown

9:00-9:15am EDT Welcome Back

Location: Theater 5

9:15-10:00am EDT Day 2 Keynote - Evolution of Moody's Search: Embracing Cloud-Native Solutions to Enable AI-Powered Product

This presentation will delve into Moody's transformative journey in search architecture, showcasing a significant shift from traditional servers running legacy Solr software to a cloud-native AWS OpenSearch framework. The transition not only addresses the need for high availability and scalability but also marks a departure from the conventional monolithic relevance system, Lucidworks, towards a custom-designed, in-house AWS serverless relevance architecture. Collaborating closely with the OpenSource Connections (OSC) team, Moody's is progressing towards establishing search as a platform, aiming to seamlessly enhance cross-product usability, including for AI use cases.

Jeremy Hudson & Rene Kriegler
Location: Theater 5

10:15-11:00am EDT Search and Retrieval: AI's Most Successful Hack

Search and information retrieval systems have long embraced AI and Machine Learning to improve efficiency and relevance, but the converse hasn't been true until recently. And this is no surprise. Before we started using Generative AI models for everything, we were mostly building specialized models, trained on specific datasets, to solve specific problems. If we are to now use the same model for different tasks, we need to present the model with the most relevant data for the task at hand. Some might call this Retrieval Augmented Generation (RAG), but really it's a good old recommender system (RecSys) for large language models (LLMs) instead of humans. The core concepts involved in building recommender systems are the following:

- Retrieval: retrieving candidates from a catalog that are most relevant to the user query
- Filtering: filtering out irrelevant items
- Ranking: ranking retrieved candidates in order of relevance to the user query

In this talk, we will dive deep into Hybrid Search, a commonly used retrieval technique in RAG systems, and how combining it with metadata filtering and re-ranking algorithms results in a scalable recommender system for LLMs, a.k.a. RAG.
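
A toy end-to-end sketch of those three stages (the catalog and term-overlap scorer below are stand-ins for real hybrid retrieval and cross-encoder reranking):

```python
from dataclasses import dataclass

@dataclass
class Doc:
    id: str
    text: str
    source: str

CATALOG = [
    Doc("1", "reset your password from the account settings page", "help_center"),
    Doc("2", "quarterly revenue grew by twelve percent", "press"),
    Doc("3", "passwords must contain at least twelve characters", "help_center"),
]

def overlap(query: str, text: str) -> float:
    """Toy relevance score: fraction of query terms present in the text."""
    terms = set(query.lower().split())
    return len(terms & set(text.lower().split())) / len(terms)

def rag_candidates(query: str, allowed_sources: set[str], top_n: int = 2) -> list[Doc]:
    # 1. Retrieval: pull a broad candidate set (real systems: hybrid search).
    candidates = [d for d in CATALOG if overlap(query, d.text) > 0]
    # 2. Filtering: enforce metadata constraints up front.
    candidates = [d for d in candidates if d.source in allowed_sources]
    # 3. Ranking: order survivors, keep the few that fit the LLM's context.
    candidates.sort(key=lambda d: overlap(query, d.text), reverse=True)
    return candidates[:top_n]

print(rag_candidates("how do I reset my password", {"help_center"}))
```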

Apoorva Joshi
Location: Theater 5

Personalizing search using multimodal latent behavioral embeddings

Learning user context from behavioral signals is critical for optimizing search relevance, but most search engine and vector database implementations completely ignore personalization today, relying only on keywords and content embeddings. To fully understand user intent, however, your search engine needs to consider not just the content (text, images, etc.) and domain (entities, relationships, terminology), but also the user context (personal preferences and goals, popularity, and cohort affinities). While high quality embeddings from LLMs and multimodal foundation models have enabled innovative approaches to semantic search, content-based embeddings are usually deployed exclusively, since you can easily use an off-the-shelf model or fine-tune a model on your content using standard libraries. This enables a semantic interpretation of your documents, but it entirely ignores your valuable user interaction data (searches, clicks, and other signals).

In this talk, we'll focus on integrating user behavior into modern search retrieval pipelines for RAG and traditional end-user search. We'll cover training an embedding model using behavioral signals to discover latent features, adding user behavior as another modality in your multimodal search engine. We'll cover traditional signals-based models for AI-powered search (signals boosting, collaborative filtering, click models) and how these map into a multimodal embedding approach that combines the best of your content, domain, and user understanding into a holistic approach to modern search relevance.

We'll also cover general strategies for applying personalization to your search engine, ensuring appropriate contextual guardrails are in place so that the personalization is applied with a helpful, but light, touch. We'll walk through live, open source code examples showing how modern hybrid search approaches can learn these user and group affinities and implement personalized search experiences to delight your users.
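
As a toy illustration of the underlying idea (the talk's actual models are far richer, e.g. collaborative filtering or learned multimodal embeddings), even factorizing a user-item click matrix yields behavior-based latent embeddings in which co-clicked items land close together:

```python
import numpy as np

# Made-up click counts: rows are users, columns are items.
clicks = np.array([
    [3, 0, 1, 0],   # user 0 clicked item 0 three times, item 2 once
    [2, 0, 2, 0],
    [0, 4, 0, 1],
])
U, S, Vt = np.linalg.svd(clicks, full_matrices=False)
dims = 2
user_vecs = U[:, :dims] * S[:dims]   # latent user embeddings
item_vecs = Vt[:dims].T              # latent item embeddings

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Items clicked by the same users end up close in the latent space:
print(cos(item_vecs[0], item_vecs[2]))  # co-clicked pair: high similarity
print(cos(item_vecs[0], item_vecs[1]))  # disjoint audiences: low similarity
```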

Trey Grainger
Location: Theater 7

11:15-12:00pm EDT Search Query Understanding with LLMs: From Ideation to Production

Understanding user intent from search queries is a critical challenge for providing relevant search results. Yelp has leveraged Large Language Models (LLMs) to enhance user query understanding, marking a significant shift away from more traditional techniques. Throughout this talk, we will present our journey from initial ideation to the full-scale production deployment of LLMs for various query understanding tasks - spelling correction, segmentation, canonicalization, expansion, and highlighting. Key factors that make query understanding a potentially strong use case for LLMs include its query-focused nature and the low volume of text to process. The transition to LLMs from previously fragmented systems has added significant intelligence and greatly improved the user experience of our search functionality.
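
The abstract doesn't describe Yelp's prompts or models; purely as an illustration of the pattern, here is a minimal sketch that asks an LLM (model name assumed) to perform several of the listed tasks in a single JSON-mode call:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

PROMPT = """Analyze the search query below and return JSON with keys:
"corrected" (spelling-corrected query), "segments" (list of phrases),
"canonical" (normalized form), and "expansions" (up to 3 synonyms).

Query: {query}"""

def understand_query(query: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT.format(query=query)}],
    )
    return json.loads(response.choices[0].message.content)

print(understand_query("cheep sushi resturants open late"))
```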

Ali Rokni
Location: Theater 5

Zucchini or Cucumber? Benchmarking Embeddings for Similar Image Retrieval thanks to your weekly Grocery shopping

The natural world is harsh and full of surprises, such as edible-looking food that actually kills you. For thousands of years, humans had to face hard questions to survive, such as: 'is this mushroom a great source of nutrients or the last thing I will ever eat?' Modern-day foragers still face similar challenges when adding something to their cart: squinting to differentiate zucchinis from cucumbers, Green Apples from Green Limes, and other similar-looking food items. Can artificial intelligence help us solve this age-old problem? Image-based recommendations are a good technical solution for users facing too many food choices. But can they do better than us humans on such ambiguous items as zucchinis and cucumbers? In this talk, we present the process of building a benchmark, the results of our evaluation, and the lessons we learned to improve our Image Recommendation API. You'll learn what makes a dataset suitable for evaluating a problem, how to pick the right evaluation metrics, and other useful tips for evaluating your own recommender systems in specific domains using whatever data you have available!
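
One common way to score such a benchmark (the abstract doesn't name the speaker's metrics) is recall@k over embedding nearest neighbors; a minimal sketch:

```python
import numpy as np

def recall_at_k(query_vecs, index_vecs, ground_truth, k=5):
    """Fraction of queries whose true match appears among the top-k neighbors.

    query_vecs:   (n_queries, dim) embeddings of probe images
    index_vecs:   (n_items, dim) embeddings of catalog images
    ground_truth: ground_truth[i] is the catalog row matching query i
    """
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    x = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    sims = q @ x.T                        # cosine similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = [ground_truth[i] in topk[i] for i in range(len(q))]
    return float(np.mean(hits))

# Smoke test with random vectors; a real benchmark uses labeled pairs,
# e.g. zucchini photos that should retrieve other zucchini SKUs.
rng = np.random.default_rng(0)
print(recall_at_k(rng.normal(size=(4, 8)), rng.normal(size=(10, 8)),
                  ground_truth=[0, 1, 2, 3], k=5))
```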

Paul-Louis Nech
Location: Theater 7

12:00pm-1:30pm EDT Lunch

Find lunch at one of the many options available on Charlottesville’s Downtown Mall

Location: Your choice!

1:30-2:30pm EDT Women of Search present Comparing AI-Augmented Information Retrieval Strategies

This multi-speaker talk will trace the life of a query through various AI-powered information retrieval strategies that are gaining traction in the search field today. Technical experts from the Women of Search community will discuss AI-driven IR techniques such as fine-tuning ML models, constructing sophisticated RAG (Retrieval-Augmented Generation) pipelines, and reranking methods. They will provide an in-depth analysis of the advantages and limitations of each strategy, with a focus on identifying the optimal use cases for each. We invite you to join us for what promises to be an educational and inspiring presentation.

Audrey Lorberfeld
Location: Theater 5

2:45-3:30pm EDT Why RAG Projects Fail, and How to Make Yours Succeed

Chatbots and Question Answering systems built with RAG (Retrieval Augmented Generation) are the de facto standard for GenAI pilots across every industry. However, many of these projects are failing, due to causes related to both project organization and project execution. With the proper knowledge and vigilance, many of these causes can be avoided, or else recognized and mitigated. We will approach this problem from the perspective of a technology product or services provider, review case studies from real projects, and explore drivers of failure as well as keys to success. On the project organization side, we will explore how clients set projects up to fail through choice of low-value use cases, user incentive misalignment, poor solution framing and expectation setting, and inadequate data readiness. On the project execution side, we will explore how providers fail to meet expectations through non-prescriptive user experience design, failing to define system capabilities, allowing poor data to impact solution quality perception, and just plain bad technology decisions. Though many of these risks are endemic to providing RAG systems in today's technology climate, projects do overcome them and exceed client expectations! You will leave this session with the intuition to spot failure risks early, and proven strategies to manage them.

Colin Harman
Location: Theater 5

Your Search Engine Needs a Memory!

Every search professional needs data about users' behavior. Data is fundamental for analyzing user behavior and improving search relevance, both with manual tuning and with machine learning. Our User Behavior Insights (UBI) system provides a standard way to do just that. Business product managers, UX designers, and relevance engineers need data to understand their users: What do they search for? What do they click on? What do they act on or buy? How do they use facets and filters? How do they refine their queries within a session? Engineers need data to improve search relevance and effectiveness, both manually and using AI/ML. Our open-source UBI system provides a client-side library for instrumenting web pages, a server-side library for collecting data, and analytical tools for understanding it. Critically, it defines a standard schema for behavior data so that the community can contribute additional analytical tools. We have also demonstrated its integration with personalization software.
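
For a flavor of the kind of event such a schema standardizes (the field names below are illustrative, not the actual UBI schema; consult the UBI specification for that), consider a click event tied back to the query that produced it:

```python
import json
import time
import uuid

# Hypothetical behavior event: a user clicked result 3 for an earlier query.
click_event = {
    "action_name": "item_click",
    "query_id": str(uuid.uuid4()),   # links the click to the originating search
    "client_id": "session-1234",
    "timestamp": int(time.time() * 1000),
    "event_attributes": {
        "object_id": "sku-98765",
        "position": 3,               # rank of the clicked result on the page
    },
}
print(json.dumps(click_event, indent=2))
```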

Eric Pugh & Stavros Macrakis
Location: Theater 7

3:45-4:30pm EDT Expanding RAG to incorporate multimodal capabilities

RAG workflows predominantly rely on retrieved text sources, as most Large Language Models (LLMs) are proficient in understanding only language. However, a substantial portion of unstructured data contains multimodal elements such as text, tables, and images. Focusing solely on text compromises the retrieval process in RAG. In this session, we'll explore the use of LLMs and multimodal embeddings to expand RAG for multimodal retrieval and generation. A live demonstration will illustrate how a PDF document is processed into a vector database, extracting content from images, tables, and text; the retriever employs multimodal search, and the response is enriched using an LLM. This approach ensures the inclusion of multimodal content in both the ingestion and final response generation phases.

Praveen Mohan Prasad & Hajer Bouafif
Location: Theater 5

Measuring and Improving the R in RAG

The quality of text generated by a RAG system is limited by the quality of the text chunks it uses for context. The textbook way to measure the quality of these chunks is for a human expert to give each query-chunk pair a relevance judgement. Of course, this is expensive, time-consuming, and impossible in practice for any sizable corpus. An attractive alternative is to use an LLM to learn from the experts and generate these judgements as needed. From there we can tune our retrieval and ultimately improve the quality of our generated responses.
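
As a rough illustration of the approach (the rubric, grading scale, and model below are assumptions, not the speaker's setup), an LLM can be prompted to grade each query-chunk pair:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

RUBRIC = """You are grading retrieval for a question-answering system.
Given a question and a text chunk, reply with a single digit:
2 = chunk fully answers the question, 1 = partially relevant, 0 = irrelevant.

Question: {question}
Chunk: {chunk}
Grade:"""

def judge(question: str, chunk: str) -> int:
    """Generate a graded relevance judgement for one query-chunk pair."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user",
                   "content": RUBRIC.format(question=question, chunk=chunk)}],
    )
    return int(response.choices[0].message.content.strip()[0])

# Judgements like these can feed nDCG-style retrieval metrics, after
# spot-checking a sample against human expert labels.
```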

Scott Stults
Location: Theater 7

4:30-4:45pm EDT Closing

Location: Theater 5