In this article, I will demonstrate how to implement hybrid search within a RAG (Retrieval-Augmented Generation) pipeline to enhance the effectiveness of the retrieval phase.
I will be using Milvus as a vector database and the M3-Embedding model to generate dense and sparse vector embeddings.
Let’s start with a short description of RAG.
Short description of Retrieval-Augmented Generation (RAG)
RAG is a technique that emerged alongside the rapid progress of Natural Language Generation (NLG) models in recent years. It integrates retrieval mechanisms with generative language models to improve output accuracy, addressing some of the main limitations of Large Language Models (LLMs).
Traditional NLG models, particularly sequence-to-sequence architectures, generate fluent and coherent text. However, they depend heavily on their training data and often struggle to produce factually accurate or contextually relevant content for queries that require knowledge beyond that data, such as private data.
RAG is a hybrid technique that addresses these limitations of purely generative models by integrating two key components:
A retrieval mechanism, which retrieves relevant documents or information from an external knowledge source.
A generative module, which processes this information to generate human-like text.
This combination enables RAG architectures to generate fluent text while grounding their outputs in real-world, up-to-date data.
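To make the two components concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop. The word-overlap retriever is a toy stand-in for a real vector search, and generate is a hypothetical placeholder for an LLM call; none of these function names come from the notebook.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Retrieval component (toy): rank passages by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the generator by placing the retrieved passages in its prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt: str) -> str:
    """Hypothetical placeholder for the generative module (an LLM call)."""
    return f"[LLM answer based on a {len(prompt)}-character prompt]"


corpus = [
    "Adult male elephants live mostly solitary lives.",
    "Milvus is an open-source vector database.",
    "Female elephants live in tightly knit family groups.",
]
query = "How do elephants live?"
passages = retrieve(query, corpus)
prompt = build_prompt(query, passages)
print(prompt)
print(generate(prompt))
```

The unrelated Milvus passage never reaches the prompt, so the generator only sees context that shares terms with the question.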
Before showing the implementation of hybrid search, let’s define its main components.
Sparse Vectors
Sparse vectors are a common data representation in Natural Language Processing (NLP). Their main characteristic is that most of their elements are zero: only the dimensions relevant to the represented item (typically the terms that occur in a text) hold non-zero values. This makes sparse vectors well suited to applications that require precise matching of keywords or phrases.
Common applications:
Text analysis
Recommendation systems
Image processing
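A sparse vector is usually stored as a mapping from the indices of its non-zero dimensions to their values, so a similarity computation only needs to touch the dimensions two vectors share. The sketch below uses a hypothetical 10-term vocabulary; the {index: value} dictionary form is similar to what pymilvus accepts for sparse vector fields, but the helper functions are illustrative, not part of any library.

```python
def to_sparse(dense_values: list[float]) -> dict[int, float]:
    """Keep only the non-zero entries, as {dimension index: value}."""
    return {i: v for i, v in enumerate(dense_values) if v != 0}


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Inner product of two sparse vectors: only shared dimensions contribute."""
    if len(a) > len(b):
        a, b = b, a  # loop over the smaller vector
    return sum(v * b[i] for i, v in a.items() if i in b)


# Hypothetical 10-term vocabulary; each text uses only a few of the terms.
vocab = ["elephant", "live", "africa", "asia", "zoo",
         "king", "queen", "apple", "milvus", "vector"]
doc = to_sparse([1.0, 1.0, 0.5, 0, 0, 0, 0, 0, 0, 0])
query = to_sparse([1.0, 1.0, 0, 0, 0, 0, 0, 0, 0, 0])

print(doc)                                    # {0: 1.0, 1: 1.0, 2: 0.5}
print({vocab[i]: v for i, v in doc.items()})  # {'elephant': 1.0, 'live': 1.0, 'africa': 0.5}
print(sparse_dot(query, doc))                 # 2.0
```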
Sparse vectors can be generated, for instance, with the BM25 method. BM25 relies on the frequency of words in a document and does not attempt to understand the meaning or context of the words. It also requires statistics computed over the entire corpus in advance, such as how many documents contain each term and the average document length.
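As a rough illustration of how BM25 turns term statistics into a score, here is a minimal Okapi BM25 implementation over a toy corpus. It assumes whitespace tokenization, the commonly used defaults k1 = 1.5 and b = 0.75, and the IDF variant with "+1" inside the logarithm (as used in Lucene). In this article’s pipeline the sparse vectors come from M3-Embedding, which learns per-token weights instead, so this is only a reference point for the BM25 approach.

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in `corpus` against `query` with Okapi BM25."""
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs            # corpus-level statistic
    df = Counter(term for d in docs for term in set(d))    # documents containing each term

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # BM25 only rewards exact term matches
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "elephants live in family groups",
    "the king and the queen",
    "an apple a day",
]
print(bm25_scores("where elephants live", corpus))
print(bm25_scores("elephant", corpus))
```

The second query scores zero for every document: "elephant" and "elephants" are different terms to BM25, which is exactly the lack of semantic understanding described above.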
Dense Vectors
Dense vectors are numerical representations of semantic meaning, which makes them well suited to capturing deep semantic relationships. Compared to sparse vectors, they encode more information per dimension, capturing complex patterns and relationships in a high-dimensional space. For example, in a sparse one-hot representation, the vectors for "king" and "queen" are just as dissimilar as the vectors for "king" and "apple", even though king and queen have related meanings; in a dense embedding space, king and queen end up close to each other.
Common applications:
Sentiment analysis
Information retrieval with semantic meaning
Machine translation
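The king/queen/apple example can be checked numerically with cosine similarity. One-hot sparse vectors are mutually orthogonal, so every pair scores 0. The dense vectors below are hypothetical, hand-picked 3-dimensional values rather than outputs of a real model, but they show how related words can end up close together.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Sparse one-hot vectors over the vocabulary ["king", "queen", "apple"].
one_hot = {"king": [1, 0, 0], "queen": [0, 1, 0], "apple": [0, 0, 1]}

# Hypothetical dense embeddings, hand-picked for illustration only.
dense = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.05],
    "apple": [0.05, 0.10, 0.95],
}

for name, vecs in (("one-hot", one_hot), ("dense", dense)):
    print(name,
          "king~queen:", round(cosine(vecs["king"], vecs["queen"]), 3),
          "king~apple:", round(cosine(vecs["king"], vecs["apple"]), 3))
```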
Hybrid Search
Hybrid search leverages the strengths of both approaches, combining the precision of sparse vectors with the deep contextual understanding of dense vectors. This combination makes it less likely that important documents are overlooked, whether they match the query’s exact terms or capture its broader intent.
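Hybrid search needs a rule for merging the dense and sparse result lists. One common choice is Reciprocal Rank Fusion (RRF), sketched below in plain Python. pymilvus ships rankers for this step (such as RRFRanker and WeightedRanker), but this standalone function is only an illustration and not necessarily how the notebook fused its results. The document labels are shorthand for the passages reported in the results section, and k = 60 is the constant commonly used with RRF.

```python
def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_ranking = ["taxonomy", "zoos", "social_order"]
sparse_ranking = ["lifespan", "taxonomy", "social_order"]
for doc_id, score in rrf_fuse([dense_ranking, sparse_ranking]):
    print(f"{doc_id}: {score:.5f}")
```

Because RRF only looks at ranks, documents found by both retrievers (here "taxonomy" and "social_order") rise to the top without any need to make dense and sparse scores comparable.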
In the following section, I demonstrate how hybrid search in a RAG pipeline can improve the results of the retrieval phase.
Practical Implementation
To use hybrid search, we need an embedding model that can generate both dense and sparse vectors. M3-Embedding is a versatile model designed for Multi-Linguality, Multi-Functionality, and Multi-Granularity. It provides uniform support for semantic retrieval in more than 100 working languages and can simultaneously perform the three common retrieval functionalities: dense retrieval, multi-vector retrieval, and sparse retrieval. It can also process inputs of different granularities, from short sentences to long documents of up to 8,192 tokens [4].
For the demonstration, I will use the rag-mini-wikipedia dataset [3], whose text corpus contains 3,200 passages on different topics.
I will also use Milvus, an open-source vector database that supports both sparse and dense vectors in a single collection, enabling hybrid search to improve result relevance.
The Jupyter notebook with the implementation and results can be found here [5]. In this article, I highlight the steps performed in the notebook and analyze the results. For the full implementation, refer to the notebook and the references at the end of the article.
Analysis of the dataset
After downloading the dataset, we can explore its content. It is a Dataset object with two features, passage and id, and it contains 3,200 rows.
The query I will use in the demonstration is: "Where Elephants live?"
Results
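After creating dense and sparse embeddings of the corpus and inserting them into the Milvus collection, I performed three types of search (dense, sparse, and hybrid), limiting the results to the top 3. In the output below, the "distance" values are similarity scores, so higher means a closer match; the hybrid scores combine both vector types and are not on the same scale as the dense or sparse scores. Let’s look at the results of each search.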
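The three search modes described above can be sketched as a brute-force, in-memory search over a handful of documents, each carrying a dense and a sparse vector. Everything here is hypothetical: the vectors are hand-picked, dense similarity is cosine, sparse similarity is an inner product, and the hybrid score is a plain weighted sum of the two. Milvus instead searches indexed collections, and its rankers may normalize scores before combining them, so treat this as a conceptual model of the top-3 searches rather than the notebook’s code.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Dense similarity: cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def sparse_ip(a: dict[str, float], b: dict[str, float]) -> float:
    """Sparse similarity: inner product over the terms both vectors contain."""
    return sum(w * b[t] for t, w in a.items() if t in b)


def search(docs, q_dense, q_sparse, mode="hybrid", limit=3,
           dense_weight=1.0, sparse_weight=1.0):
    """Brute-force search returning the top `limit` (doc_id, score) pairs."""
    if mode not in ("dense", "sparse", "hybrid"):
        raise ValueError(f"unknown mode: {mode}")
    scores = {}
    for doc_id, (d_dense, d_sparse) in docs.items():
        dense_score = cosine(q_dense, d_dense)
        sparse_score = sparse_ip(q_sparse, d_sparse)
        if mode == "dense":
            scores[doc_id] = dense_score
        elif mode == "sparse":
            scores[doc_id] = sparse_score
        else:  # hybrid: weighted sum of the two similarity scores
            scores[doc_id] = dense_weight * dense_score + sparse_weight * sparse_score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Hypothetical documents: (dense vector, sparse {term: weight} vector).
docs = {
    "habitat": ([0.9, 0.3, 0.1], {"elephants": 0.3, "savanna": 0.4}),
    "lifespan": ([0.4, 0.8, 0.2], {"elephant": 0.3, "live": 0.5}),
    "zoo": ([0.7, 0.5, 0.1], {"zoos": 0.6}),
    "milvus": ([0.0, 0.1, 0.9], {"vector": 0.7}),
}
q_dense, q_sparse = [1.0, 0.4, 0.0], {"elephants": 0.4, "live": 0.5}

for mode in ("dense", "sparse", "hybrid"):
    hits = search(docs, q_dense, q_sparse, mode=mode)
    print(mode, [(doc_id, round(score, 3)) for doc_id, score in hits])
```

With these toy vectors, the "lifespan" document, which matches the query term "live", moves from third place in the dense results to second place in the hybrid results.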
Dense Search Results
distance: 0.6662917137145996
text: Elephants (Elephantidae) are a family in the order Proboscidea in the class Mammalia. They were once classified along with other thick skinned animals in a now invalid order, Pachydermata. There are three living species: the African Bush Elephant, the African Forest Elephant (until recently known collectively as the African Elephant), and the Asian Elephant (also known as the Indian Elephant). Other species have become extinct since the last ice age, which ended about 10,000 years ago, the Mammoth being the most well-known of these.
distance: 0.6637986898422241
text: Elephants are also commonly exhibited in zoos and wild animal parks.
distance: 0.6518104076385498
text: Elephant footprints (tire tracks for scale)Elephants live in a structured social order. The social lives of male and female elephants are very different. The females spend their entire lives in tightly knit family groups made up of mothers, daughters, sisters, and aunts. These groups are led by the eldest female, or matriarch. Adult males, on the other hand, live mostly solitary lives.
These are good results. The passages retrieved with dense search capture the context and semantics of the question: they cover the elephant species, where elephants can be seen (zoos and wild animal parks), and how they live in social groups.
Sparse Search Results
distance: 0.22559085488319397
text: Elephants are mammals, and the largest land animals alive today. The elephant's gestation period is 22 months, the longest of any land animal. At birth it is common for an elephant calf to weigh 120 kilograms (265 lb). An elephant may live as long as 70 years, sometimes longer. The largest elephant ever recorded was shot in Angola in 1956. This male weighed about 12,000 kg (26,400 lb), with a shoulder height of 4.2 m (13.8 ft), a metre (3 ft 4 in) taller than the average male African elephant. The smallest elephants, about the size of a calf or a large pig, were a prehistoric species that lived on the island of Crete during the Pleistocene epoch. Bate, D.M.A. 1907. On Elephant Remains from Crete, with Description of Elephas creticus sp.n. Proc. zool. Soc. London: 238-250.
distance: 0.20308682322502136
text: Elephants (Elephantidae) are a family in the order Proboscidea in the class Mammalia. They were once classified along with other thick skinned animals in a now invalid order, Pachydermata. There are three living species: the African Bush Elephant, the African Forest Elephant (until recently known collectively as the African Elephant), and the Asian Elephant (also known as the Indian Elephant). Other species have become extinct since the last ice age, which ended about 10,000 years ago, the Mammoth being the most well-known of these.
distance: 0.18690812587738037
text: Elephant footprints (tire tracks for scale)Elephants live in a structured social order. The social lives of male and female elephants are very different. The females spend their entire lives in tightly knit family groups made up of mothers, daughters, sisters, and aunts. These groups are led by the eldest female, or matriarch. Adult males, on the other hand, live mostly solitary lives.
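The sparse search results are noticeably weaker, which is expected. The top-ranked passage shares keywords such as "elephant" and "live" with the query ("An elephant may live as long as 70 years"), but it is mostly about the size and lifespan of elephants. If we want to capture the semantics of the query and the documents, we cannot rely on sparse vectors alone.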
Hybrid Search Results
distance: 1.2508615255355835
text: Elephants (Elephantidae) are a family in the order Proboscidea in the class Mammalia. They were once classified along with other thick skinned animals in a now invalid order, Pachydermata. There are three living species: the African Bush Elephant, the African Forest Elephant (until recently known collectively as the African Elephant), and the Asian Elephant (also known as the Indian Elephant). Other species have become extinct since the last ice age, which ended about 10,000 years ago, the Mammoth being the most well-known of these.
distance: 1.2426867485046387
text: Elephant footprints (tire tracks for scale)Elephants live in a structured social order. The social lives of male and female elephants are very different. The females spend their entire lives in tightly knit family groups made up of mothers, daughters, sisters, and aunts. These groups are led by the eldest female, or matriarch. Adult males, on the other hand, live mostly solitary lives.
distance: 0.6865341663360596
text: Elephants are also commonly exhibited in zoos and wild animal parks.
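The results are very similar to those of the dense search, but the ranking is refined: the passage about elephants’ social order, which both the dense and sparse searches retrieved, moves up to second place, while the passage about zoos, which only the dense search returned in its top 3, drops to third with a much lower score. This query illustrates how hybrid search can improve result quality during the retrieval phase of a RAG pipeline, although the size of the improvement will vary from query to query.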
Conclusions
We can see that dense and hybrid search gave the best results. Results will vary from query to query, but hybrid search can improve the results of the retrieval phase by combining the strengths of both vector types.
There are several parameters and mechanisms we could tune in this pipeline, such as the reranking weights assigned to each vector type, depending on our needs.
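One way to choose those weights is to evaluate a few candidate values on a small set of queries with known relevant passages and keep the best one. The sketch below runs a grid search over a hypothetical dense weight w (with sparse weight 1 - w), scoring each value by top-1 hit rate. The scores and labels are made up for illustration, and this is not a procedure taken from the notebook.

```python
# Each entry: (dense scores, sparse scores, id of the relevant passage).
labeled = [
    ({"a": 0.9, "b": 0.5}, {"a": 0.0, "b": 0.6}, "a"),  # semantic query: dense is right
    ({"c": 0.6, "d": 0.7}, {"c": 0.9, "d": 0.1}, "c"),  # keyword query: sparse is right
]


def hit_rate(dense_weight: float, examples, k: int = 1) -> float:
    """Fraction of queries whose relevant passage is in the top-k fused results."""
    hits = 0
    for dense_scores, sparse_scores, relevant in examples:
        fused = {doc: dense_weight * dense_scores[doc]
                      + (1 - dense_weight) * sparse_scores.get(doc, 0.0)
                 for doc in dense_scores}
        top_k = sorted(fused, key=fused.get, reverse=True)[:k]
        hits += relevant in top_k
    return hits / len(examples)


def tune_dense_weight(examples, grid=(0.0, 0.25, 0.5, 0.75, 1.0), k: int = 1) -> float:
    """Return the grid value with the best hit rate (the first one wins ties)."""
    return max(grid, key=lambda w: hit_rate(w, examples, k))


for w in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"dense_weight={w}: hit rate {hit_rate(w, labeled):.2f}")
print("best:", tune_dense_weight(labeled))
```

On this toy data, neither pure dense (w = 1.0) nor pure sparse (w = 0.0) retrieval gets both queries right, while w = 0.75 does. With real data, the evaluation set should be large enough that the chosen weight is not just fitted to a few queries.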
Other advanced techniques for improving the performance of a RAG pipeline, which I do not explore in this article, include:
Creating sub-queries: When a user query is too complicated, we can use an LLM to break it down into simpler sub-queries before passing them on to the vector database and the LLM.
Filtered search: An approximate nearest neighbor (ANN) search identifies vector embeddings similar to a given query, but the most similar vectors are not always the most relevant results. Adding filtering conditions narrows the search scope to entities that match specific criteria, improving precision.
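Both techniques can be sketched together in plain Python. decompose is a hypothetical stand-in for the LLM call (it simply splits the question on "and"), and filtered_search restricts candidates by a metadata field before ranking them by keyword overlap. In Milvus, the equivalent of the filter is a boolean expression on scalar fields passed along with the vector search. The passages and the topic field are invented for this example.

```python
import re
from typing import Optional

PASSAGES = [
    {"id": 1, "topic": "animals", "text": "African elephants live in savannas and forests."},
    {"id": 2, "topic": "animals", "text": "Asian elephants live in forests of South and Southeast Asia."},
    {"id": 3, "topic": "databases", "text": "Milvus stores dense and sparse vectors."},
]


def words(text: str) -> set[str]:
    """Lowercase word tokens of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def decompose(query: str) -> list[str]:
    """Hypothetical stand-in for an LLM that splits a compound question into sub-queries."""
    parts = [p.strip(" ?") for p in re.split(r"\band\b", query)]
    return [p for p in parts if p]


def filtered_search(query: str, topic: Optional[str] = None, limit: int = 1) -> list[int]:
    """Toy keyword search; the optional metadata filter is applied before ranking."""
    candidates = [p for p in PASSAGES if topic is None or p["topic"] == topic]
    q = words(query)
    ranked = sorted(candidates, key=lambda p: len(q & words(p["text"])), reverse=True)
    return [p["id"] for p in ranked[:limit]]


question = "Where do African elephants live and where do Asian elephants live?"
sub_queries = decompose(question)
results = {sq: filtered_search(sq, topic="animals") for sq in sub_queries}
for sq, ids in results.items():
    print(f"{sq!r} -> passages {ids}")
```

Each sub-query retrieves its own best passage, which a single combined query might miss, and the topic filter guarantees that off-topic passages can never be returned.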
If you want to see how I use filtered search to build an AI tool that helps SREs and DevOps teams gain visibility into their systems and applications, check here [6].
In future posts, I would like to explore more advanced RAG techniques, as well as observability for RAG pipelines using tools such as LangSmith.
References
[3] Dataset: rag-mini-wikipedia
[5] Jupyter Notebook: Hybrid Search RAG with Milvus
[6] AI tool that helps SREs and DevOps teams to gain visibility into their systems and applications