Why It Might Be Time to Rethink Using Cosine Similarity
Last year, researchers from Netflix released a study on the nuances of using cosine similarity in recommendation systems [1]. Their experiments revealed underlying issues with cosine similarity as a metric that can lead to arbitrary and meaningless results.
Cosine similarity measures the cosine of the angle between two vectors or, equivalently, the dot product of their normalized forms. It has proven useful in various applications, such as recommender systems and NLP tasks. However, it can produce poor results in certain scenarios.
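To make the definition concrete, here is a minimal NumPy sketch (the function name is my own) computing cosine similarity exactly as described: normalize both vectors, then take their dot product.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: the dot product of their normalized forms."""
    u_norm = u / np.linalg.norm(u)
    v_norm = v / np.linalg.norm(v)
    return float(np.dot(u_norm, v_norm))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])    # same direction as a, different magnitude
c = np.array([-1.0, -2.0, -3.0]) # opposite direction

print(cosine_similarity(a, b))  # 1.0: parallel vectors, magnitude is ignored
print(cosine_similarity(a, c))  # -1.0: opposite vectors
```

Note that `b` is twice `a` yet scores a perfect 1.0: cosine similarity only sees direction, never magnitude, which is both its appeal and the root of the problems discussed below.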
Using cosine similarity as a training objective for machine learning models is valid and mathematically sound. It combines two fundamental operations in deep learning: the dot product and normalization. The trouble arises when we push it beyond its intended scope, particularly when the objective function used in model training isn't cosine similarity, which is usually the case.
It is easy to demonstrate how semantically similar sentences can score poorly on cosine similarity. For instance, take the following three sentences:
1. “AI can make you rich.”
2. “AI can make you itch.”
3. “Mastering AI can fill your pockets.”
Let’s calculate the cosine similarity between sentences 1 and 2, and then between 1 and 3. What do you expect the result to be? Semantically, 1 and 3 are closer, so let’s see.
Ah! The vector resulting from sentence (1) is closer to sentence (2) than to sentence (3). That’s likely not the desired outcome in an NLP pipeline or a recommendation system. This simple example demonstrates how cosine similarity can lead to unexpected results.
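A real pipeline would use learned sentence embeddings; as a deliberately crude stand-in, a bag-of-words count vector (my own simplification, not what an embedding model does) already reproduces the trap, because sentences 1 and 2 share four of five words while 1 and 3 share only “AI” and “can”:

```python
import numpy as np
from collections import Counter

def bow_vector(sentence, vocab):
    """Toy bag-of-words embedding: word counts over a shared vocabulary."""
    counts = Counter(sentence.lower().replace(".", "").split())
    return np.array([counts[w] for w in vocab], dtype=float)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

s1 = "AI can make you rich."
s2 = "AI can make you itch."
s3 = "Mastering AI can fill your pockets."

vocab = sorted({w for s in (s1, s2, s3) for w in s.lower().replace(".", "").split()})
v1, v2, v3 = (bow_vector(s, vocab) for s in (s1, s2, s3))

print(cosine(v1, v2))  # 0.8: four of five words overlap
print(cosine(v1, v3))  # ~0.37: only "ai" and "can" overlap
```

Learned embeddings are far better than word counts, but they can still encode surface features (phrasing, length, frequent tokens) that pull cosine similarity away from semantic similarity in exactly this way.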
Cosine similarity is a quick fix for vector comparisons, as shown in the previous example. While it works and can be useful, in some cases—and I would argue in most cases—it masks deeper underlying issues.
So, what can we do about it? There are several alternatives, and it all comes down to one thing: testing, researching, and more testing! Below, I will outline possible solutions, but as you might already know, there’s no silver bullet. It’s all about experimenting and testing with your deep learning model and data.
Train models directly with cosine similarity as the objective.
Use LLMs to guide the search, perhaps incorporating an agentic approach.
Use a different comparison metric, such as Euclidean distance or soft cosine similarity.
Clean and standardize text before embedding; this can help mitigate some of these issues.
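To see how the choice of metric changes a ranking, here is a small sketch (the vectors are made up for illustration) contrasting cosine similarity with Euclidean distance: cosine ranks `a` first because it ignores magnitude, while Euclidean distance ranks `b` first because it doesn't.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

query = np.array([1.0, 1.0])
a = np.array([10.0, 10.0])  # exactly the query's direction, but far away
b = np.array([1.2, 0.8])    # slightly off-direction, but nearby

print(cosine(query, a), np.linalg.norm(query - a))  # cos = 1.0,  distance ~12.73
print(cosine(query, b), np.linalg.norm(query - b))  # cos ~0.98, distance ~0.28
```

Neither ranking is "correct" in the abstract; which one you want depends on whether vector magnitude carries meaning in your embedding space, which again comes down to testing with your own model and data.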
Resources
[1] Steck et al., Is Cosine-Similarity of Embeddings Really About Similarity? (2024)