retrieve_similar_vectors retrieves the content in a given context that is most similar to a prompt, as measured by cosine similarity.

Usage

retrieve_similar_vectors(
  context_df,
  prompt_vector,
  max_results = 10,
  similarity_threshold = 0.5
)

Arguments

context_df

a context dataframe that contains text embeddings and text information for retrieval

prompt_vector

the prompt, converted into an embedding vector, that serves as the query for the search

max_results

the maximum number of results to return (defaults to 10)

similarity_threshold

a threshold between 0 and 1 (defaults to 0.5) used to filter out the least relevant results. 1 means perfect similarity, 0 means no similarity at all.
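To make the similarity scale concrete, here is a minimal base R sketch of cosine similarity. The helper cosine_similarity is hypothetical and written only for illustration; it is not claimed to be the function the package uses internally.

```r
# Hypothetical helper (not part of the package): cosine similarity of two
# numeric vectors, i.e. their dot product divided by the product of their norms.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Vectors pointing in the same direction score (approximately) 1.
same_direction <- cosine_similarity(c(1, 2, 3), c(2, 4, 6))

# Orthogonal vectors score 0.
orthogonal <- cosine_similarity(c(1, 0), c(0, 1))
```

With the default similarity_threshold of 0.5, a pair like the first would be kept and a pair like the second would be dropped.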

Value

the text vectors from context_df that are similar enough to the given prompt, that is, those that pass similarity_threshold, up to max_results of them.

Details

This function covers the retrieval step of a simple retrieval-augmented generation (RAG) process: given a context dataframe and a prompt vector, it returns the content most relevant to the prompt.
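The retrieval step described here can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the function sketch_retrieve, the column names text and embedding, the added similarity column, and the choice of ">=" for the threshold comparison are all hypothetical.

```r
# Illustrative sketch only. The column names `text` and `embedding` are
# assumptions, not the actual layout of the package's context dataframe.
sketch_retrieve <- function(context_df, prompt_vector,
                            max_results = 10, similarity_threshold = 0.5) {
  cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

  # Score every stored embedding against the prompt vector.
  scores <- vapply(context_df$embedding, cosine, numeric(1), b = prompt_vector)
  context_df$similarity <- scores

  # Drop rows below the threshold, sort by similarity, keep the top results.
  kept <- context_df[scores >= similarity_threshold, , drop = FALSE]
  kept <- kept[order(kept$similarity, decreasing = TRUE), , drop = FALSE]
  head(kept, max_results)
}

# Toy context with hand-made 3-dimensional "embeddings" in a list-column.
toy_context <- data.frame(
  text = c("about cathedrals", "about cooking", "about Paris"),
  stringsAsFactors = FALSE
)
toy_context$embedding <- list(c(1, 0, 0), c(0, 1, 0), c(0.8, 0.6, 0))

toy_result <- sketch_retrieve(toy_context, prompt_vector = c(1, 0, 0))
```

In this toy data the "about cooking" row is orthogonal to the prompt vector and is filtered out, while the other two rows are returned in decreasing order of similarity.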

Examples

conn <- get_ollama_connection()

document <- "Standing proudly on the Île de la Cité in the heart of Paris, 
France's capital city, lies one of the world's most beloved and historic 
landmarks: the magnificent Notre Dame Cathedral. This Gothic masterpiece 
has been welcoming pilgrims and tourists alike for over 850 years, since its 
construction began in 1163 under King Louis VII. With its towering spires, 
stunning stained glass windows, and intricate stone carvings, this beautiful 
church is a testament to medieval architecture and engineering skill. 
Unfortunately, a devastating fire ravaged the cathedral on April 15, 2019, 
but thanks to swift action from firefighters and restoration efforts 
underway, Notre Dame continues to inspire awe in those who visit her."

writeLines(document, con = "doc1.txt")

# Embed the document to build the context dataframe
context_df <- convert_batch_documents_to_embeddings(
  ollama_connection = conn,
  document_path_list = list("doc1.txt")
)

# Embed the prompt, then retrieve the context entries most similar to it
prompt <- "When was Notre Dame in Paris built?"
prompt_vector <- get_ollama_embeddings(ollama_connection = conn, input = prompt)
similar_text <- retrieve_similar_vectors(context_df, prompt_vector)