
Computes pairwise cosine similarity between all embeddings in a tibble, which is useful for finding semantically similar texts.
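For reference, the cosine similarity of two embedding vectors a and b is the standard definition below. It is the dot product of the vectors divided by the product of their lengths, so it depends only on their direction and always lies between -1 and 1.

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^{2}} \, \sqrt{\sum_{i} b_i^{2}}}
```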

Usage

foundry_similarity(data, text_col = "text", top_k = NULL, as_matrix = FALSE)

Arguments

data

A tibble from foundry_embed() containing an embedding list-column.

text_col

Character. Name of the column containing text labels. Default: "text".

top_k

Integer. Optional maximum number of most-similar pairs to return. Default: NULL.

as_matrix

Logical. If TRUE, return the full cosine-similarity matrix instead of a long pairwise tibble. Default: FALSE.
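To illustrate what a full cosine-similarity matrix (the as_matrix = TRUE form) contains, here is a minimal base-R sketch. It assumes each embedding is a plain numeric vector of the same length, as in a list-column. The helper `cosine_matrix_sketch()` is hypothetical and is not the package's implementation: it scales each row to unit length and takes all pairwise dot products.

```r
# Hypothetical helper (not part of the package): build a cosine-similarity
# matrix from a list of equal-length numeric embedding vectors.
cosine_matrix_sketch <- function(embeddings, labels) {
  # Stack the list-column into a numeric matrix, one row per text
  m <- do.call(rbind, embeddings)
  # Scale each row to unit length so dot products become cosines
  m <- m / sqrt(rowSums(m^2))
  # Pairwise dot products of the unit rows: sim[i, j] = cos(row i, row j)
  sim <- tcrossprod(m)
  dimnames(sim) <- list(labels, labels)
  sim
}

# Toy 2-dimensional "embeddings" standing in for real model output
toy <- list(c(1, 0), c(1, 1), c(0, 1))
sim <- cosine_matrix_sketch(toy, c("a", "b", "c"))
round(sim, 3)
```

The result is symmetric with ones on the diagonal, since every vector has cosine similarity 1 with itself.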

Value

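If as_matrix = FALSE (the default), a tibble with the columns below; if as_matrix = TRUE, the full cosine-similarity matrix instead.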

text_1

Character. First text.

text_2

Character. Second text.

similarity

Numeric. Cosine similarity between -1 and 1.
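The long pairwise layout can be pictured as a reshaped similarity matrix. The base-R sketch below assumes that each unordered pair appears once and that self-pairs are dropped; the package documentation does not confirm either point. The helper `pairs_sketch()` is hypothetical, returns a plain data.frame rather than a tibble, and applies `top_k` by keeping the highest-similarity rows.

```r
# Hypothetical helper (not part of the package): reshape a labelled
# similarity matrix into one row per unordered pair, most similar first.
pairs_sketch <- function(sim, top_k = NULL) {
  # Upper triangle only: each unordered pair once, no self-pairs
  idx <- which(upper.tri(sim), arr.ind = TRUE)
  out <- data.frame(
    text_1 = rownames(sim)[idx[, "row"]],
    text_2 = colnames(sim)[idx[, "col"]],
    similarity = sim[idx],
    stringsAsFactors = FALSE
  )
  # Sort by similarity, highest first
  out <- out[order(out$similarity, decreasing = TRUE), ]
  # Optionally keep only the top_k most similar pairs
  if (!is.null(top_k)) out <- head(out, top_k)
  rownames(out) <- NULL
  out
}

# Made-up symmetric similarity matrix for three labelled texts
labels <- c("a", "b", "c")
sim <- matrix(c(1.0, 0.9, 0.2,
                0.9, 1.0, 0.5,
                0.2, 0.5, 1.0),
              nrow = 3, dimnames = list(labels, labels))
pairs_sketch(sim, top_k = 2)
```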

Examples

if (FALSE) { # \dontrun{
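texts <- c("I love R", "R is my favorite language", "Python is also good")
embeddings <- foundry_embed(texts, model = "text-embedding-ada-002")
foundry_similarity(embeddings)                    # long pairwise tibble
foundry_similarity(embeddings, top_k = 2)         # 2 most similar pairs
foundry_similarity(embeddings, as_matrix = TRUE)  # full similarity matrix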
} # }