Find Nearest Neighbors by Semantic Similarity (hf_nearest_neighbors, huggingfaceR)

Embed a query text with the given model and return the k rows of the data frame whose stored embeddings are most similar to it.
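This page does not say which similarity measure is used. As an illustration only, the sketch below assumes cosine similarity and an embedding list-column holding one numeric vector per row. The helpers cosine_similarity() and score_rows() are hypothetical and are not part of huggingfaceR.

```r
# Hypothetical helper: cosine similarity between two numeric vectors.
# ASSUMPTION: cosine similarity; the metric used by hf_nearest_neighbors
# is not documented on this page.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical helper: score every row of `data` against a query embedding.
# ASSUMPTION: data$embedding is a list of equal-length numeric vectors.
score_rows <- function(data, query_vec) {
  vapply(data$embedding, cosine_similarity, numeric(1), b = query_vec)
}

# Toy data with 2-dimensional "embeddings" (real models use far more dimensions)
docs <- data.frame(text = c("neural networks", "baking bread", "deep learning"))
docs$embedding <- list(c(1, 0), c(0, 1), c(0.9, 0.1))

scores <- score_rows(docs, query_vec = c(1, 0))
```

Higher scores mean more similar. Here "neural networks" scores 1, "baking bread" scores 0, and "deep learning" scores about 0.99.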

Usage

hf_nearest_neighbors(
  data,
  query,
  k = 5,
  text_col = "text",
  model = "BAAI/bge-small-en-v1.5",
  token = NULL
)

Arguments

data: A data frame with an 'embedding' column (from hf_embed_text).
query: Character string. The query text to compare against.
k: Integer. Number of nearest neighbors to return. Default: 5.
text_col: Character string. Name of text column. Default: "text".
model: Character string. Model used to embed the query. Should match the model used to create the embeddings in data; embeddings produced by different models are not comparable.
token: Character string or NULL. API token for authentication.
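Using a different model for the query than for the stored embeddings often shows up as a length mismatch between vectors. The sketch below is a hypothetical guard, not part of huggingfaceR, that assumes the embedding column is a list of numeric vectors. It only catches differing lengths: two different models can produce vectors of the same length that still are not comparable.

```r
# Hypothetical guard: stop early if the query embedding and the stored
# embeddings differ in length, a common symptom of mixing models.
# Equal lengths do NOT prove the same model was used.
check_embedding_dims <- function(data, query_vec) {
  dims <- unique(lengths(data$embedding))
  if (length(dims) != 1L) {
    stop("Stored embeddings have inconsistent lengths: ",
         paste(dims, collapse = ", "))
  }
  if (dims != length(query_vec)) {
    stop("Query embedding has length ", length(query_vec),
         " but stored embeddings have length ", dims,
         "; were they produced by the same model?")
  }
  invisible(TRUE)
}

docs <- data.frame(text = c("a", "b"))
docs$embedding <- list(c(0.1, 0.2, 0.3), c(0.3, 0.2, 0.1))

ok <- check_embedding_dims(docs, c(1, 0, 0))
mismatch <- tryCatch(check_embedding_dims(docs, c(1, 0)),
                     error = function(e) "error")
```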

Value

A tibble with the k nearest neighbors, sorted by similarity (descending).
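Given one similarity score per row, the ordering described above amounts to sorting the scores in decreasing order and keeping the first k rows. The sketch below is a hypothetical base R illustration: top_k_rows() is not a huggingfaceR function, and the name of the similarity column it adds is an assumption, since this page does not name the output columns.

```r
# Hypothetical helper: keep the k highest-scoring rows, most similar first.
top_k_rows <- function(data, scores, k = 5) {
  ord  <- order(scores, decreasing = TRUE)
  keep <- head(ord, k)               # returns every row if k > nrow(data)
  out  <- data[keep, , drop = FALSE]
  out$similarity <- scores[keep]     # ASSUMPTION: column name is illustrative
  rownames(out) <- NULL
  out
}

docs <- data.frame(text = c("neural networks", "baking bread", "deep learning"))
res  <- top_k_rows(docs, scores = c(1, 0, 0.99), k = 2)
```

Here res contains "neural networks" followed by "deep learning"; "baking bread" is dropped because only k = 2 rows are kept.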

Examples

if (FALSE) { # \dontrun{
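# docs_embedded: a data frame with a "text" column and an "embedding"
# column, e.g. created with hf_embed_text() using the same model
docs_embedded |>
  hf_nearest_neighbors("machine learning", k = 5)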
} # }