
Find the k most similar texts to a query text based on embedding similarity.

Usage

hf_nearest_neighbors(
  data,
  query,
  k = 5,
  text_col = "text",
  model = "BAAI/bge-small-en-v1.5",
  token = NULL
)

Arguments

data

A data frame with an 'embedding' column, such as the output of hf_embed_text.

query

Character string. The query text to compare against.

k

Integer. Number of nearest neighbors to return. Default: 5.

text_col

Character string. Name of the column in data that holds the text. Default: "text".

model

Character string. Model used to embed the query. It should match the model used to create the embeddings in data, otherwise the similarities are not meaningful. Default: "BAAI/bge-small-en-v1.5".

token

Character string or NULL. API token for authentication. Default: NULL.

Value

A tibble with the k nearest neighbors, sorted by similarity (descending).
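The ranking step described above can be illustrated with a minimal base R sketch. It assumes cosine similarity as the metric, which this page does not confirm, and it uses made-up three-dimensional vectors in place of real model embeddings. The helper names cosine_sim and nearest_sketch are hypothetical and are not part of the package.

```r
# Toy stand-in for an embedded data frame; the vectors are invented for
# illustration and do not come from any model.
docs <- data.frame(
  text = c("neural networks", "gardening tips", "deep learning"),
  stringsAsFactors = FALSE
)
docs$embedding <- list(
  c(0.9, 0.1, 0.0),
  c(0.0, 0.2, 0.9),
  c(0.8, 0.3, 0.1)
)

# Hypothetical query embedding; in practice it would be produced by the
# same model that embedded the documents.
query_vec <- c(1.0, 0.2, 0.0)

# Cosine similarity between two numeric vectors (assumed metric).
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Hypothetical helper: score every row against the query, sort by
# similarity in descending order, and keep the first k rows.
nearest_sketch <- function(data, query_vec, k = 5) {
  data$similarity <- vapply(
    data$embedding, cosine_sim, numeric(1), b = query_vec
  )
  ranked <- data[order(data$similarity, decreasing = TRUE), , drop = FALSE]
  head(ranked, k)
}

top2 <- nearest_sketch(docs, query_vec, k = 2)
print(top2$text)
```

With these toy vectors the two machine-learning texts rank above the gardening text, mirroring the descending sort described for the return value.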

Examples

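if (FALSE) { # \dontrun{
# docs_embedded: a data frame with "text" and "embedding" columns,
# e.g. the output of hf_embed_text
docs_embedded |>
  hf_nearest_neighbors("machine learning", k = 5)
} # }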