
Create text embeddings with a Hugging Face model as part of a tidymodels recipe. This step converts the selected text columns into numeric embedding features for downstream modeling.
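Conceptually, the step turns each string into a fixed-length numeric vector and spreads that vector across new numeric columns, one per embedding dimension. The base R sketch below illustrates only that reshaping. toy_embed() is a hypothetical stand-in for the Hugging Face model (it just counts the letters a to e). The text_emb_1, text_emb_2, ... column names and the dropping of the original text column are assumptions, not necessarily what step_hf_embed() does.

```r
# Hypothetical stand-in for a Hugging Face embedding model:
# maps each string to a d-dimensional numeric vector (here, counts of a-e).
toy_embed <- function(x, letters_used = c("a", "b", "c", "d", "e")) {
  chars <- strsplit(tolower(x), "")
  # vapply() returns a d x n matrix; transpose to n rows x d columns
  t(vapply(chars, function(ch) {
    as.numeric(table(factor(ch, levels = letters_used)))
  }, numeric(length(letters_used))))
}

# Assumed behaviour for illustration: drop the text column and append
# one numeric column per embedding dimension.
embed_text_column <- function(data, col, embed_fn = toy_embed) {
  emb <- embed_fn(data[[col]])
  colnames(emb) <- paste0(col, "_emb_", seq_len(ncol(emb)))
  out <- data[setdiff(names(data), col)]
  cbind(out, as.data.frame(emb))
}

train_data <- data.frame(
  text = c("a bad cab", "dead bee"),
  sentiment = factor(c("neg", "pos"))
)
baked <- embed_text_column(train_data, "text")
print(baked)
```

A real model returns dense vectors instead (384 dimensions for BAAI/bge-small-en-v1.5), so the number of new columns depends on the chosen model.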

Usage

step_hf_embed(
  recipe,
  ...,
  role = "predictor",
  trained = FALSE,
  model = "BAAI/bge-small-en-v1.5",
  token = NULL,
  embeddings = NULL,
  skip = FALSE,
  id = recipes::rand_id("hf_embed")
)

# S3 method for class 'step_hf_embed'
tidy(x, ...)

# S3 method for class 'step_hf_embed'
tunable(x, ...)

Arguments

recipe

A recipe object.

...

One or more selector functions choosing the text columns to embed (see recipes::selections()).

role

Character string. Role for the new embedding variables. Default: "predictor".

trained

Logical. Indicates whether the step has been trained (prepped). Internal use only.

model

Character string. Hugging Face model ID for embeddings. Default: "BAAI/bge-small-en-v1.5".

token

Character string or NULL. Hugging Face API token used for authentication.

embeddings

List. Internal use only (stores embeddings during training).

skip

Logical. Should the step be skipped when the recipe is baked with bake()? Default: FALSE.

id

Character string. Unique ID for this step.

x

A step_hf_embed object.
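For the token argument, a common pattern is to keep the token out of scripts and read it from an environment variable. The sketch below assumes the token is stored in HF_TOKEN, the variable name used by Hugging Face's own client tooling. hf_token_or_null() is a hypothetical helper, not part of this package; it returns NULL when the variable is unset or empty, so the argument's documented NULL option is passed through unchanged.

```r
# Hypothetical helper: read a Hugging Face token from an environment variable,
# returning NULL when the variable is unset or empty.
hf_token_or_null <- function(var = "HF_TOKEN") {
  value <- Sys.getenv(var, unset = "")
  if (nzchar(value)) value else NULL
}

# Usage inside a recipe (requires the package that provides step_hf_embed()):
# rec <- recipe(sentiment ~ text, data = train_data) |>
#   step_hf_embed(text, token = hf_token_or_null())
```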

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Examples

if (FALSE) { # \dontrun{
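library(tidymodels)  # also attaches dplyr
# Also attach the package that provides step_hf_embed().

# `train_data` is assumed to be a data frame with a character `text`
# column and a two-level factor `sentiment` outcome.

# Create a recipe that converts `text` into embedding features
rec <- recipe(sentiment ~ text, data = train_data) |>
  step_hf_embed(text, model = "BAAI/bge-small-en-v1.5")

# Fit a logistic regression on the embedding features in a workflow
wf <- workflow() |>
  add_recipe(rec) |>
  add_model(logistic_reg()) |>
  fit(data = train_data)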
} # }