Convert unstructured text into tidy columns using a chat model with structured JSON output. The schema argument can be a lightweight named character vector such as c(name = "string", score = "number") or a full JSON Schema list. The function returns one row per input text and one column per schema field.

Usage

hf_extract(
  text,
  schema,
  model = hf_default_model("chat"),
  strict = TRUE,
  system = paste("Extract the requested fields from the user's text.",
    "Return only JSON that matches the schema."),
  token = NULL,
  endpoint_url = NULL,
  ...
)

Arguments

text

Character vector of text(s) to extract from.

schema

A named character vector mapping field names to JSON types, for example c(name = "string", score = "number"), or a JSON Schema list describing an object and its properties.

model

Character string. Model ID from Hugging Face Hub. Default: the result of hf_default_model("chat"), for example "meta-llama/Llama-3.1-8B-Instruct".

strict

Logical. Whether to request strict JSON Schema adherence. Default: TRUE.

system

Character string. System prompt sent with each extraction request. Default: an instruction to extract the requested fields and return only JSON that matches the schema.

token

Character string or NULL. API token for authentication.

endpoint_url

Character string or NULL. A custom Inference Endpoint URL. The endpoint must support the chat completions format.

...

Additional parameters passed to the chat-completions request.
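
The two forms accepted by schema describe the same kind of object. The sketch below shows one way the shorthand could be expanded into a JSON Schema style list. shorthand_to_schema() is a hypothetical helper written for illustration, not a function of this package, and marking every field as required is an assumption rather than documented behavior.

```r
# Hypothetical helper (not part of the package): expand the shorthand
# named character vector into a JSON Schema style list.
shorthand_to_schema <- function(fields) {
  stopifnot(
    is.character(fields),
    !is.null(names(fields)),
    all(nzchar(names(fields)))
  )
  list(
    type = "object",
    # One property per field; lapply() keeps the field names.
    properties = lapply(fields, function(json_type) list(type = json_type)),
    # Assumption: every field is treated as required.
    required = names(fields)
  )
}

shorthand <- c(name = "string", score = "number")
full <- shorthand_to_schema(shorthand)
str(full)
```

Writing the list form by hand is mainly useful when a field needs more than a bare type.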

Value

A tibble with one row per element of text and one column per schema field.

Examples

if (FALSE) { # \dontrun{
hf_extract(
  "Amelie is a chef in Paris.",
  c(name = "string", occupation = "string", city = "string")
)

hf_extract(
  c("Great service.", "The delivery was late."),
  c(sentiment = "string", is_complaint = "boolean")
)
} # }
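
The examples above use the shorthand schema. A full JSON Schema list can carry constraints the shorthand cannot, such as a fixed set of allowed labels. The following is a sketch only: review_schema is an illustrative name, and whether the enum keyword is honored depends on the model and endpoint, which is an assumption here.

```r
# Full JSON Schema form: `enum` restricts sentiment to three labels,
# which the shorthand c(sentiment = "string") cannot express.
review_schema <- list(
  type = "object",
  properties = list(
    sentiment = list(
      type = "string",
      enum = c("positive", "neutral", "negative")
    ),
    is_complaint = list(type = "boolean")
  ),
  required = c("sentiment", "is_complaint")
)

if (FALSE) { # needs network access and a valid token
hf_extract(
  c("Great service.", "The delivery was late."),
  review_schema
)
}
```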