
Extract named entities (people, organizations, locations, etc.) from text using a token-classification model via the Hugging Face Inference Providers API. Returns one row per detected entity, with character offsets that let you highlight or join back to the source text. Inputs that produce no entities (and `NA` inputs) yield a single row with `NA` entity fields so every input is represented.
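The offsets are what make highlighting possible. The sketch below is a hypothetical helper (`highlight_entities()` is not part of this package) that wraps each entity of a single input string in `[word](LABEL)` markup. It assumes `start` and `end` follow the Hugging Face convention of 0-based starts and exclusive ends. Check a few rows of real output before relying on it, and drop the `+ 1L` if the offsets turn out to be 1-based already.

```r
# Hypothetical helper: mark up entities in ONE input string via start/end offsets.
# ASSUMPTION: offsets are 0-based with an exclusive end (Hugging Face convention).
highlight_entities <- function(text, start, end, label) {
  keep <- !is.na(start)  # NA offsets mark inputs that produced no entities
  start <- start[keep]
  end <- end[keep]
  label <- label[keep]
  # Work from the last entity backwards so earlier offsets stay valid
  for (i in order(start, decreasing = TRUE)) {
    s <- start[i] + 1L  # 0-based start -> 1-based index for substr()
    e <- end[i]         # exclusive 0-based end == inclusive 1-based end
    text <- paste0(
      substr(text, 1L, s - 1L),
      "[", substr(text, s, e), "](", label[i], ")",
      substr(text, e + 1L, nchar(text))
    )
  }
  text
}

# Illustrative offsets for "Barack Obama" (0-12) and "Hawaii" (25-31):
highlight_entities(
  "Barack Obama was born in Hawaii.",
  start = c(0, 25), end = c(12, 31), label = c("PER", "LOC")
)
# returns "[Barack Obama](PER) was born in [Hawaii](LOC)."
```

With real output for a single input `x`, the call would look like `res <- hf_ner(x)` followed by `highlight_entities(x, res$start, res$end, res$entity_group)`.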

Usage

hf_ner(
  text,
  model = hf_default_model("ner"),
  aggregation_strategy = "simple",
  token = NULL,
  endpoint_url = NULL,
  ...
)

Arguments

text

Character vector of text(s) to analyze.

model

Character string. Model ID from the Hugging Face Hub. Append `":provider"` to the ID to route the request to a specific inference provider. Default: `hf_default_model("ner")`, which resolves to "dslim/bert-base-NER".

aggregation_strategy

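Character string. How sub-word tokens are grouped into entities: one of "none", "simple", "first", "average", "max". "none" leaves every token as a separate prediction; "simple" merges adjacent tokens that share an entity type; "first", "average", and "max" first assign a single label to each word (taken from its first sub-token, from scores averaged over its sub-tokens, or from its highest-scoring sub-token) so that a word is never split across entities. Default: "simple".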

token

Character string or NULL. API token for authentication.

endpoint_url

Character string or NULL. A custom Inference Endpoint URL.

...

Additional arguments (currently unused).
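To make the default `aggregation_strategy = "simple"` concrete, here is a sketch of that grouping step applied to hypothetical token-level predictions (the kind "none" leaves ungrouped). It mirrors the rule used by the `transformers` token-classification pipeline: a new entity starts at a `B-` tag or whenever the entity type changes, the group's score is the mean of its token scores, and `O` groups are dropped. `group_simple()` and the sample tokens are illustrative assumptions, not this package's code or real model output. The `##` handling assumes a WordPiece (BERT-style) tokenizer like the default model's.

```r
# Hypothetical sketch of "simple" aggregation over token-level predictions.
group_simple <- function(tokens, labels, scores, starts, ends) {
  bi  <- ifelse(startsWith(labels, "B-"), "B", "I")  # "O" is treated as "I"
  tag <- sub("^[BI]-", "", labels)                   # entity type without prefix
  # Start a new group at the first token, on a type change, or on a B- tag
  new_group <- c(TRUE, tag[-1] != tag[-length(tag)] | bi[-1] == "B")
  grp <- cumsum(new_group)
  out <- lapply(split(seq_along(tokens), grp), function(idx) {
    # WordPiece: join tokens with spaces, then glue "##" continuations back on
    word <- gsub(" ##", "", paste(tokens[idx], collapse = " "), fixed = TRUE)
    data.frame(
      word = word,
      entity_group = tag[idx[1]],
      score = mean(scores[idx]),       # group score = mean token score
      start = starts[idx[1]],          # offset of the first token
      end = ends[idx[length(idx)]]     # offset of the last token
    )
  })
  res <- do.call(rbind, out)
  res <- res[res$entity_group != "O", ]  # non-entity groups are discarded
  rownames(res) <- NULL
  res
}

group_simple(
  tokens = c("Barack", "Obama", "was", "born", "in", "Ha", "##wai", "##i", "."),
  labels = c("B-PER", "I-PER", "O", "O", "O", "B-LOC", "I-LOC", "I-LOC", "O"),
  scores = c(0.9, 0.8, 0.99, 0.99, 0.99, 0.99, 0.97, 0.95, 0.99),
  starts = c(0, 7, 13, 17, 22, 25, 27, 30, 31),
  ends   = c(6, 12, 16, 21, 24, 27, 30, 31, 32)
)
```

The sample call yields two rows, "Barack Obama" (PER) and "Hawaii" (LOC), which is the shape `hf_ner()` returns under the default strategy.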

Value

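A tibble with one row per detected entity (or a single row with `NA` entity fields for each input that has no entities) and columns `text` (the input string), `word` (the entity text), `entity_group` (the entity label; PER, ORG, LOC, or MISC for the default model), `score` (model confidence), and `start` and `end` (character offsets of the entity within `text`).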
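Because inputs without entities still get a row, a per-input summary can report zeros instead of silently dropping those inputs. The sketch below is a hypothetical helper (`count_entities()` is not part of this package) written in base R against a result with the documented `text` and `word` columns. Since no input-index column is documented, it assumes input strings are distinct; duplicated strings would be pooled into one count.

```r
# Hypothetical helper: entities per input, with 0 for inputs that found none.
# ASSUMPTION: input strings are distinct (rows are matched back by `text`).
count_entities <- function(res) {
  texts <- unique(res$text)
  counts <- vapply(texts, function(t) {
    # NA inputs are matched with is.na(); other inputs by string equality
    rows <- if (is.na(t)) is.na(res$text) else (!is.na(res$text) & res$text == t)
    sum(!is.na(res$word[rows]))  # placeholder rows have NA `word`
  }, integer(1), USE.NAMES = FALSE)
  data.frame(text = texts, n_entities = counts)
}

# Illustrative input shaped like hf_ner() output (not real model output)
res <- data.frame(
  text = c("a", "a", "b", NA),
  word = c("X", "Y", NA, NA),
  entity_group = c("PER", "LOC", NA, NA)
)
count_entities(res)
```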

Examples

if (FALSE) { # \dontrun{
hf_ner("Barack Obama was born in Hawaii.")

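# One row per entity, ready to count or join
library(dplyr)
headlines <- c(
  "Apple opens a new office in Berlin.",
  "Angela Merkel meets Emmanuel Macron in Paris."
)
hf_ner(headlines) |>
  filter(!is.na(word)) |>  # drop placeholder rows for inputs with no entities
  count(entity_group, sort = TRUE)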
} # }