What is huggingfaceR?
huggingfaceR provides R users with access to machine learning models hosted on the Hugging Face Hub via the Hugging Face Inference API. You can perform natural language processing tasks – classification, embeddings, chat, text generation, and more – without installing Python or managing model weights locally. Note that the Inference API serves a curated subset of the 500,000+ models on the Hub; not every model is available for serverless inference.
Key design principles:
- No Python required. Authentication and a network connection are all you need.
- Tidyverse-native. Every function accepts character vectors and returns tibbles.
- Pipe-friendly. Functions compose naturally with dplyr, tidyr, and the rest of the tidyverse.
Capability matrix
| Workflow | Start with | Returns |
|---|---|---|
| Sentiment and labels | hf_classify(), hf_classify_zero_shot() | one tibble row per text or label |
| Embeddings and search | hf_embed(), hf_similarity(), hf_nearest_neighbors() | vectors, similarities, nearest rows |
| Chat and agents | hf_chat(), hf_conversation(), hf_tool() | assistant messages and tool calls |
| Structured extraction | hf_extract() | one tidy row per input with schema columns |
| Text tasks | hf_summarize(), hf_translate(), hf_ner(), hf_question_answer() | task-specific tidy columns |
| Multimodal | hf_transcribe(), hf_text_to_image(), hf_classify_image() | transcripts, files/raw bytes, image labels |
| Hub workflows | hf_hub_download(), hf_list_providers(), hf_push_dataset() | files, provider metadata, guarded uploads |
Installation
Install the released version from CRAN or the development version from GitHub:
# From CRAN
install.packages("huggingfaceR")
# Development version
# install.packages("devtools")
devtools::install_github("farach/huggingfaceR")

Authentication
Hugging Face requires an API token for inference requests. To obtain one:
- Create a free account at huggingface.co.
- Follow the Hugging Face access tokens documentation.
- Generate a token with at least read access.
Then configure the token in R:
library(huggingfaceR)
# Store your token persistently (writes to .Renviron)
hf_set_token("hf_your_token_here", store = TRUE)
# Verify authentication
hf_whoami()

After storing the token, it is loaded automatically in future sessions.
Quick Tour
Classify Text
Assign labels to text using pre-trained classifiers. The default model performs sentiment analysis, but you can supply any classification model available on the Hugging Face Inference API. Not all models on the Hub support serverless inference; call hf_check_inference(model_id) to verify that a given model does.
# Sentiment analysis
hf_classify("I love using R for data science!")
#> # A tibble: 1 × 3
#> text label score
#> <chr> <chr> <dbl>
#> 1 I love using R for data science! POSITIVE 1.000
# Zero-shot classification with custom labels (no training needed)
hf_classify_zero_shot(
  "NASA launches new Mars rover",
  labels = c("science", "politics", "sports", "entertainment")
)
#> # A tibble: 4 × 3
#> text label score
#> <chr> <chr> <dbl>
#> 1 NASA launches new Mars rover science 0.957
#> 2 NASA launches new Mars rover entertainment 0.0311
#> 3 NASA launches new Mars rover sports 0.00785
#> 4 NASA launches new Mars rover politics 0.00395

Generate Embeddings
Convert text into dense numeric vectors that capture semantic meaning. Similar texts produce similar vectors.
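To make "similar vectors" concrete, here is a minimal base R sketch of cosine similarity, the measure computed in the similarity step later in this section. The three-dimensional vectors are invented for illustration and are not real model output, and the helper name cosine_similarity() is hypothetical rather than part of huggingfaceR.

```r
# Cosine similarity: dot product divided by the product of the vector lengths.
# Hypothetical helper for illustration; not a huggingfaceR function.
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy 3-dimensional "embeddings" (made-up numbers; real embeddings
# have hundreds of dimensions).
cat_vec    <- c(0.9, 0.1, 0.3)
feline_vec <- c(0.8, 0.2, 0.4)
dog_vec    <- c(0.1, 0.9, 0.2)

# Vectors pointing in nearly the same direction score close to 1.
cosine_similarity(cat_vec, feline_vec)

# Vectors pointing in different directions score lower.
cosine_similarity(cat_vec, dog_vec)
```

Because the measure depends only on direction, scaling a vector by a positive constant leaves its similarity to every other vector unchanged.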
sentences <- c(
  "The cat sat on the mat",
  "A feline rested on the rug",
  "The dog played in the park"
)
embeddings <- hf_embed(sentences)
embeddings
#> # A tibble: 3 × 3
#> text embedding n_dims
#> <chr> <list> <int>
#> 1 The cat sat on the mat <dbl [384]> 384
#> 2 A feline rested on the rug <dbl [384]> 384
#> 3 The dog played in the park <dbl [384]> 384
# Compute pairwise cosine similarity
hf_similarity(embeddings)
#> # A tibble: 3 × 3
#> text_1 text_2 similarity
#> <chr> <chr> <dbl>
#> 1 The cat sat on the mat A feline rested on the rug 0.748
#> 2 The cat sat on the mat The dog played in the park 0.516
#> 3 A feline rested on the rug The dog played in the park 0.555

Chat with a Language Model
Interact with open-source large language models through a simple interface.
# Single question
hf_chat("What is the tidyverse?", max_tokens = 60)
#> # A tibble: 1 × 5
#> role content model tokens_used tool_calls
#> <chr> <chr> <chr> <int> <list>
#> 1 assistant The tidyverse is a collection of R pac… meta… 60 <list [0]>
# Guide the model with a system prompt
hf_chat(
  "Explain logistic regression in two sentences.",
  system = "You are a statistics instructor. Use plain language.",
  max_tokens = 80
)
#> # A tibble: 1 × 5
#> role content model tokens_used tool_calls
#> <chr> <chr> <chr> <int> <list>
#> 1 assistant Logistic regression is a statistical m… meta… 69 <list [0]>

Explore the Hub
Search for models and load datasets directly into R without leaving your session.
# Find popular text classification models
hf_search_models(task = "text-classification", limit = 5)
#> # A tibble: 5 × 7
#> model_id author task downloads likes tags library
#> <chr> <chr> <chr> <int> <int> <lis> <chr>
#> 1 BAAI/bge-reranker-v2-m3 <NA> text… 16443234 1053 <chr> senten…
#> 2 ProsusAI/finbert <NA> text… 7648889 1184 <chr> transf…
#> 3 BAAI/bge-reranker-base <NA> text… 4167279 238 <chr> senten…
#> 4 cardiffnlp/twitter-roberta-base-se… <NA> text… 3953164 813 <chr> transf…
#> 5 distilbert/distilbert-base-uncased… <NA> text… 3644729 910 <chr> transf…
# Load dataset rows into a tibble
imdb <- hf_load_dataset("imdb", split = "train", limit = 100)
head(imdb)
#> # A tibble: 6 × 4
#> text label .dataset .split
#> <chr> <int> <chr> <chr>
#> 1 "I rented I AM CURIOUS-YELLOW from my video store becau… 0 stanfor… train
#> 2 "\"I Am Curious: Yellow\" is a risible and pretentious … 0 stanfor… train
#> 3 "If only to avoid making this type of film in the futur… 0 stanfor… train
#> 4 "This film was probably inspired by Godard's Masculin, … 0 stanfor… train
#> 5 "Oh, brother...after hearing about this ridiculous film… 0 stanfor… train
#> 6 "I would put this at the top of my list of films in the… 0 stanfor… train

Extract Structured Data
Turn messy prose into analysis-ready columns.
hf_extract(
  "Amelie is a chef in Paris who mentions burnout.",
  c(name = "string", occupation = "string", city = "string", theme = "string")
)
#> # A tibble: 1 × 4
#> name occupation city theme
#> <chr> <chr> <chr> <chr>
#> 1 Amelie chef Paris burnout

Work with Images and Audio
audio <- "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"
image <- "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png"
transcript <- hf_transcribe(audio, return_timestamps = "word")
substr(transcript$text, 1, 120)
#> [1] " I have a dream that one day this nation will rise up and live out the true meaning of its creed."
hf_classify_image(image, top_k = 3)
#> # A tibble: 3 × 3
#> image label score
#> <chr> <chr> <dbl>
#> 1 https://huggingface.co/datasets/huggingface/documentation-images/… tabb… 0.277
#> 2 https://huggingface.co/datasets/huggingface/documentation-images/… tige… 0.276
#> 3 https://huggingface.co/datasets/huggingface/documentation-images/… Egyp… 0.140
# filter() comes from dplyr; it is namespaced here because dplyr is not attached yet
hf_detect_objects(image, threshold = 0.5) |>
  dplyr::filter(label == "cat")
#> # A tibble: 2 × 7
#> image label score xmin ymin xmax ymax
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 https://huggingface.co/datasets/huggingfa… cat 0.997 156 31 385 146
#> 2 https://huggingface.co/datasets/huggingfa… cat 0.999 145 132 429 341

Working with Data Frames
Most huggingfaceR functions accept character vectors and return tibbles, so they slot directly into tidyverse pipelines.
# dplyr supplies tibble(), mutate(), and select(); tidyr supplies unnest()
library(dplyr)
library(tidyr)

reviews <- tibble(
  product_id = 1:5,
  review = c(
    "Excellent quality, highly recommend!",
    "Broke after one week of use",
    "Good value for the price",
    "Disappointing, not as advertised",
    "Love it! Will buy again"
  )
)
# Add sentiment scores
reviews |>
  mutate(sentiment = hf_classify(review)) |>
  unnest(sentiment) |>
  select(product_id, review, label, score)
#> # A tibble: 5 × 4
#> product_id review label score
#> <int> <chr> <chr> <dbl>
#> 1 1 Excellent quality, highly recommend! POSITIVE 1.000
#> 2 2 Broke after one week of use NEGATIVE 0.999
#> 3 3 Good value for the price POSITIVE 1.000
#> 4 4 Disappointing, not as advertised NEGATIVE 1.000
#> 5 5 Love it! Will buy again POSITIVE 1.000

Next Steps
For deeper coverage of each capability, see the following vignettes:
- Text Classification and Zero-Shot Labeling – sentiment analysis, custom categories, and data frame workflows.
- Embeddings, Similarity, and Semantic Search – vector representations, clustering, topic modeling, and visualization.
- Chat, Conversations, and Text Generation – LLM interaction patterns, multi-turn conversations, and fill-mask.
- Structured Extraction and Tool Calling – JSON-schema extraction, streaming, and R-backed tool calls.
- Multimodal: Images, Audio, and Speech – transcription, image generation, captioning, and object detection.
- Hub Discovery, Datasets, and Tidymodels Integration – searching models, loading data, and building ML pipelines with embeddings.
- Working with the Hub: Download, Upload, and Share – file downloads, provider metadata, and guarded repository writes.
- Analyzing the Anthropic Economic Index – a research-oriented case study using embeddings, clustering, and zero-shot classification on real-world AI adoption data.
For production workloads: The classification and
embeddings vignettes cover batch processing functions
(hf_embed_batch(), hf_classify_batch(), etc.)
that use parallel requests and disk checkpointing for processing
thousands of texts efficiently.
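The general idea behind disk checkpointing can be sketched in base R. This is an illustration of the technique only, not the package's implementation: process_chunk() is a hypothetical stand-in for a real API call, process_with_checkpoints() and the checkpoint file naming are assumptions, and the chunks run sequentially rather than in parallel.

```r
# Hypothetical stand-in for a per-chunk API call; it only counts characters.
process_chunk <- function(texts) {
  data.frame(text = texts, n_chars = nchar(texts), stringsAsFactors = FALSE)
}

# Process texts in chunks, saving each finished chunk to disk so that an
# interrupted run can resume without redoing completed work.
process_with_checkpoints <- function(texts, chunk_size, checkpoint_dir) {
  dir.create(checkpoint_dir, showWarnings = FALSE, recursive = TRUE)
  # Group consecutive texts into chunks of at most chunk_size elements
  chunks <- split(texts, ceiling(seq_along(texts) / chunk_size))
  results <- lapply(seq_along(chunks), function(i) {
    path <- file.path(checkpoint_dir, sprintf("chunk_%04d.rds", i))
    if (file.exists(path)) {
      return(readRDS(path))  # finished in an earlier run, so reuse it
    }
    out <- process_chunk(chunks[[i]])
    saveRDS(out, path)       # checkpoint before moving to the next chunk
    out
  })
  do.call(rbind, results)
}

texts <- c("first review", "second review", "third", "fourth one", "fifth")
checkpoint_dir <- file.path(tempdir(), "hf_checkpoints_demo")
result <- process_with_checkpoints(texts, chunk_size = 2, checkpoint_dir = checkpoint_dir)
```

Calling process_with_checkpoints() again with the same directory reads the saved chunks instead of recomputing them, which is what makes a long job resumable after a failure. A checkpoint directory is only valid for the same inputs and chunk size, so use a fresh directory when either changes.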
Using Dedicated Inference Endpoints
By default, huggingfaceR sends requests to the free, serverless
Hugging Face Inference API. If you need to use a model that isn’t
available on the serverless API, or you need dedicated capacity for
production workloads, you can deploy a Dedicated
Inference Endpoint and point huggingfaceR at it with the
endpoint_url parameter.
# Check whether a model supports the free serverless API
hf_check_inference("my-org/my-custom-model")
# If not, deploy a Dedicated Endpoint on huggingface.co/inference-endpoints,
# then pass its URL to any huggingfaceR function:
hf_embed(
  "Embed this with my dedicated endpoint",
  model = "my-org/my-custom-model",
  endpoint_url = "https://my-endpoint-id.us-east-1.aws.endpoints.huggingface.cloud"
)
hf_classify(
  "Classify with a private model",
  model = "my-org/my-classifier",
  endpoint_url = "https://my-endpoint-id.us-east-1.aws.endpoints.huggingface.cloud"
)
# Chat and generate also support endpoint_url
hf_chat(
  "Hello from my dedicated endpoint!",
  model = "my-org/my-llm",
  endpoint_url = "https://my-endpoint-id.us-east-1.aws.endpoints.huggingface.cloud"
)

The endpoint_url parameter is available on all inference functions, including batch variants (hf_embed_batch(), hf_classify_batch(), etc.).