What is huggingfaceR?
huggingfaceR provides R users with direct access to over 500,000 machine learning models hosted on the Hugging Face Hub. The package uses the Hugging Face Inference API, so you can perform natural language processing tasks – classification, embeddings, chat, text generation, and more – without installing Python or managing model weights locally.
Key design principles:
- No Python required. Authentication and a network connection are all you need.
- Tidyverse-native. Every function accepts character vectors and returns tibbles.
- Pipe-friendly. Functions compose naturally with dplyr, tidyr, and the rest of the tidyverse.
Installation
Install the released version from CRAN or the development version from GitHub:
# From CRAN
install.packages("huggingfaceR")
# Development version
# install.packages("devtools")
devtools::install_github("farach/huggingfaceR")
Authentication
Hugging Face requires an API token for inference requests. To obtain one:
- Create a free account at huggingface.co.
- Navigate to Settings > Access Tokens.
- Generate a token with at least read access.
Then configure the token in R:
library(huggingfaceR)
# Store your token persistently (writes to .Renviron)
hf_set_token("hf_your_token_here", store = TRUE)
# Verify authentication
hf_whoami()
Once stored, the token is loaded automatically in future sessions.
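For context, .Renviron is a plain-text file of NAME=value lines that R reads at startup and exposes as environment variables. The sketch below illustrates that mechanism with base R only. It writes to a temporary file rather than your real .Renviron, and the variable name EXAMPLE_HF_TOKEN is a hypothetical placeholder, not necessarily the name huggingfaceR uses.

```r
# Write a NAME=value line to a temporary file that stands in for .Renviron
renviron_path <- tempfile(fileext = ".Renviron")
writeLines("EXAMPLE_HF_TOKEN=hf_your_token_here", renviron_path)

# R does this automatically for the real .Renviron at startup;
# readRenviron() triggers the same loading step by hand
readRenviron(renviron_path)

# The value is now available as an environment variable
token <- Sys.getenv("EXAMPLE_HF_TOKEN")
startsWith(token, "hf_")
```

Because the token sits in a plain-text file, it is worth keeping .Renviron out of version control.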
Quick Tour
Classify Text
Assign labels to text using pre-trained classifiers. The default model performs sentiment analysis, but you can supply any classification model from the Hub.
# Sentiment analysis
hf_classify("I love using R for data science!")
# Zero-shot classification with custom labels (no training needed)
hf_classify_zero_shot(
  "NASA launches new Mars rover",
  labels = c("science", "politics", "sports", "entertainment")
)
Generate Embeddings
Convert text into dense numeric vectors that capture semantic meaning. Similar texts produce similar vectors.
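One common way to quantify "similar vectors" is cosine similarity: the dot product of two vectors divided by the product of their lengths. The base R sketch below uses made-up three-dimensional vectors rather than real model output. The helper cosine_similarity() is defined here for illustration only; it is not a huggingfaceR function and may differ from how the package computes similarity internally.

```r
# Toy vectors standing in for embeddings (values are invented for illustration)
vectors <- rbind(
  text_a = c(1, 2, 3),
  text_b = c(2, 4, 6.5),   # points in almost the same direction as text_a
  text_c = c(-3, 0, 1)     # orthogonal to text_a
)

# Cosine similarity between two numeric vectors
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

cosine_similarity(vectors["text_a", ], vectors["text_b", ])  # close to 1
cosine_similarity(vectors["text_a", ], vectors["text_c", ])  # 0 (orthogonal)

# Pairwise version: scale each row to unit length, then take row dot products
unit <- vectors / sqrt(rowSums(vectors^2))
similarity <- tcrossprod(unit)
round(similarity, 3)
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.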
sentences <- c(
  "The cat sat on the mat",
  "A feline rested on the rug",
  "The dog played in the park"
)
embeddings <- hf_embed(sentences)
embeddings
# Compute pairwise cosine similarity
hf_similarity(embeddings)
Chat with a Language Model
Interact with open-source large language models through a simple interface.
Explore the Hub
Search for models and load datasets directly into R without leaving your session.
# Find popular text classification models
hf_search_models(task = "text-classification", limit = 5)
# Load dataset rows into a tibble
imdb <- hf_load_dataset("imdb", split = "train", limit = 100)
head(imdb)
Working with Data Frames
All huggingfaceR functions accept character vectors and return tibbles, so they integrate naturally into tidyverse pipelines.
library(dplyr)
library(tidyr)
reviews <- tibble(
  product_id = 1:5,
  review = c(
    "Excellent quality, highly recommend!",
    "Broke after one week of use",
    "Good value for the price",
    "Disappointing, not as advertised",
    "Love it! Will buy again"
  )
)
# Add sentiment scores
reviews |>
  mutate(sentiment = hf_classify(review)) |>
  unnest(sentiment) |>
  select(product_id, review, label, score)
Next Steps
For deeper coverage of each capability, see the following vignettes:
- Text Classification and Zero-Shot Labeling – sentiment analysis, custom categories, and data frame workflows.
- Embeddings, Similarity, and Semantic Search – vector representations, clustering, topic modeling, and visualization.
- Chat, Conversations, and Text Generation – LLM interaction patterns, multi-turn conversations, and fill-mask.
- Hub Discovery, Datasets, and Tidymodels Integration – searching models, loading data, and building ML pipelines with embeddings.
- Analyzing the Anthropic Economic Index – a research-oriented case study using embeddings, clustering, and zero-shot classification on real-world AI adoption data.