Load a dataset from Hugging Face Hub using the Datasets Server API. This is an API-first approach that doesn't require Python. For local dataset loading with Python, see the legacy function or advanced vignette.
Usage
hf_load_dataset(
  dataset,
  split = "train",
  config = NULL,
  limit = 1000,
  offset = 0,
  token = NULL
)
Arguments
- dataset
Character string. Dataset name (e.g., "imdb", "squad").
- split
Character string. Dataset split: "train", "test", "validation", etc. Supports Hugging Face slice syntax such as `"train[100:200]"`. Percentage slices like `"train[:10%]"` … "train".
- config
Character string or NULL. Dataset configuration/subset name. If NULL (default), auto-detected from the dataset's available configs.
- limit
Integer. Maximum number of rows to fetch. Default: 1000. Set to Inf to fetch all rows (may be slow for large datasets).
- offset
Integer. Row offset for pagination. Default: 0.
- token
Character string or NULL. API token for private datasets.
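The slice syntax accepted by `split` can be read as a base split plus a row window. The sketch below shows one way such a string could be decomposed into values resembling `offset` and `limit`. `parse_split_slice()` is a hypothetical helper written for illustration, not a function of this package, and how `hf_load_dataset()` interprets slices internally is an assumption here. Percentage slices are not handled and are returned unchanged.

```r
# Hypothetical helper (not part of the package): split a slice string such as
# "train[100:200]" into a base split name plus offset/limit values.
parse_split_slice <- function(split) {
  pattern <- "^([^\\[]+)\\[(\\d*):(\\d*)\\]$"
  m <- regmatches(split, regexec(pattern, split, perl = TRUE))[[1]]
  if (length(m) == 0) {
    # No "[start:end]" suffix (or an unsupported one): keep the string as given.
    return(list(split = split, offset = 0, limit = Inf))
  }
  # An empty bound means "from the first row" or "to the last row".
  start <- if (nzchar(m[3])) as.numeric(m[3]) else 0
  end   <- if (nzchar(m[4])) as.numeric(m[4]) else Inf
  list(split = m[2], offset = start, limit = max(end - start, 0))
}

window <- parse_split_slice("train[100:200]")
str(window)
```

Under this reading, `"train[100:200]"` covers rows 100 through 199 of the train split, following the half-open convention of Python slices.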
Examples
if (FALSE) { # \dontrun{
# Load first 1000 rows of IMDB train set
imdb <- hf_load_dataset("imdb", split = "train", limit = 1000)
# Load test set
imdb_test <- hf_load_dataset("imdb", split = "test", limit = 500)
# Load a slice of a split
imdb_sample <- hf_load_dataset("imdb", split = "train[100:200]", limit = Inf)
} # }
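For datasets too large to fetch in one call, `offset` and `limit` can be combined to walk through a split page by page. The following is a minimal sketch of that pattern. `fetch_in_pages()` and `fake_fetch()` are hypothetical names introduced for illustration, and the stand-in fetcher lets the example run without network access.

```r
# Hypothetical helper: fetch `total` rows in pages of `page_size`, advancing
# the offset each time. `fetch` is any function(offset, limit) that returns
# a data frame.
fetch_in_pages <- function(fetch, total, page_size = 1000) {
  stopifnot(total > 0, page_size > 0)
  offsets <- seq(0, total - 1, by = page_size)
  pages <- lapply(offsets, function(off) {
    # The last page may be shorter than page_size.
    fetch(offset = off, limit = min(page_size, total - off))
  })
  do.call(rbind, pages)
}

# Offline stand-in for a real fetcher: returns `limit` consecutive row indices.
fake_fetch <- function(offset, limit) {
  data.frame(row = seq(offset, length.out = limit))
}

rows <- fetch_in_pages(fake_fetch, total = 2500, page_size = 1000)
nrow(rows)
```

To page through a real dataset, the fetcher could be `function(offset, limit) hf_load_dataset("imdb", split = "train", offset = offset, limit = limit)`. This assumes `hf_load_dataset()` returns a data frame that `rbind()` can combine, which this page does not state.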