Analyzing the Anthropic Economic Index
Source: vignettes/anthropic-economic-index.Rmd

Introduction
The Anthropic Economic Index tracks how AI is being integrated into real-world economic tasks. Built from millions of Claude conversations mapped to the U.S. Department of Labor’s O*NET occupational taxonomy, the dataset provides granular measures of which tasks AI augments versus automates, how usage varies across geographies, and what collaboration patterns emerge between humans and AI.
This vignette demonstrates how to use huggingfaceR to analyze the Anthropic Economic Index from the perspective of AI productivity research. You will learn to:
- Load the dataset directly from the Hugging Face Hub
- Apply semantic embeddings to occupational task descriptions
- Discover latent structure in AI-affected tasks through clustering
- Measure semantic similarity between tasks and research concepts
- Classify tasks along research-relevant dimensions using zero-shot models
- Visualize the embedding space of AI-impacted occupations
These analyses illustrate a fundamental difference between huggingfaceR and conversational LLM interfaces such as ellmer. Where ellmer provides an R6-based chat interface for interacting with language models one prompt at a time, huggingfaceR operates as a programmatic analytical toolkit: it embeds, classifies, clusters, and searches across entire corpora in reproducible pipelines. The workflows below could not be replicated through conversational prompting alone.
Loading the Dataset
The Anthropic Economic Index is hosted as a file-based repository on
the Hugging Face Hub. Because its data files are stored as CSVs rather
than in the standard Datasets format, we load them directly via URL. The
readr::read_csv() function handles this transparently.
library(huggingfaceR)
library(readr)
library(dplyr)
library(ggplot2)

base_url <- paste0(
  "https://huggingface.co/datasets/Anthropic/EconomicIndex/",
  "resolve/main/release_2025_02_10/"
)
# O*NET task statements with occupational codes
task_statements <- read_csv(paste0(base_url, "onet_task_statements.csv"))
# Task-level AI usage percentages
task_usage <- read_csv(paste0(base_url, "onet_task_mappings.csv"))
# Automation vs. augmentation breakdown
auto_augment <- read_csv(paste0(base_url, "automation_vs_augmentation.csv"))
# Wage and occupation metadata
wages <- read_csv(paste0(base_url, "wage_data.csv"))

The task statements file contains the core analytical unit: detailed descriptions of what workers do in each occupation, drawn from the O*NET database.
task_statements
#> # A tibble: 19,530 x 8
#> `O*NET-SOC Code` Title `Task ID` Task `Task Type`
#> <chr> <chr> <dbl> <chr> <chr>
#> 1 11-1011.00 Chief Executives 8823 Direct or co~ Core
#> 2 11-1011.00 Chief Executives 8831 Appoint depa~ Core
#> 3 11-1011.00 Chief Executives 8825 Analyze oper~ Core
#> ...

Semantic Embeddings of Occupational Tasks
A core capability of huggingfaceR is converting text into dense vector representations. By embedding O*NET task descriptions, we can measure the semantic distance between any pair of tasks regardless of their surface wording. This enables similarity search, clustering, and dimensionality reduction across the full task taxonomy.
Embedding Task Descriptions
Select a representative sample of tasks and compute their embeddings.
# Join task descriptions with their AI usage rates.
# The task_mappings file uses lowercase task names, so we normalize case for
# the join key.
tasks_with_usage <- task_statements |>
select(task_id = `Task ID`, task = Task, title = Title,
soc_code = `O*NET-SOC Code`, task_type = `Task Type`) |>
mutate(task_lower = tolower(task)) |>
inner_join(
task_usage |> mutate(task_lower = tolower(task_name)),
by = "task_lower"
) |>
select(-task_lower, -task_name) |>
rename(ai_usage_pct = pct)
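Because the join key is case-normalized task text, it is worth confirming how many statements actually matched a usage record before sampling. A quick sanity check (an illustration, not part of the dataset release):

# Share of O*NET task statements that matched a usage record
nrow(tasks_with_usage) / nrow(task_statements)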
# Sample tasks across the usage distribution for analysis
set.seed(42)
sample_tasks <- tasks_with_usage |>
mutate(usage_quartile = ntile(ai_usage_pct, 4)) |>
group_by(usage_quartile) |>
slice_sample(n = 50) |>
ungroup()
# Generate embeddings for each task description
task_embeddings <- hf_embed(sample_tasks$task)

The result is a tibble with one row per task, a list-column of numeric vectors, and the embedding dimensionality.
task_embeddings
#> # A tibble: 200 x 3
#> text embedding n_dims
#> <chr> <list> <int>
#> 1 Direct or coordinate an organization's fin~ <dbl [384]> 384
#> 2 Develop or implement procedures for food s~ <dbl [384]> 384
#> ...

Measuring Task Similarity
With embeddings in hand, we can compute pairwise cosine similarity. This reveals which tasks are semantically related, even when they belong to different occupational categories.
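Under the hood, cosine similarity is the dot product of two vectors divided by the product of their norms. A minimal base R sketch, operating on the list-column returned by hf_embed(), makes the computation explicit; the output shape of hf_similarity() (one row per pair) suggests it applies this across all pairs:

# Cosine similarity between two embedding vectors
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

v1 <- task_embeddings$embedding[[1]]
v2 <- task_embeddings$embedding[[2]]
cosine_sim(v1, v2)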
# Compare a subset of tasks
analytical_tasks <- task_embeddings |>
slice(1:10)
hf_similarity(analytical_tasks)
#> # A tibble: 45 x 3
#> text_1 text_2 similarity
#> <chr> <chr> <dbl>
#> 1 Direct or coordinate an org~ Analyze operations to eval~ 0.82
#> 2 Direct or coordinate an org~ Develop or implement proce~ 0.45
#> ...

Nearest Neighbor Search for Research Concepts
AI productivity researchers often want to identify which occupational
tasks are closest to abstract concepts such as “creative problem
solving” or “routine data entry.” The
hf_nearest_neighbors() function performs this semantic
search against an embedded corpus.
# Build an embedded document set using the tidytext-style interface
task_docs <- sample_tasks |>
select(task, ai_usage_pct, title) |>
hf_embed_text(task)
# Find tasks most similar to "writing and editing documents"
hf_nearest_neighbors(task_docs, "writing and editing documents", k = 5)
#> # A tibble: 5 x 5
#> task ai_usage_pct title embedding similarity
#> <chr> <dbl> <chr> <list> <dbl>
#> 1 Write reports, memos, or othe~ 0.38 Technical W~ <dbl> 0.91
#> ...
# Find tasks most similar to "quantitative data analysis"
hf_nearest_neighbors(task_docs, "quantitative data analysis", k = 5)
# Find tasks most similar to "interpersonal communication"
hf_nearest_neighbors(task_docs, "interpersonal communication", k = 5)

This approach lets researchers map their theoretical constructs onto the empirical task taxonomy without manual coding. Compare this to a conversational approach with ellmer, where you would need to prompt an LLM with each of 19,000+ task descriptions individually. huggingfaceR processes the entire corpus as a single batch operation.
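The same search scales to a whole battery of research constructs. A short sketch, assuming purrr is available, runs several concepts through the embedded corpus in one pass (the concept strings here are illustrative):

library(purrr)

concepts <- c("creative problem solving", "routine data entry",
              "supervising and mentoring others")

# One nearest-neighbor search per concept, stacked into a single tibble
concept_neighbors <- map(concepts, \(x) {
  hf_nearest_neighbors(task_docs, x, k = 5) |>
    mutate(concept = x)
}) |>
  list_rbind()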
Clustering Tasks by Semantic Content
Beyond pairwise comparisons, researchers may want to discover latent
groupings in the task space. The hf_cluster_texts()
function applies k-means clustering on the embedding vectors to identify
coherent task families.
# Cluster tasks into semantic groups
clustered_tasks <- hf_cluster_texts(task_docs, k = 6)
clustered_tasks |>
group_by(cluster) |>
summarize(
n_tasks = n(),
mean_ai_usage = mean(ai_usage_pct, na.rm = TRUE),
example_task = first(task)
) |>
arrange(desc(mean_ai_usage))
#> # A tibble: 6 x 4
#> cluster n_tasks mean_ai_usage example_task
#> <int> <int> <dbl> <chr>
#> 1 3 28 0.312 Write and edit software code~
#> 2 1 38 0.189 Prepare reports summarizing~
#> ...

Extracting Cluster Topics
To interpret the clusters, hf_extract_topics()
identifies the most representative terms within each group. This
function clusters the data internally and extracts frequent terms per
cluster.
task_docs |>
hf_extract_topics(text_col = "task", k = 6)
#> # A tibble: 6 x 2
#> cluster topic_terms
#> <int> <chr>
#> 1 1 prepare, reports, data, financial, ...
#> 2 2 develop, programs, training, ...
#> ...

This unsupervised analysis may reveal that tasks with high AI usage cluster around writing, analysis, and code, while low-usage tasks cluster around physical operation, patient care, and equipment maintenance.
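One way to probe that hypothesis is to cross-tabulate cluster membership against the usage quartiles defined earlier. A sketch, assuming tidyr is available for reshaping:

# Do high-usage tasks concentrate in particular clusters?
clustered_tasks |>
  left_join(sample_tasks |> select(task, usage_quartile), by = "task") |>
  count(cluster, usage_quartile) |>
  tidyr::pivot_wider(names_from = usage_quartile, values_from = n,
                     values_fill = 0)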
Zero-Shot Classification of Tasks
For hypothesis-driven research, you may want to classify tasks along
specific dimensions without training a supervised model. huggingfaceR’s
hf_classify_zero_shot() applies a natural language
inference model to assign labels based on textual entailment.
Cognitive Demand Classification
# Classify a sample of tasks by cognitive demand level
cognitive_labels <- c(
"routine procedural work",
"analytical reasoning",
"creative problem solving",
"interpersonal judgment"
)
high_usage_tasks <- sample_tasks |>
filter(ai_usage_pct > quantile(ai_usage_pct, 0.75)) |>
pull(task)
cognitive_classes <- hf_classify_zero_shot(
high_usage_tasks[1:20],
labels = cognitive_labels
)
cognitive_classes |>
group_by(text) |>
slice_max(score, n = 1) |>
ungroup() |>
count(label, sort = TRUE)
#> # A tibble: 4 x 2
#> label n
#> <chr> <int>
#> 1 analytical reasoning 9
#> 2 creative problem solving 6
#> 3 routine procedural work 3
#> 4 interpersonal judgment 2

Automation Potential Classification
# Classify tasks by automation potential
automation_labels <- c(
"fully automatable by AI",
"partially automatable with human oversight",
"requires significant human judgment",
"cannot be performed by AI"
)
automation_classes <- hf_classify_zero_shot(
sample_tasks$task[1:30],
labels = automation_labels
)
# Compare zero-shot predictions with actual AI usage rates
automation_summary <- automation_classes |>
group_by(text) |>
slice_max(score, n = 1) |>
ungroup() |>
left_join(
sample_tasks |> select(task, ai_usage_pct),
by = c("text" = "task")
)
automation_summary |>
group_by(label) |>
summarize(
n = n(),
mean_actual_usage = mean(ai_usage_pct, na.rm = TRUE),
.groups = "drop"
) |>
arrange(desc(mean_actual_usage))

This analysis tests whether a zero-shot model’s assessment of automation potential correlates with observed AI usage patterns in the real world.
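One concrete check is a rank correlation between the ordering implied by the labels and the observed usage rates. The label_rank mapping below is a hypothetical ordering, from most to least automatable, following the order in which automation_labels was defined; a negative coefficient would indicate that tasks the model judges more automatable see higher real-world usage:

automation_summary |>
  mutate(label_rank = match(label, automation_labels)) |>
  summarize(
    spearman_rho = cor(label_rank, ai_usage_pct,
                       method = "spearman", use = "complete.obs")
  )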
Visualizing the Task Embedding Space
Dimensionality reduction provides a visual summary of how tasks
relate to each other in semantic space. Since we already computed
embeddings in task_docs, we can project them to 2D using
the uwot package directly, avoiding redundant API
calls.
library(uwot)
# Extract the embedding matrix from pre-computed embeddings
emb_matrix <- do.call(rbind, task_docs$embedding)
# Project to 2D with UMAP
umap_coords <- umap(emb_matrix, n_neighbors = 15, min_dist = 0.1)
# Build plot data
plot_data <- task_docs |>
mutate(
umap_1 = umap_coords[, 1],
umap_2 = umap_coords[, 2]
)
ggplot(plot_data, aes(x = umap_1, y = umap_2, color = ai_usage_pct)) +
geom_point(alpha = 0.7, size = 2) +
scale_color_viridis_c(
name = "AI Usage %",
labels = scales::percent_format(scale = 100)
) +
labs(
title = "Semantic Map of O*NET Tasks by AI Usage",
subtitle = "UMAP projection of task embeddings colored by AI adoption rate",
x = NULL, y = NULL
) +
theme_minimal() +
theme(
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank()
)

Tasks that cluster together in this projection share semantic content. Color gradients within clusters indicate differential AI adoption among semantically similar tasks, which may point to factors beyond task content (such as industry norms or tool availability) that influence adoption.
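A rough numeric check on that reading is the spread of usage within each cluster: wide within-cluster dispersion suggests adoption varies even among semantically similar tasks. A sketch using the clustering from the previous section:

clustered_tasks |>
  group_by(cluster) |>
  summarize(
    n_tasks = n(),
    sd_usage = sd(ai_usage_pct, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(sd_usage))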
For quick one-off visualizations without pre-computed embeddings, you
can use hf_embed_umap() which handles embedding and
projection in a single call:
# Alternative: hf_embed_umap() generates embeddings and projects in one step
hf_embed_umap(sample_tasks$task[1:50])

Visualizing Clusters
ggplot(plot_data |> left_join(clustered_tasks |> select(task, cluster), by = "task"),
aes(x = umap_1, y = umap_2, color = factor(cluster))) +
geom_point(alpha = 0.7, size = 2) +
labs(
title = "Task Clusters in Embedding Space",
subtitle = "K-means clusters projected via UMAP",
color = "Cluster",
x = NULL, y = NULL
) +
theme_minimal() +
theme(
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank()
)

Linking AI Usage to Wage Data
The Anthropic Economic Index (AEI) dataset includes occupational wage data, enabling researchers to examine the relationship between AI adoption and compensation.
# Prepare wage data
occupation_wages <- wages |>
filter(MedianSalary > 0, ChanceAuto >= 0) |>
select(soc_code = SOCcode, job_name = JobName, job_family = JobFamily,
median_salary = MedianSalary, chance_auto = ChanceAuto,
job_zone = JobZone)
# Aggregate AI usage by occupation
occupation_usage <- tasks_with_usage |>
group_by(soc_code, title) |>
summarize(
mean_ai_usage = mean(ai_usage_pct, na.rm = TRUE),
n_tasks = n(),
.groups = "drop"
)
# Join
occupation_analysis <- occupation_usage |>
inner_join(occupation_wages, by = "soc_code")
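Before plotting, a single summary statistic gives a first read on the relationship (a sketch; occupation-level aggregation can mask task-level variation):

# Correlation between median salary and mean AI usage across occupations
cor(occupation_analysis$median_salary, occupation_analysis$mean_ai_usage,
    use = "complete.obs")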
ggplot(occupation_analysis, aes(x = median_salary, y = mean_ai_usage)) +
geom_point(alpha = 0.4) +
geom_smooth(method = "loess", se = TRUE) +
scale_x_continuous(labels = scales::dollar_format()) +
scale_y_continuous(labels = scales::percent_format(scale = 100)) +
labs(
title = "AI Usage by Median Salary",
x = "Median Annual Salary",
y = "Mean AI Usage Rate (across tasks)"
) +
theme_minimal()

Analyzing Collaboration Patterns
The AEI categorizes human-AI interactions into distinct collaboration patterns: directive (human tells AI what to produce), feedback loop (iterative refinement), learning (human seeks understanding), task iteration (human builds on AI output), and validation (human checks AI work). These map onto the broader distinction between automation (directive, feedback loop) and augmentation (learning, task iteration, validation).
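Under that mapping, the interaction-type shares in the auto_augment table loaded earlier can be collapsed into the two modes. A small dplyr sketch (the raw shares are printed below):

# Roll the five interaction types up to automation vs. augmentation,
# following the mapping described above
auto_augment |>
  filter(interaction_type != "none") |>
  mutate(mode = if_else(
    interaction_type %in% c("directive", "feedback loop"),
    "automation", "augmentation"
  )) |>
  group_by(mode) |>
  summarize(share = sum(pct), .groups = "drop")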
auto_augment
#> # A tibble: 6 x 2
#> interaction_type pct
#> <chr> <dbl>
#> 1 directive 22.6
#> 2 feedback loop 12.0
#> 3 learning 18.9
#> 4 none 2.9
#> 5 task iteration 25.5
#> 6 validation 2.3

We can use zero-shot classification to validate whether these interaction categories align with the semantic content of the tasks they are associated with.
interaction_labels <- c(
"giving direct instructions",
"iterative refinement and feedback",
"learning and understanding",
"building upon previous output",
"checking and validating work"
)
# Classify a sample of task descriptions against interaction patterns
interaction_classes <- hf_classify_zero_shot(
sample_tasks$task[1:20],
labels = interaction_labels,
multi_label = TRUE
)

Geographic Analysis with the v3 Release
The September 2025 release adds geographic breakdowns of AI usage. We can load this enriched data to explore how AI adoption varies across countries and U.S. states.
v3_url <- paste0(
"https://huggingface.co/datasets/Anthropic/EconomicIndex/",
"resolve/main/release_2025_09_15/data/output/"
)
# Load enriched Claude.ai geographic data
geo_data <- read_csv(
paste0(v3_url, "aei_enriched_claude_ai_2025-08-04_to_2025-08-11.csv")
)
# Filter to country-level usage metrics
country_usage <- geo_data |>
filter(
geography == "country",
facet == "onet_task",
variable == "onet_task_pct",
level == 0
) |>
select(geo_id, cluster_name, value)
# Identify top tasks per country
top_tasks_by_country <- country_usage |>
group_by(geo_id) |>
slice_max(value, n = 5) |>
ungroup()

Embedding Geographic Task Profiles
Each country has a distribution of AI usage across tasks. We can characterize national AI strategies by embedding the top tasks for each country and computing inter-country similarity.
# Get unique tasks across top country profiles
unique_geo_tasks <- top_tasks_by_country |>
distinct(cluster_name) |>
pull(cluster_name)
geo_task_embeddings <- hf_embed(unique_geo_tasks)
# For each country, compute a weighted average embedding representing its
# AI usage profile (a sketch: each task's embedding is weighted by its
# usage share within the country)
country_profiles <- top_tasks_by_country |>
  left_join(
    geo_task_embeddings |> select(text, embedding),
    by = c("cluster_name" = "text")
  ) |>
  group_by(geo_id) |>
  summarize(
    profile = list(Reduce(`+`, Map(`*`, embedding, value / sum(value)))),
    .groups = "drop"
  )

Comparison with Conversational Approaches
The analyses above illustrate capabilities that distinguish huggingfaceR from conversational LLM packages like ellmer. Consider the research question: “Which occupational tasks are most semantically similar to creative writing, and how does their AI usage compare?”
The huggingfaceR approach (programmatic, reproducible)
# Embed all 19,000+ task descriptions in batch
all_embeddings <- tasks_with_usage |>
hf_embed_text(task)
# Find the 20 nearest neighbors to "creative writing"
creative_tasks <- hf_nearest_neighbors(all_embeddings, "creative writing", k = 20)
# Analyze their usage distribution
creative_tasks |>
summarize(
mean_usage = mean(ai_usage_pct),
median_usage = median(ai_usage_pct),
sd_usage = sd(ai_usage_pct)
)

The conversational approach (ellmer or similar)
With a chat-based interface, the same analysis would require:
- Manually prompting the LLM with each task description to assess similarity (19,000+ API calls with unstructured text responses)
- Parsing natural language responses into numeric similarity scores
- Handling rate limits, inconsistent outputs, and non-deterministic responses
- No guarantee of reproducibility across runs
huggingfaceR’s embedding-based approach is deterministic, operates in batch, and produces structured numeric output suitable for downstream statistical analysis. The entire pipeline runs as a single reproducible script.
Research Applications
The combination of the Anthropic Economic Index with huggingfaceR’s analytical tools supports several research directions:
Task-level adoption modeling. Use embeddings as features in regression models predicting AI usage rates, controlling for occupation, wages, and task characteristics.
Semantic distance and diffusion. Measure whether AI adoption spreads to semantically adjacent tasks over time, using longitudinal AEI releases.
Skill taxonomy validation. Test whether unsupervised clusters from embeddings align with established occupational classification systems (SOC major groups).
Cross-national specialization. Compare the embedding centroids of top tasks across countries to characterize national AI usage profiles.
Automation boundary detection. Use zero-shot classification and similarity search to identify the semantic frontier between automatable and non-automatable tasks.
Summary
This vignette demonstrated how huggingfaceR enables programmatic analysis of the Anthropic Economic Index:
| Function | Research Application |
|---|---|
| hf_embed() | Convert task descriptions to vector representations |
| hf_similarity() | Measure semantic relatedness between tasks |
| hf_nearest_neighbors() | Map research concepts onto the task taxonomy |
| hf_cluster_texts() | Discover latent task groupings |
| hf_extract_topics() | Interpret cluster content |
| hf_classify_zero_shot() | Classify tasks along arbitrary dimensions |
| hf_embed_umap() | Visualize the task embedding space |
These operations run as reproducible batch pipelines over structured data, producing tibbles suitable for statistical modeling and visualization. This analytical approach complements conversational tools by enabling the kind of corpus-scale, quantitative research that individual prompts cannot support.
See Also
- Getting Started – installation and authentication.
- Embeddings, Similarity, and Semantic Search – detailed coverage of embedding functions.
- Text Classification – zero-shot classification techniques.
- Hub Discovery, Datasets, and Tidymodels – searching the Hub and building ML pipelines.
- Anthropic Economic Index – official reports and methodology.