Introduction
The onet2r package provides access to the O*NET Web Services API, which contains comprehensive U.S. occupational data, including skills, knowledge requirements, tasks, abilities, and technology skills, for roughly 1,000 occupations. By combining onet2r with foundryR’s AI capabilities, you can build analytical workflows such as:
- Semantic search for finding occupations by meaning, not just keywords
- Occupation clustering based on skill and task similarities
- AI-powered analysis of occupational data and career pathways
- Enhanced job matching using embedding similarity
This vignette demonstrates practical integrations between these two packages.
Prerequisites
Install both packages:
# Install pak if you don't have it
# install.packages("pak")
pak::pak("farach/foundryR")
pak::pak("farach/onet2r")
Configure your API credentials:
library(foundryR)
library(onet2r) # install from GitHub: pak::pak("farach/onet2r")
library(dplyr)
library(tidyr)
library(purrr) # map_dbl() and map_dfr() are used below
# Set Azure AI Foundry credentials
foundry_set_endpoint("https://your-resource.openai.azure.com")
foundry_set_key("your-api-key")
# onet2r will use credentials from environment variables:
# ONET_USERNAME and ONET_PASSWORD
Understanding O*NET Data
O*NET (Occupational Information Network) is the primary source of occupational information in the United States. The onet2r package provides access to rich data about occupations:
# Search for occupations by keyword
software_jobs <- onet_search("software developer")
software_jobs
#> # A tibble: 10 x 3
#> code title relevance_score
#> <chr> <chr> <dbl>
#> 1 15-1252.00 Software Developers 100
#> 2 15-1253.00 Software Quality Assurance Analysts ... 79.3
#> 3 15-1254.00 Web Developers 68.5
#> ...
# Get detailed information for a specific occupation
dev_info <- onet_occupation("15-1252.00")
dev_info
#> # A tibble: 1 x 4
#> code title description sample_of_rep...
#> <chr> <chr> <chr> <list>
#> 1 15-1252.00 Software Developers Research, design, and develop... <chr [5]>
# Get skills required for software developers
dev_skills <- onet_skills("15-1252.00")
dev_skills
#> # A tibble: 35 x 5
#> element_id element_name scale_id scale_name data_value
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 2.A.1.a Reading Co... LV Level 4.62
#> 2 2.A.1.a Reading Co... IM Importance 4.25
#> 3 2.A.1.b Active Lis... LV Level 4.38
#> ...
Use Case 1: Semantic Search for Occupations
onet2r’s onet_search() relies on O*NET’s keyword search, but with
foundryR embeddings you can find occupations by semantic meaning. This
is especially useful when job seekers describe what they want in different
terms than the official occupation titles.
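Semantic search scores each occupation by the cosine similarity between the query embedding q and the occupation embedding d, each with n components. The cosine_similarity() helper under Semantic Occupation Search computes this quantity:

```latex
\operatorname{sim}(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}}
```

Scores range from -1 to 1. For unit-length vectors the denominator equals 1, so the score reduces to the dot product.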
Building an Occupation Search Index
# Get all occupations (or a subset for demonstration)
all_occupations <- onet_occupations(end = 100) # First 100 only, to keep the demo quick; the results shown below may need a larger index
# Create rich text descriptions combining title and description
occupation_texts <- all_occupations %>%
mutate(
search_text = paste(title, "-", description)
)
# Generate embeddings for all occupations
occupation_embeddings <- foundry_embed(
occupation_texts$search_text,
model = "text-embedding-3-small"
)
# Combine with occupation metadata
occupation_index <- occupation_texts %>%
bind_cols(
occupation_embeddings %>% select(embedding)
)
Semantic Occupation Search
# Helper function to compute cosine similarity
cosine_similarity <- function(a, b) {
sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}
# User query using natural language
query <- "I want to help people with their mental health and emotional problems"
query_embedding <- foundry_embed(query, model = "text-embedding-3-small")
query_vec <- query_embedding$embedding[[1]]
# Find most similar occupations
search_results <- occupation_index %>%
mutate(
similarity = map_dbl(embedding, ~cosine_similarity(query_vec, .x))
) %>%
arrange(desc(similarity)) %>%
select(code, title, similarity)
head(search_results, 5)
#> # A tibble: 5 x 3
#> code title similarity
#> <chr> <chr> <dbl>
#> 1 21-1014.00 Mental Health Counselors 0.892
#> 2 21-1013.00 Marriage and Family Therapists 0.867
#> 3 21-1023.00 Mental Health and Substance Abuse S... 0.854
#> 4 19-3033.00 Clinical and Counseling Psychologists 0.841
#> 5 21-1011.00 Substance Abuse and Behavioral Dis... 0.823
Notice how the semantic search found mental health-related occupations even though the query did not use exact occupation titles.
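Use Cases 1, 4, and 6 all repeat the same mutate()/arrange()/head() ranking step. That step can be wrapped in one function that computes every similarity with a single matrix product. In this sketch, semantic_top_n() is a hypothetical helper and is not part of foundryR or onet2r. The three-dimensional vectors are toy data standing in for real embeddings.

```r
# Hypothetical helper (not part of foundryR or onet2r): rank items by cosine
# similarity to a query vector using one matrix product instead of a loop.
semantic_top_n <- function(embeddings, titles, query_vec, n = 5) {
  # Stack the list of embedding vectors into a matrix, one row per item
  emb <- do.call(rbind, embeddings)
  # Cosine similarity: dot products divided by the product of the norms
  scores <- as.vector(emb %*% query_vec) /
    (sqrt(rowSums(emb^2)) * sqrt(sum(query_vec^2)))
  # Indices of the n highest scores, best first
  ord <- order(scores, decreasing = TRUE)[seq_len(min(n, length(scores)))]
  data.frame(title = titles[ord], similarity = scores[ord])
}

# Toy 3-dimensional "embeddings" for illustration only
toy_embeddings <- list(c(1, 0, 0), c(0.9, 0.1, 0), c(0, 1, 0))
toy_titles <- c("A", "B", "C")
semantic_top_n(toy_embeddings, toy_titles, query_vec = c(1, 0, 0), n = 2)
```

With the index built above, semantic_top_n(occupation_index$embedding, occupation_index$title, query_vec) should give the same ordering as the map_dbl() pipeline.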
Comparing Keyword vs Semantic Search
# Keyword search
keyword_results <- onet_search("work with computers and solve problems")
keyword_results
#> # A tibble: 0 x 3
#> # ... with 3 variables: code <chr>, title <chr>, relevance_score <dbl>
# Semantic search with the same query
semantic_query <- "work with computers and solve problems"
semantic_embedding <- foundry_embed(semantic_query, model = "text-embedding-3-small")
semantic_vec <- semantic_embedding$embedding[[1]]
semantic_results <- occupation_index %>%
mutate(
similarity = map_dbl(embedding, ~cosine_similarity(semantic_vec, .x))
) %>%
arrange(desc(similarity)) %>%
select(code, title, similarity) %>%
head(5)
semantic_results
#> # A tibble: 5 x 3
#> code title similarity
#> <chr> <chr> <dbl>
#> 1 15-1252.00 Software Developers 0.845
#> 2 15-1211.00 Computer Systems Analysts 0.832
#> 3 15-1299.08 Computer Systems Engineers/Architects 0.821
#> 4 15-1232.00 Computer User Support Specialists 0.798
#> 5 15-1244.00 Network and Computer Systems Admin... 0.789
For this phrasing the keyword search returns no rows. The semantic search ranks computing occupations highest because it matches the intent of the query rather than its exact words.
Use Case 2: Clustering Occupations by Skills
Using embeddings on occupation skill profiles enables data-driven clustering of similar occupations.
# Get skills for multiple occupations
occupation_codes <- c(
"15-1252.00", # Software Developers
"15-1211.00", # Computer Systems Analysts
"29-1141.00", # Registered Nurses
"29-1171.00", # Nurse Practitioners
"11-1021.00", # General and Operations Managers
"11-2021.00", # Marketing Managers
"25-1011.00", # Business Teachers, Postsecondary
"25-1021.00" # Computer Science Teachers, Postsecondary
)
# Fetch each occupation's 10 most important skills as one text string
get_skill_profile <- function(occ_code) {
top_skills <- onet_skills(occ_code) %>%
filter(scale_id == "IM") %>% # Importance scale
arrange(desc(data_value)) %>%
head(10)
# The skills table has no code column, so attach the code explicitly
tibble(
code = occ_code,
skill_profile = paste(top_skills$element_name, collapse = "; ")
)
}
skill_profiles <- map_dfr(occupation_codes, get_skill_profile)
# Add occupation titles
skill_profiles <- skill_profiles %>%
left_join(
map_dfr(occupation_codes, onet_occupation) %>%
select(code, title),
by = "code"
)
# Generate embeddings from skill profiles
skill_embeddings <- foundry_embed(
skill_profiles$skill_profile,
model = "text-embedding-3-small"
)
# Cluster occupations
embedding_matrix <- do.call(rbind, skill_embeddings$embedding)
set.seed(42)
clusters <- kmeans(embedding_matrix, centers = 3, nstart = 10)
# View clustering results
skill_profiles %>%
mutate(cluster = clusters$cluster) %>%
select(title, cluster) %>%
arrange(cluster)
#> # A tibble: 8 x 2
#> title cluster
#> <chr> <int>
#> 1 Software Developers 1
#> 2 Computer Systems Analysts 1
#> 3 Computer Science Teachers 1
#> 4 Registered Nurses 2
#> 5 Nurse Practitioners 2
#> 6 General and Operations Managers 3
#> 7 Marketing Managers 3
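#> 8 Business Teachers, Postsecon... 3
In this example, the skill-profile embeddings separate tech, healthcare, and business/management roles, and each postsecondary teaching role joins the cluster of its subject area. The cluster numbers themselves are arbitrary labels assigned by kmeans().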
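kmeans() clusters by Euclidean distance. For unit-length vectors a and b, the squared Euclidean distance equals 2(1 - cos(a, b)). Scaling each row to unit length before clustering therefore makes distances between rows reflect cosine similarity. This vignette does not say whether foundry_embed() already returns unit-length vectors, so this sketch normalizes explicitly. It also shows hierarchical clustering on cosine distance as an alternative to k-means. The toy matrix stands in for embedding_matrix and its values are illustrative.

```r
# Toy stand-in for embedding_matrix: rows 1-3 and rows 4-6 point in
# clearly different directions (values are illustrative only)
toy_matrix <- rbind(
  c(1.0, 0.1, 0.0), c(2.0, 0.3, 0.1), c(0.9, 0.0, 0.2),
  c(0.0, 1.0, 0.9), c(0.1, 2.0, 2.1), c(0.2, 0.8, 1.0)
)

# Scale every row to unit length so Euclidean distances between rows
# reflect cosine similarity
unit_rows <- toy_matrix / sqrt(rowSums(toy_matrix^2))

set.seed(42)
km <- kmeans(unit_rows, centers = 2, nstart = 10)

# Alternative: average-linkage hierarchical clustering on cosine distance
cos_sim <- tcrossprod(unit_rows)            # pairwise cosine similarities
hc <- hclust(as.dist(1 - cos_sim), method = "average")
hc_clusters <- cutree(hc, k = 2)

km$cluster
hc_clusters
```

To apply this to the occupations, replace toy_matrix with embedding_matrix and use 3 clusters.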
Use Case 3: AI-Powered Career Analysis
Use foundry_chat() to analyze occupational data and
provide career guidance.
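The code in this section repeatedly keeps the importance (IM) rows of an onet2r table, sorts them, and pastes the top element names into a prompt. This base-R sketch turns that step into a reusable function. top_elements() is hypothetical and assumes the element_name, scale_id, and data_value columns shown in the onet_skills() output earlier. The toy rows are illustrative, not real O*NET values.

```r
# Hypothetical helper, not an onet2r function: keep the n highest-importance
# elements from an onet2r-style table and collapse their names into one string.
top_elements <- function(df, n = 10, scale = "IM", sep = ", ") {
  rows <- df[df$scale_id == scale, , drop = FALSE]
  rows <- rows[order(rows$data_value, decreasing = TRUE), , drop = FALSE]
  paste(head(rows$element_name, n), collapse = sep)
}

# Toy data shaped like the onet_skills() output (values are illustrative)
toy_skills <- data.frame(
  element_name = c("Reading Comprehension", "Reading Comprehension",
                   "Active Listening", "Programming"),
  scale_id = c("LV", "IM", "IM", "IM"),
  data_value = c(4.62, 4.25, 3.88, 4.50)
)

top_elements(toy_skills, n = 2)
# returns "Programming, Reading Comprehension"
```

For example, top_elements(onet_skills("15-1252.00")) would return the ten most important skills as one comma-separated string, ready to paste into a prompt.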
Summarizing Occupation Requirements
# Gather comprehensive data for an occupation
occupation_code <- "15-1252.00" # Software Developers
occ_info <- onet_occupation(occupation_code)
occ_skills <- onet_skills(occupation_code)
occ_knowledge <- onet_knowledge(occupation_code)
occ_tasks <- onet_tasks(occupation_code)
occ_education <- onet_education(occupation_code)
# Format data for the LLM
occupation_data <- paste(
"Occupation:", occ_info$title,
"\n\nDescription:", occ_info$description,
"\n\nTop Skills (by importance):",
occ_skills %>%
filter(scale_id == "IM") %>%
arrange(desc(data_value)) %>%
head(10) %>%
pull(element_name) %>%
paste(collapse = ", "),
"\n\nTop Knowledge Areas:",
occ_knowledge %>%
filter(scale_id == "IM") %>%
arrange(desc(data_value)) %>%
head(10) %>%
pull(element_name) %>%
paste(collapse = ", "),
"\n\nSample Tasks:",
occ_tasks %>%
head(5) %>%
pull(task) %>%
paste(collapse = "; "),
"\n\nEducation Requirements:",
occ_education %>%
arrange(desc(data_value)) %>%
head(3) %>%
mutate(info = paste(category, "-", data_value, "%")) %>%
pull(info) %>%
paste(collapse = ", ")
)
# Ask the LLM to provide career guidance
response <- foundry_chat(
prompt = occupation_data,
system = "You are a career counselor. Based on the occupation data provided,
give a brief summary suitable for someone considering this career. Include:
1. What the job involves day-to-day
2. Key skills to develop
3. Typical education path
4. Career outlook and growth potential
Keep your response concise (under 200 words).",
model = "gpt-4o-mini"
)
cat(response$content)
#> Software Developers create applications and systems that power our digital world.
#> Day-to-day, you'll analyze user needs, design software solutions, write and test
#> code, and collaborate with teams to deliver quality products.
#>
#> Key skills to develop include programming (multiple languages), critical thinking,
#> complex problem solving, and strong communication abilities. You'll also need
#> solid foundations in computers and electronics, mathematics, and engineering
#> principles.
#>
#> Most positions require a bachelor's degree in Computer Science, Software
#> Engineering, or a related field, though some roles accept candidates with
#> coding bootcamp certificates and strong portfolios.
#>
#> The career outlook is excellent, with continued demand driven by expanding
#> technology needs across all industries. Opportunities for advancement include
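#> senior developer, architect, and management positions.
The prompt contains no employment-outlook data, so the outlook paragraph reflects the model’s general knowledge rather than O*NET.
Comparing Two Occupations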
# Gather data for two occupations to compare
compare_occupations <- function(code1, code2) {
occ1 <- onet_occupation(code1)
occ2 <- onet_occupation(code2)
skills1 <- onet_skills(code1) %>%
filter(scale_id == "IM") %>%
arrange(desc(data_value)) %>%
head(10)
skills2 <- onet_skills(code2) %>%
filter(scale_id == "IM") %>%
arrange(desc(data_value)) %>%
head(10)
comparison_data <- paste(
"Occupation 1:", occ1$title,
"\nDescription:", occ1$description,
"\nTop Skills:", paste(skills1$element_name, collapse = ", "),
"\n\nOccupation 2:", occ2$title,
"\nDescription:", occ2$description,
"\nTop Skills:", paste(skills2$element_name, collapse = ", ")
)
response <- foundry_chat(
prompt = comparison_data,
system = "You are a career advisor. Compare these two occupations:
1. Highlight similarities and differences
2. Identify transferable skills between them
3. Suggest which type of person might prefer each
Be concise and practical.",
model = "gpt-4o-mini"
)
response$content
}
# Compare Data Scientist vs Software Developer
comparison <- compare_occupations("15-2051.00", "15-1252.00")
cat(comparison)
Use Case 4: Finding Career Transition Paths
Identify occupations that share skills with a current role, suggesting potential career transitions.
# Current occupation
current_code <- "15-1252.00" # Software Developers
current_skills <- onet_skills(current_code) %>%
filter(scale_id == "IM") %>%
arrange(desc(data_value)) %>%
head(15)
# Create a skill-based query
skill_query <- paste(
"Professional with expertise in:",
paste(current_skills$element_name, collapse = ", ")
)
# Embed the skill query
skill_embedding <- foundry_embed(skill_query, model = "text-embedding-3-small")
skill_vec <- skill_embedding$embedding[[1]]
# Find similar occupations (excluding current)
transition_candidates <- occupation_index %>%
filter(code != current_code) %>%
mutate(
similarity = map_dbl(embedding, ~cosine_similarity(skill_vec, .x))
) %>%
arrange(desc(similarity)) %>%
select(code, title, similarity) %>%
head(10)
transition_candidates
#> # A tibble: 10 x 3
#> code title similarity
#> <chr> <chr> <dbl>
#> 1 15-1211.00 Computer Systems Analysts 0.912
#> 2 15-1299.08 Computer Systems Engineers/Architects 0.897
#> 3 15-2051.00 Data Scientists 0.884
#> 4 15-1253.00 Software Quality Assurance Analysts 0.876
#> 5 15-1243.00 Database Architects 0.865
#> 6 11-3021.00 Computer and Information Systems Man... 0.843
#> 7 15-1244.00 Network and Computer Systems Adminis... 0.831
#> 8 17-2061.00 Computer Hardware Engineers 0.819
#> 9 15-1212.00 Information Security Analysts 0.812
#> 10 25-1021.00 Computer Science Teachers, Postsecon... 0.798
Analyzing Skill Gaps for Transitions
analyze_skill_gap <- function(from_code, to_code) {
from_skills <- onet_skills(from_code) %>%
filter(scale_id == "IM") %>%
select(element_name, from_importance = data_value)
to_skills <- onet_skills(to_code) %>%
filter(scale_id == "IM") %>%
select(element_name, to_importance = data_value)
gap_analysis <- full_join(from_skills, to_skills, by = "element_name") %>%
mutate(
from_importance = replace_na(from_importance, 0),
to_importance = replace_na(to_importance, 0),
gap = to_importance - from_importance
) %>%
arrange(desc(gap))
# Skills needed for the new role
skills_to_develop <- gap_analysis %>%
filter(gap > 0.5) %>%
head(5)
# Existing strengths: important in both roles, most important for the new role first
transferable <- gap_analysis %>%
filter(from_importance >= 3.5, to_importance >= 3.5) %>%
arrange(desc(to_importance)) %>%
head(5)
list(
skills_to_develop = skills_to_develop,
transferable_skills = transferable
)
}
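The next sketch shows how the full join and replace_na() step in analyze_skill_gap() treats a skill listed for only one occupation. It repeats the same arithmetic in base R on toy data with invented scores. A skill missing from one side gets importance 0, so a skill that only the target role lists shows a gap equal to its full importance.

```r
# Toy importance scores, invented for illustration
from_skills <- data.frame(
  element_name = c("Programming", "Mathematics", "Writing"),
  from_importance = c(4.5, 3.6, 3.0)
)
to_skills <- data.frame(
  element_name = c("Programming", "Mathematics", "Science"),
  to_importance = c(4.0, 4.4, 3.9)
)

# Full outer join: keep skills listed for either occupation
toy_gap <- merge(from_skills, to_skills, by = "element_name", all = TRUE)
# Treat a skill missing on one side as importance 0, like replace_na()
toy_gap$from_importance[is.na(toy_gap$from_importance)] <- 0
toy_gap$to_importance[is.na(toy_gap$to_importance)] <- 0
# Positive gap = the new role needs more of this skill
toy_gap$gap <- toy_gap$to_importance - toy_gap$from_importance
toy_gap <- toy_gap[order(toy_gap$gap, decreasing = TRUE), ]
toy_gap
```

With the gap > 0.5 threshold used in analyze_skill_gap(), Science and Mathematics would be flagged as skills to develop.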
# Analyze gap from Software Developer to Data Scientist
gap <- analyze_skill_gap("15-1252.00", "15-2051.00")
cat("Skills to Develop:\n")
print(gap$skills_to_develop %>% select(element_name, gap))
cat("\nTransferable Skills:\n")
print(gap$transferable_skills %>% select(element_name, from_importance, to_importance))
Use Case 5: Hot Technology Analysis
Analyze emerging technologies across occupations using O*NET’s hot technology data.
# Get hot technologies for tech-related occupations
tech_occupations <- c(
"15-1252.00", # Software Developers
"15-2051.00", # Data Scientists
"15-1211.00", # Computer Systems Analysts
"15-1212.00" # Information Security Analysts
)
# Gather hot technologies
all_hot_tech <- map_dfr(tech_occupations, function(code) {
onet_hot_technology(code) %>%
mutate(occupation_code = code)
})
# Combine with occupation titles
all_hot_tech <- all_hot_tech %>%
left_join(
map_dfr(tech_occupations, onet_occupation) %>%
select(code, title),
by = c("occupation_code" = "code")
)
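The "common technologies across roles" question can be answered deterministically before you ask an LLM, which gives you a baseline to check the model's answer against. This base-R sketch counts how many distinct occupations list each technology. The title and hot_technology column names follow the code above, and the rows are toy data. With dplyr, distinct() followed by count(hot_technology) does the same.

```r
# Toy stand-in for all_hot_tech (technology names are illustrative)
toy_hot_tech <- data.frame(
  title = c("Software Developers", "Software Developers",
            "Data Scientists", "Data Scientists",
            "Information Security Analysts"),
  hot_technology = c("Python", "Git", "Python", "SQL", "Python")
)

# Count distinct occupations per technology (unique() drops repeated rows)
shared <- aggregate(title ~ hot_technology, data = unique(toy_hot_tech),
                    FUN = length)
names(shared)[2] <- "n_occupations"
# Most widely shared technologies first, ties broken alphabetically
shared <- shared[order(-shared$n_occupations, shared$hot_technology), ]
shared
```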
# Ask LLM to analyze technology trends
tech_summary <- all_hot_tech %>%
group_by(title) %>%
summarize(technologies = paste(hot_technology, collapse = ", ")) %>%
mutate(summary = paste(title, ":", technologies)) %>%
pull(summary) %>%
paste(collapse = "\n\n")
response <- foundry_chat(
prompt = tech_summary,
system = "You are a technology career advisor. Based on the hot technologies
listed for each occupation:
1. Identify common technologies across roles
2. Note technologies unique to specific roles
3. Suggest which technologies are most valuable to learn for career flexibility
Be specific and actionable.",
model = "gpt-4o-mini"
)
cat(response$content)
Use Case 6: Building a Job Recommendation System
Combine user preferences with occupation embeddings for personalized job recommendations.
# User profile (could come from a survey or assessment)
user_profile <- list(
interests = "I enjoy analyzing data, solving complex problems, and working
with technology. I prefer collaborative work environments.",
skills = "Programming in Python and R, statistical analysis, data visualization,
machine learning basics, SQL databases",
education = "Master's degree in Statistics",
preferences = "Work-life balance, remote work options, continuous learning"
)
# Create a comprehensive user query
user_query <- paste(
"Professional profile:",
"Interests:", user_profile$interests,
"Skills:", user_profile$skills,
"Education:", user_profile$education,
"Work preferences:", user_profile$preferences
)
# Generate embedding for user profile
user_embedding <- foundry_embed(user_query, model = "text-embedding-3-small")
user_vec <- user_embedding$embedding[[1]]
# Find matching occupations
recommendations <- occupation_index %>%
mutate(
similarity = map_dbl(embedding, ~cosine_similarity(user_vec, .x))
) %>%
arrange(desc(similarity)) %>%
select(code, title, similarity) %>%
head(10)
# Enhance with AI explanation
top_matches <- recommendations %>% head(5)
for (i in seq_len(nrow(top_matches))) {
occ_info <- onet_occupation(top_matches$code[i])
explanation <- foundry_chat(
prompt = paste(
"User profile:", user_query,
"\n\nRecommended occupation:", occ_info$title,
"\nDescription:", occ_info$description
),
system = "In 2-3 sentences, explain why this occupation is a good match
for this user's profile. Be specific about the alignment.",
model = "gpt-4o-mini"
)
cat(paste0("\n", i, ". ", top_matches$title[i],
" (cosine similarity: ", round(top_matches$similarity[i], 3), ")\n"))
cat(explanation$content, "\n")
}
Performance Tips
Caching Embeddings
For production applications, cache occupation embeddings so that repeated runs do not call the API again.
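A simple approach is to save embeddings to disk with saveRDS() and reload them with readRDS(). In this sketch, cached_embed() is a hypothetical wrapper, not a foundryR function. Its embed_fn argument stands in for a call such as function(x) foundry_embed(x, model = "text-embedding-3-small"). The cache stores the input texts too, so it is rebuilt when the occupation list changes.

```r
# Hypothetical caching wrapper; embed_fn stands in for a foundry_embed() call.
cached_embed <- function(texts, cache_file, embed_fn) {
  if (file.exists(cache_file)) {
    cached <- readRDS(cache_file)
    # Reuse the cache only if it was built from exactly these texts
    if (identical(cached$texts, texts)) return(cached$embeddings)
  }
  embeddings <- embed_fn(texts)
  saveRDS(list(texts = texts, embeddings = embeddings), cache_file)
  embeddings
}

# Demonstration with a fake embedder that counts how often it is called
calls <- 0
fake_embed <- function(x) { calls <<- calls + 1; lapply(x, nchar) }
cache_path <- tempfile(fileext = ".rds")
emb_run1 <- cached_embed(c("a", "bb"), cache_path, fake_embed)
emb_run2 <- cached_embed(c("a", "bb"), cache_path, fake_embed)  # served from cache
calls
```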
Batch Processing
When processing many occupations, use batch embedding:
# Collect all texts first
texts_to_embed <- occupation_texts$search_text
# Embed in batches to manage API limits
batch_size <- 50
all_embeddings <- list()
for (i in seq(1, length(texts_to_embed), by = batch_size)) {
batch_end <- min(i + batch_size - 1, length(texts_to_embed))
batch_texts <- texts_to_embed[i:batch_end]
batch_embeddings <- foundry_embed(batch_texts, model = "text-embedding-3-small")
all_embeddings[[length(all_embeddings) + 1]] <- batch_embeddings
# Respect rate limits
Sys.sleep(1)
}
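The fixed Sys.sleep(1) in the loop above spaces requests out, but it does not recover if a call fails. One option is a retry wrapper with exponential backoff. with_retry() below is a hypothetical helper, not part of foundryR. It retries on any error, whereas a production version would likely retry only rate-limit responses; how foundryR reports those is not covered here. Inside the loop you could write batch_embeddings <- with_retry(function() foundry_embed(batch_texts, model = "text-embedding-3-small")).

```r
# Hypothetical helper: call f(), retrying with exponential backoff on error.
with_retry <- function(f, max_tries = 5, base_delay = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # Out of attempts: re-signal the last error
    if (attempt == max_tries) stop(result)
    # Wait base_delay * 1, 2, 4, ... seconds before trying again
    Sys.sleep(base_delay * 2^(attempt - 1))
  }
}

# Demonstration: a function that fails twice, then succeeds
n_calls <- 0
flaky <- function() {
  n_calls <<- n_calls + 1
  if (n_calls < 3) stop("rate limited")
  "ok"
}
with_retry(flaky, max_tries = 5, base_delay = 0)
```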
# Combine results
final_embeddings <- bind_rows(all_embeddings)
Reducing Dimensions
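For large-scale applications, consider requesting fewer embedding dimensions. Embed your queries with the same dimensions value as the index, because cosine similarity is only defined for vectors of equal length: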
# Use smaller dimensions for faster similarity computation
compact_embeddings <- foundry_embed(
occupation_texts$search_text,
model = "text-embedding-3-small",
dimensions = 256 # Instead of default 1536
)
Next Steps
- Learn more about Text Embeddings in foundryR
- Explore tidymodels Integration for building ML pipelines
- Visit the onet2r documentation for complete API coverage
- Check out the O*NET Resource Center for data definitions