Longitudinal O*NET Archive Analysis • onet2r

Read vignette("longitudinal-onet-background", package = "onet2r") before interpreting changes. O*NET was not designed as a longitudinal panel, so the practical job is not only to compute differences. It is to separate real descriptor updates from carryforwards, re-surveys with stable values, recodes, transition rows, and taxonomy seams.

This walkthrough uses packaged archive-format text files so every displayed table comes from onet2r functions while the vignette remains CRAN-safe and does not require network access.

A Practical Question

Suppose you want to know whether selected cognitive ability ratings changed between O*NET releases for occupations that matter in workforce planning: software developers, registered nurses, chief executives, and retail supervisors. A naive new - old difference is not enough, because unchanged values may be carryforwards and changed values may reflect a recode rather than a fresh survey update.

Read One Archive Table

onet_archive_read() reads an O*NET text archive table and normalizes it into one panel schema.

abilities_303 <- onet_archive_read(
  "30.3",
  "Abilities",
  path = example_archive("30.3"),
  release_date = "2026-05-01"
)

abilities_303 |>
  select(
    release_version,
    onet_soc_code,
    soc_code,
    element_id,
    element_name,
    data_value,
    source_date,
    domain_source
  ) |>
  head(8) |>
  onet_kable()

release_version	onet_soc_code	soc_code	element_id	element_name	data_value	source_date	domain_source
30.3	15-1252.00	15-1252	1.A.1.a.1	Oral Comprehension	4.35	2025-07-01	Analyst
30.3	15-1252.00	15-1252	1.A.1.b.1	Problem Sensitivity	4.50	2024-07-01	Analyst
30.3	29-1141.00	29-1141	1.A.1.a.1	Oral Comprehension	4.71	2025-08-01	Incumbent
30.3	29-1141.00	29-1141	1.A.1.b.1	Problem Sensitivity	4.90	2024-08-01	Incumbent
30.3	11-1011.00	11-1011	1.A.1.a.1	Oral Comprehension	4.50	2025-07-01	Incumbent
30.3	11-1011.00	11-1011	1.A.1.b.1	Problem Sensitivity	4.22	2024-07-01	Analyst
30.3	41-1011.00	41-1011	1.A.1.a.1	Oral Comprehension	4.15	2025-06-01	Analyst

The output has one row per occupation, element, scale, and release. The important longitudinal fields are release_version, soc_vintage, onet_soc_code, data_value, source_date, and domain_source.

Assemble a Same-Vintage Panel

onet_panel() applies the same reader across releases and row-binds the normalized outputs.

same_vintage_archives <- c(
  `30.2` = example_archive("30.2"),
  `30.3` = example_archive("30.3")
)
same_vintage_dates <- c(`30.2` = "2026-02-01", `30.3` = "2026-05-01")

panel <- onet_panel(
  "Abilities",
  versions = c("30.2", "30.3"),
  scale = "IM",
  archives = same_vintage_archives,
  release_dates = same_vintage_dates
)

panel |>
  arrange(onet_soc_code, element_id, release_version) |>
  select(
    release_version,
    onet_soc_code,
    element_name,
    data_value,
    source_date,
    domain_source
  ) |>
  head(10) |>
  onet_kable()

release_version	onet_soc_code	element_name	data_value	source_date	domain_source
30.2	11-1011.00	Oral Comprehension	4.38	2024-07-01	Analyst
30.3	11-1011.00	Oral Comprehension	4.50	2025-07-01	Incumbent
30.2	11-1011.00	Problem Sensitivity	4.22	2024-07-01	Analyst
30.3	11-1011.00	Problem Sensitivity	4.22	2024-07-01	Analyst
30.2	15-1252.00	Oral Comprehension	4.12	2024-07-01	Analyst
30.3	15-1252.00	Oral Comprehension	4.35	2025-07-01	Analyst
30.2	15-1252.00	Problem Sensitivity	4.50	2024-07-01	Analyst
30.3	15-1252.00	Problem Sensitivity	4.50	2024-07-01	Analyst
30.2	29-1141.00	Oral Comprehension	4.71	2024-08-01	Incumbent
30.3	29-1141.00	Oral Comprehension	4.71	2025-08-01	Incumbent

This is the table you would save as your audit trail before doing any modeling. It records the exact versions, values, and source dates used in the comparison.

Reconcile Adjacent Releases

onet_panel_reconcile() compares adjacent releases and classifies each matched occupation-element-scale pair.

changes <- onet_panel_reconcile(panel, onet_crosswalk_bridge("2019", "2019"))

changes |>
  select(
    from_soc_code,
    to_soc_code,
    element_name,
    from_value,
    to_value,
    value_change,
    from_source_date,
    to_source_date,
    change_type,
    method_break,
    safely_comparable
  ) |>
  arrange(desc(abs(value_change))) |>
  head(10) |>
  onet_kable()

from_soc_code	to_soc_code	element_name	from_value	to_value	value_change	from_source_date	to_source_date	change_type	method_break	safely_comparable
29-1141	29-1141	Problem Sensitivity	4.60	4.90	0.30	2024-08-01	2024-08-01	recode_or_recalc_flag	FALSE	FALSE
15-1252	15-1252	Oral Comprehension	4.12	4.35	0.23	2024-07-01	2025-07-01	real_update	FALSE	TRUE
41-1011	41-1011	Oral Comprehension	4.00	4.15	0.15	2024-06-01	2025-06-01	real_update	FALSE	TRUE
11-1011	11-1011	Oral Comprehension	4.38	4.50	0.12	2024-07-01	2025-07-01	real_update	TRUE	FALSE
15-1252	15-1252	Problem Sensitivity	4.50	4.50	0.00	2024-07-01	2024-07-01	stale_carryforward	FALSE	TRUE
29-1141	29-1141	Oral Comprehension	4.71	4.71	0.00	2024-08-01	2025-08-01	resampled_stable	FALSE	TRUE
11-1011	11-1011	Problem Sensitivity	4.22	4.22	0.00	2024-07-01	2024-07-01	stale_carryforward	FALSE	TRUE

Read change_type before interpreting value_change.

Value changed?	Source date changed?	Classification	Interpretation
no	no	`stale_carryforward`	The release likely carried forward the prior value.
yes	yes	`real_update`	The value changed with a new source date.
no	yes	`resampled_stable`	The occupation appears updated, but the score stayed stable.
yes	no	`recode_or_recalc_flag`	Treat cautiously because the value changed without a new source date.

changes |>
  filter(value_changed) |>
  mutate(abs_change = abs(value_change)) |>
  arrange(safely_comparable, desc(abs_change)) |>
  select(
    to_soc_code,
    element_name,
    from_value,
    to_value,
    value_change,
    change_type,
    method_break,
    safely_comparable
  ) |>
  onet_kable()

to_soc_code	element_name	from_value	to_value	value_change	change_type	method_break	safely_comparable
29-1141	Problem Sensitivity	4.60	4.90	0.30	recode_or_recalc_flag	FALSE	FALSE
11-1011	Oral Comprehension	4.38	4.50	0.12	real_update	TRUE	FALSE
15-1252	Oral Comprehension	4.12	4.35	0.23	real_update	FALSE	TRUE
41-1011	Oral Comprehension	4.00	4.15	0.15	real_update	FALSE	TRUE

Cross a Taxonomy Seam

Same-vintage comparisons are the easy case. The next example uses the bundled 2010-vintage and 2019-vintage fixtures. The bridge is intentionally tiny, but it shows the real problem: a 2010 occupation can split into more than one 2019 occupation.

cross_archives <- c(
  `24.3` = example_archive("24.3"),
  `25.1` = example_archive("25.1")
)
cross_dates <- c(`24.3` = "2020-08-01", `25.1` = "2020-11-01")

cross_panel <- onet_panel(
  "Abilities",
  versions = c("24.3", "25.1"),
  scale = "IM",
  archives = cross_archives,
  release_dates = cross_dates
)

bridge_2010_2019 <- tibble::tibble(
  from_vintage = "2010",
  to_vintage = "2019",
  from_onet_soc_code = c("15-1132.00", "15-1132.00", "29-1141.00"),
  to_onet_soc_code = c("15-1252.00", "15-1253.00", "29-1141.00"),
  map_type = c("split", "split", "one_to_one"),
  crosswalk_weight = c(0.5, 0.5, 1)
)

cross_changes <- onet_panel_reconcile(cross_panel, bridge_2010_2019)

cross_changes |>
  select(
    from_onet_soc_code,
    to_onet_soc_code,
    element_name,
    change_type,
    crosswalk_uncertain,
    transition_data,
    safely_comparable
  ) |>
  onet_kable()

from_onet_soc_code	to_onet_soc_code	element_name	change_type	crosswalk_uncertain	transition_data	safely_comparable
15-1132.00	15-1252.00	Oral Comprehension	transition_data	TRUE	TRUE	FALSE
15-1132.00	15-1253.00	Oral Comprehension	transition_data	TRUE	TRUE	FALSE
29-1141.00	29-1141.00	Oral Comprehension	real_update	FALSE	FALSE	TRUE
15-1132.00	15-1252.00	Problem Sensitivity	dropped	TRUE	FALSE	FALSE
15-1132.00	15-1253.00	Problem Sensitivity	dropped	TRUE	FALSE	FALSE

plot_changes <- cross_changes |>
  filter(!is.na(from_value), !is.na(to_value)) |>
  mutate(
    comparison = paste(to_onet_soc_code, element_name, sep = ": "),
    comparability = if_else(safely_comparable, "Safe", "Not safe")
  )

ggplot2::ggplot(plot_changes, ggplot2::aes(y = comparison)) +
  ggplot2::geom_segment(
    ggplot2::aes(
      x = from_value,
      xend = to_value,
      yend = comparison,
      color = comparability
    ),
    linewidth = 1.1
  ) +
  ggplot2::geom_point(
    ggplot2::aes(x = from_value),
    color = onet2r_colors[["slate"]],
    size = 2.8
  ) +
  ggplot2::geom_point(
    ggplot2::aes(x = to_value, color = comparability),
    size = 3.2
  ) +
  onet2r_discrete_color(name = "Comparability") +
  ggplot2::labs(
    title = "A Cross-Vintage Difference Is Not Automatically Comparable",
    subtitle = "Start and end points come from the bundled archive fixtures.",
    x = "Importance rating",
    y = NULL
  ) +
  onet2r_theme()

Dumbbell chart of cross-vintage ability changes, with safe comparisons separated from unsafe comparisons.

Summarize the Panel

onet_change_summary() gives a compact audit of the reconciliation result. The job-family view is useful for spotting whether one part of the SOC taxonomy is driving the apparent change.

onet_change_summary(changes, by = "job_family") |>
  onet_kable()

summary_level	job_family	change_type	n_pairs	share_pairs	mean_value_change	median_abs_value_change	share_safely_comparable	share_method_break
overall	NA	real_update	7	1.000	0.114	0.120	0.714	0.143
job_family	11	stale_carryforward	2	0.286	0.060	0.060	0.500	0.500
job_family	15	stale_carryforward	2	0.286	0.115	0.115	1.000	0.000
job_family	29	resampled_stable	2	0.286	0.150	0.150	0.500	0.000
job_family	41	real_update	1	0.143	0.150	0.150	1.000	0.000

Recommended Workflow for Real Archive Work

Use onet_releases() to identify release versions and O*NET-SOC vintages.
Build a panel with onet_panel() for one domain at a time, such as "Abilities", "Skills", or "Work Activities".
Build a bridge with onet_crosswalk_bridge() when releases use different O*NET-SOC vintages.
Reconcile with onet_panel_reconcile().
Filter, weight, or model only after checking change_type, method_break, crosswalk_uncertain, and safely_comparable.