How to Fool Yourself with O*NET over Time • onet2r

A common mistake is to treat every release-to-release difference as a fresh within-occupation change. O*NET archives contain source dates, domain sources, suppression flags, and taxonomy vintages. Ignoring those fields can turn a carryforward, transition row, or crosswalk seam into a misleading trend.

Do not start with the coefficient. Start with the audit fields. If a result depends on rows that were carried forward, bridged across a split, or marked as transition data, the caveat belongs next to the estimate.

The Naive Difference

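library(onet2r)
library(dplyr)

# `archive_base` is assumed to be defined already as the directory
# that holds the unpacked O*NET text archives.
panel <- onet_panel(
  "Abilities",
  versions = c("30.2", "30.3"),
  scale = "IM",
  archives = c(
    `30.2` = file.path(archive_base, "db_30_2_text"),
    `30.3` = file.path(archive_base, "db_30_3_text")
  ),
  release_dates = c(`30.2` = "2026-02-01", `30.3` = "2026-05-01")
)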

# Keep only the identifiers and the rating value.
naive <- panel |>
  select(release_version, onet_soc_code, element_id, element_name, data_value)

# Pair each 30.2 row with its 30.3 counterpart and difference the values.
# The right-hand side is evaluated before `naive` is reassigned, so the
# inner reference still sees both releases.
naive <- naive |>
  filter(release_version == "30.2") |>
  select(onet_soc_code, element_id, element_name, from_value = data_value) |>
  inner_join(
    naive |>
      filter(release_version == "30.3") |>
      select(onet_soc_code, element_id, to_value = data_value),
    by = join_by(onet_soc_code, element_id),
    relationship = "one-to-one"
  ) |>
  mutate(naive_change = to_value - from_value) |>
  arrange(desc(abs(naive_change)))

naive |>
  head(8) |>
  onet_kable()

onet_soc_code	element_id	element_name	from_value	to_value	naive_change
29-1141.00	1.A.1.b.1	Problem Sensitivity	4.60	4.90	0.30
15-1252.00	1.A.1.a.1	Oral Comprehension	4.12	4.35	0.23
41-1011.00	1.A.1.a.1	Oral Comprehension	4.00	4.15	0.15
11-1011.00	1.A.1.a.1	Oral Comprehension	4.38	4.50	0.12
15-1252.00	1.A.1.b.1	Problem Sensitivity	4.50	4.50	0.00
29-1141.00	1.A.1.a.1	Oral Comprehension	4.71	4.71	0.00
11-1011.00	1.A.1.b.1	Problem Sensitivity	4.22	4.22	0.00

The naive table is useful as a screening tool, but it cannot support an interpretation on its own: it does not show which differences rest on fresh source data.

The Reconciled Difference

# Both releases use the 2019 taxonomy, so the bridge runs from 2019 to 2019.
changes <- onet_panel_reconcile(panel, onet_crosswalk_bridge("2019", "2019"))

changes |>
  select(
    to_soc_code,
    element_name,
    value_change,
    change_type,
    from_source_date,
    to_source_date,
    method_break,
    safely_comparable
  ) |>
  arrange(desc(abs(value_change))) |>
  head(8) |>
  onet_kable()

to_soc_code	element_name	value_change	change_type	from_source_date	to_source_date	method_break	safely_comparable
29-1141	Problem Sensitivity	0.30	recode_or_recalc_flag	2024-08-01	2024-08-01	FALSE	FALSE
15-1252	Oral Comprehension	0.23	real_update	2024-07-01	2025-07-01	FALSE	TRUE
41-1011	Oral Comprehension	0.15	real_update	2024-06-01	2025-06-01	FALSE	TRUE
11-1011	Oral Comprehension	0.12	real_update	2024-07-01	2025-07-01	TRUE	FALSE
15-1252	Problem Sensitivity	0.00	stale_carryforward	2024-07-01	2024-07-01	FALSE	TRUE
29-1141	Oral Comprehension	0.00	resampled_stable	2024-08-01	2025-08-01	FALSE	TRUE
11-1011	Problem Sensitivity	0.00	stale_carryforward	2024-07-01	2024-07-01	FALSE	TRUE

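Now the same differences carry labels. A value change with no change in source date is weaker evidence than one backed by fresh source data: the largest difference in the table, Problem Sensitivity for 29-1141, has identical source dates and is flagged as a possible recode or recalculation. Treat a method break as a warning, not as a clean within-occupation change.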
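
To make the relationship between the labels and the two comparisons concrete, here is a minimal base R sketch. classify_change() is a hypothetical helper, not an onet2r function, and the package's own rules may draw on more fields (method breaks, suppression, taxonomy vintage) than the two comparisons used here. The example rows combine values from the naive table with source dates from the reconciled table.

```r
# Hypothetical helper, not part of onet2r: label a pair of release rows
# using only two comparisons, value and source date.
classify_change <- function(from_value, to_value,
                            from_source_date, to_source_date) {
  value_changed <- from_value != to_value
  date_changed <- as.Date(from_source_date) != as.Date(to_source_date)
  ifelse(
    value_changed & date_changed, "real_update",
    ifelse(
      value_changed, "recode_or_recalc_flag",
      ifelse(date_changed, "resampled_stable", "stale_carryforward")
    )
  )
}

# Four rows retyped from the tables above.
example_rows <- data.frame(
  to_soc_code = c("29-1141", "15-1252", "15-1252", "29-1141"),
  element_name = c("Problem Sensitivity", "Oral Comprehension",
                   "Problem Sensitivity", "Oral Comprehension"),
  from_value = c(4.60, 4.12, 4.50, 4.71),
  to_value = c(4.90, 4.35, 4.50, 4.71),
  from_source_date = c("2024-08-01", "2024-07-01", "2024-07-01", "2024-08-01"),
  to_source_date = c("2024-08-01", "2025-07-01", "2024-07-01", "2025-08-01"),
  stringsAsFactors = FALSE
)

example_rows$change_type <- with(
  example_rows,
  classify_change(from_value, to_value, from_source_date, to_source_date)
)

example_rows[, c("to_soc_code", "element_name", "change_type")]
```

Under these assumptions the four rows reproduce the labels shown in the reconciled table. The sketch compares values for exact equality, which is adequate for published ratings rounded to two decimals but would need a tolerance for computed values.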

A useful rule of thumb: use value_change to find rows worth inspecting, then use change_type, method_break, and safely_comparable to decide what those rows can support.
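
One way to apply that rule is to rank first and filter second, keeping the excluded rows visible. The sketch below uses base R on four rows retyped from the reconciled table; the object names (changes_example, supported, needs_caveat) are illustrative, and the filter is one reasonable reading of the rule rather than a package default.

```r
# Rows retyped from the reconciled table above.
changes_example <- data.frame(
  to_soc_code = c("29-1141", "15-1252", "41-1011", "11-1011"),
  element_name = c("Problem Sensitivity", rep("Oral Comprehension", 3)),
  value_change = c(0.30, 0.23, 0.15, 0.12),
  change_type = c("recode_or_recalc_flag", rep("real_update", 3)),
  method_break = c(FALSE, FALSE, FALSE, TRUE),
  safely_comparable = c(FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Step 1: rank by absolute change to find rows worth inspecting.
ranked <- changes_example[order(-abs(changes_example$value_change)), ]

# Step 2: keep rows whose audit fields support a within-occupation reading.
keep <- ranked$change_type == "real_update" &
  !ranked$method_break &
  ranked$safely_comparable
supported <- ranked[keep, ]

# Step 3: keep the remaining rows so their caveats travel with the estimates.
needs_caveat <- ranked[!keep, ]

supported[, c("to_soc_code", "element_name", "value_change")]
needs_caveat[, c("to_soc_code", "change_type", "method_break")]
```

Here the largest naive change (29-1141, Problem Sensitivity) lands in needs_caveat, which is the point of screening before interpreting.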

The Taxonomy Seam

cross_panel <- onet_panel(
  "Abilities",
  versions = c("24.3", "25.1"),
  scale = "IM",
  archives = c(
    `24.3` = file.path(archive_base, "db_24_3_text"),
    `25.1` = file.path(archive_base, "db_25_1_text")
  ),
  release_dates = c(`24.3` = "2020-05-01", `25.1` = "2020-11-01")
)

bridge <- tibble::tibble(
  from_vintage = "2010",
  to_vintage = "2019",
  from_onet_soc_code = c("15-1132.00", "15-1132.00", "29-1141.00"),
  to_onet_soc_code = c("15-1252.00", "15-1253.00", "29-1141.00"),
  map_type = c("split", "split", "one_to_one"),
  crosswalk_weight = c(0.5, 0.5, 1)
)

seam <- onet_panel_reconcile(cross_panel, bridge)

seam |>
  select(
    from_onet_soc_code,
    to_onet_soc_code,
    element_name,
    change_type,
    transition_data,
    crosswalk_uncertain,
    safely_comparable
  ) |>
  onet_kable()

from_onet_soc_code	to_onet_soc_code	element_name	change_type	transition_data	crosswalk_uncertain	safely_comparable
15-1132.00	15-1252.00	Oral Comprehension	transition_data	TRUE	TRUE	FALSE
15-1132.00	15-1253.00	Oral Comprehension	transition_data	TRUE	TRUE	FALSE
29-1141.00	29-1141.00	Oral Comprehension	real_update	FALSE	FALSE	TRUE
15-1132.00	15-1252.00	Problem Sensitivity	dropped	FALSE	TRUE	FALSE
15-1132.00	15-1253.00	Problem Sensitivity	dropped	FALSE	TRUE	FALSE
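
The hand-built bridge gives each half of the split a weight of 0.5. Before trusting a bridge like this, it can help to confirm that the weights leaving each source code sum to one and to list the codes that fan out to more than one target. The following base R sketch does both on a copy of the bridge; it is an illustrative check, not a validation that onet2r is known to perform.

```r
# Copy of the bridge rows defined above, as a plain data frame.
bridge_example <- data.frame(
  from_onet_soc_code = c("15-1132.00", "15-1132.00", "29-1141.00"),
  to_onet_soc_code = c("15-1252.00", "15-1253.00", "29-1141.00"),
  map_type = c("split", "split", "one_to_one"),
  crosswalk_weight = c(0.5, 0.5, 1),
  stringsAsFactors = FALSE
)

# Total weight leaving each source code; each total should equal 1.
weight_totals <- tapply(
  bridge_example$crosswalk_weight,
  bridge_example$from_onet_soc_code,
  sum
)
weights_ok <- all(abs(weight_totals - 1) < 1e-8)

# Source codes mapped to more than one target are the seam cases.
targets_per_source <- table(bridge_example$from_onet_soc_code)
split_sources <- names(targets_per_source)[as.vector(targets_per_source) > 1]

weights_ok
split_sources
```

Rows whose source code appears in split_sources are the ones the reconciled table marks as crosswalk_uncertain.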

safe_counts <- seam |>
  mutate(comparability = if_else(safely_comparable, "Safe", "Not safe")) |>
  count(comparability, name = "rows")

ggplot2::ggplot(safe_counts, ggplot2::aes(
  x = comparability,
  y = rows,
  fill = comparability
)) +
  ggplot2::geom_col(width = 0.6, show.legend = FALSE) +
  ggplot2::coord_flip() +
  ggplot2::scale_fill_manual(
    values = c("Safe" = onet2r_colors[["teal"]], "Not safe" = onet2r_colors[["rose"]])
  ) +
  ggplot2::labs(
    title = "Taxonomy Seams Reduce Safe Comparisons",
    subtitle = "A split occupation carries extra uncertainty even when codes can be bridged.",
    x = NULL,
    y = "Rows"
  ) +
  onet2r_theme()

Horizontal bar chart comparing safe and unsafe comparisons at a taxonomy seam.

Before making a historical claim, ask: Did the value change? Did the source date change? Did the occupation cross a taxonomy seam? Was either row transition data or suppressed? The package gives you those fields so the caveats do not get lost.
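
The checklist can also be written down as a small function that attaches a caveat string to every row. This is a sketch under stated assumptions: audit_caveats() is a hypothetical helper, not an onet2r function; the column names follow the tables above; suppressed is an assumed logical column standing in for the suppression flags; and the three input rows are made-up illustrative values, not O*NET data.

```r
# Hypothetical helper, not part of onet2r: one caveat string per row.
audit_caveats <- function(rows) {
  checks <- list(
    "no value change" = rows$value_change == 0,
    "no new source date" = rows$from_source_date == rows$to_source_date,
    "crosses taxonomy seam" = rows$crosswalk_uncertain,
    "transition data" = rows$transition_data,
    "suppressed" = rows$suppressed  # assumed column name
  )
  vapply(
    seq_len(nrow(rows)),
    function(i) {
      hit <- vapply(checks, function(flag) isTRUE(flag[i]), logical(1))
      if (any(hit)) paste(names(checks)[hit], collapse = "; ") else "none"
    },
    character(1)
  )
}

# Illustrative rows only; the values are invented for this example.
rows <- data.frame(
  value_change = c(0.23, 0.00, 0.30),
  from_source_date = c("2024-07-01", "2024-07-01", "2024-08-01"),
  to_source_date = c("2025-07-01", "2024-07-01", "2024-08-01"),
  crosswalk_uncertain = c(FALSE, FALSE, TRUE),
  transition_data = c(FALSE, FALSE, TRUE),
  suppressed = c(FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

rows$caveats <- audit_caveats(rows)
rows[, c("value_change", "caveats")]
```

Keeping the caveats as a column, rather than filtering the rows away, means they stay next to the estimate when the table is printed or exported.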