
Why Longitudinal O*NET Analysis Is Hard
O*NET is one of the most useful public sources for occupation-level skills, tasks, abilities, work activities, and work contexts. It is also easy to misuse as a time series. The most important fact is simple: O*NET was not built as a longitudinal panel. The Web Services API serves the current release, while version-over-version analysis has to be reconstructed from the downloadable release archives (National Center for O*NET Development, n.d.b).
That reconstruction is why onet2r includes archive,
bridge, and reconciliation helpers. These helpers do not make historical
data automatically comparable, and they do not license causal claims.
They make the assumptions visible.
The Three Problems Users Must Handle
1. The API Is Current-Release First
Most API workflows ask “what does this occupation look like now?” Longitudinal work asks “what changed between release A and release B?” Those are different questions. For panels, use archive releases rather than the current-release API.
library(onet2r)
library(dplyr)

# archive_302 and archive_303 are local paths to downloaded release archives.
panel <- onet_panel(
  "Abilities",
  versions = c("30.2", "30.3"),
  scale = "IM",
  archives = c(`30.2` = archive_302, `30.3` = archive_303),
  release_dates = c(`30.2` = "2026-02-01", `30.3` = "2026-05-01")
)

panel |>
  count(release_version, release_date, soc_vintage, domain) |>
  onet_kable()

| release_version | release_date | soc_vintage | domain | n |
|---|---|---|---|---|
| 30.2 | 2026-02-01 | 2019 | Abilities | 7 |
| 30.3 | 2026-05-01 | 2019 | Abilities | 7 |
The table above is produced by onet_panel(). In a live
workflow, the same call can download archive ZIP files by version. In a
package vignette, local archive paths keep the output reproducible.
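As a rough illustration of the data shape involved, the sketch below stacks two per-release tables into one long panel using base R. The data frames, their values, and the `stack_releases()` helper are hypothetical and exist only for this example; this is not how `onet_panel()` is implemented.

```r
# Hypothetical per-release extracts; all values are made up for illustration.
release_302 <- data.frame(
  onet_soc_code = c("15-1252.00", "29-1141.00"),
  element_name  = c("Oral Comprehension", "Oral Comprehension"),
  data_value    = c(4.00, 4.25),
  stringsAsFactors = FALSE
)
release_303 <- data.frame(
  onet_soc_code = c("15-1252.00", "29-1141.00"),
  element_name  = c("Oral Comprehension", "Oral Comprehension"),
  data_value    = c(4.12, 4.25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: tag each table with its release version, then stack.
stack_releases <- function(tables) {
  tagged <- Map(
    function(tbl, version) {
      data.frame(release_version = version, tbl, stringsAsFactors = FALSE)
    },
    tables,
    names(tables)
  )
  panel <- do.call(rbind, unname(tagged))
  rownames(panel) <- NULL
  panel
}

toy_panel <- stack_releases(list(`30.2` = release_302, `30.3` = release_303))

# One row per occupation, element, and release; count rows per release.
table(toy_panel$release_version)
```

The point of the long shape is that every row carries its own release label, so later steps can compare releases without relying on which file a value came from.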
2. Occupation Codes Change across Taxonomies
O*NET-SOC vintages do not map one-to-one across time. Occupations
split, merge, appear, and disappear. Hosseinioun et al. (2025)
handle this by composing adjacent official O*NET-SOC crosswalks, which
is the same strategy used by onet_crosswalk_bridge().
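Composing adjacent crosswalks amounts to joining them on the shared middle vintage and multiplying weights along each path. The base R sketch below shows that idea under stated assumptions: the vintages, codes, weights, column names, and the `compose_crosswalks()` helper are all placeholders, and the "more than one target" rule for the uncertainty flag is an illustrative choice, not necessarily the rule used by `onet_crosswalk_bridge()`.

```r
# Two adjacent crosswalks between hypothetical vintages A -> B and B -> C.
# Codes and weights are placeholders, not real O*NET-SOC entries.
a_to_b <- data.frame(
  from_code = c("A1", "A2", "A3"),
  to_code   = c("B1", "B1", "B2"),   # A1 and A2 merge into B1
  weight    = c(1, 1, 1),
  stringsAsFactors = FALSE
)
b_to_c <- data.frame(
  from_code = c("B1", "B1", "B2"),
  to_code   = c("C1", "C2", "C3"),   # B1 splits into C1 and C2
  weight    = c(0.5, 0.5, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join on the middle vintage and multiply weights.
compose_crosswalks <- function(first, second) {
  cols <- c("from_code", "to_code", "weight")
  first  <- setNames(first[cols],  c("from_code", "mid_code", "weight_first"))
  second <- setNames(second[cols], c("mid_code", "to_code", "weight_second"))
  joined <- merge(first, second, by = "mid_code")
  joined$weight <- joined$weight_first * joined$weight_second
  # Sum weights when several middle codes connect the same pair of endpoints.
  composed <- aggregate(weight ~ from_code + to_code, data = joined, FUN = sum)
  # Illustrative flag: a source code that reaches more than one target code.
  n_targets <- table(composed$from_code)
  composed$crosswalk_uncertain <- as.vector(n_targets[composed$from_code]) > 1
  composed <- composed[order(composed$from_code, composed$to_code), ]
  rownames(composed) <- NULL
  composed
}

a_to_c <- compose_crosswalks(a_to_b, b_to_c)
a_to_c
```

In this toy case `A3` passes through unchanged, while `A1` and `A2` each reach two targets and are flagged. Equal split weights are a simplification; they carry no information about how employment actually divided.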
The synthetic fixture below spans the 2010-to-2019 O*NET-SOC seam. One 2010 software occupation maps to two 2019 occupations, so the bridge is uncertain for within-occupation comparison.
cross_panel <- onet_panel(
  "Abilities",
  versions = c("24.3", "25.1"),
  scale = "IM",
  archives = c(`24.3` = archive_243, `25.1` = archive_251),
  release_dates = c(`24.3` = "2020-08-01", `25.1` = "2020-11-01")
)

bridge_2010_2019 <- tibble::tibble(
  from_vintage = "2010",
  to_vintage = "2019",
  from_onet_soc_code = c("15-1132.00", "15-1132.00", "29-1141.00"),
  to_onet_soc_code = c("15-1252.00", "15-1253.00", "29-1141.00"),
  map_type = c("split", "split", "one_to_one"),
  crosswalk_weight = c(0.5, 0.5, 1)
)

cross_changes <- onet_panel_reconcile(cross_panel, bridge_2010_2019)

cross_changes |>
  select(
    from_onet_soc_code,
    to_onet_soc_code,
    element_name,
    change_type,
    crosswalk_uncertain,
    transition_data,
    safely_comparable
  ) |>
  onet_kable()

| from_onet_soc_code | to_onet_soc_code | element_name | change_type | crosswalk_uncertain | transition_data | safely_comparable |
|---|---|---|---|---|---|---|
| 15-1132.00 | 15-1252.00 | Oral Comprehension | transition_data | TRUE | TRUE | FALSE |
| 15-1132.00 | 15-1253.00 | Oral Comprehension | transition_data | TRUE | TRUE | FALSE |
| 29-1141.00 | 29-1141.00 | Oral Comprehension | real_update | FALSE | FALSE | TRUE |
| 15-1132.00 | 15-1252.00 | Problem Sensitivity | dropped | TRUE | FALSE | FALSE |
| 15-1132.00 | 15-1253.00 | Problem Sensitivity | dropped | TRUE | FALSE | FALSE |
3. A Release-to-Release “Change” May Not Be a True Content Change
Many O*NET archive tables include a source date and domain source. These fields matter because a value can appear unchanged simply because the occupation was not re-surveyed, or a value can change without a source-date change because of a recode or recalculation. Handel (2016) is a useful entry point for the broader measurement and comparability cautions.
changes <- onet_panel_reconcile(
  panel,
  bridge = onet_crosswalk_bridge("2019", "2019")
)

changes |>
  distinct(value_changed, date_changed, change_type, safely_comparable) |>
  arrange(change_type) |>
  onet_kable()

| value_changed | date_changed | change_type | safely_comparable |
|---|---|---|---|
| FALSE | FALSE | stale_carryforward | TRUE |
| TRUE | TRUE | real_update | TRUE |
| TRUE | TRUE | real_update | FALSE |
| FALSE | TRUE | resampled_stable | TRUE |
| TRUE | FALSE | recode_or_recalc_flag | FALSE |
comparison_counts <- cross_changes |>
  mutate(comparability = if_else(safely_comparable, "Safe", "Not safe")) |>
  count(comparability, name = "rows")

ggplot2::ggplot(
  comparison_counts,
  ggplot2::aes(x = comparability, y = rows, fill = comparability)
) +
  ggplot2::geom_col(width = 0.6, show.legend = FALSE) +
  ggplot2::coord_flip() +
  ggplot2::scale_fill_manual(
    values = c(
      "Safe" = onet2r_colors[["teal"]],
      "Not safe" = onet2r_colors[["rose"]]
    )
  ) +
  ggplot2::labs(
    title = "How Many Rows Are Safe to Compare?",
    subtitle = "The count comes from the cross-vintage example, not a hand-built table.",
    x = NULL,
    y = "Rows"
  ) +
  onet2r_theme()
The rows in the change-type table above are not a hand-built truth table. They are the classifications observed in the example archive panels. Checking for these patterns is the minimum users should do before interpreting any cross-release difference.
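The value and date patterns in the same-vintage table can be written as a small classifier. The sketch below is a hypothetical `classify_change()` helper that reproduces only the four `value_changed` and `date_changed` combinations shown in that table. It is not the package's implementation, and it does not cover `safely_comparable`, which the table shows can differ between rows with the same two flags, or the cross-vintage types such as `transition_data` and `dropped`.

```r
# Hypothetical classifier for the four value/date patterns in the table above.
classify_change <- function(value_changed, date_changed) {
  stopifnot(
    is.logical(value_changed),
    is.logical(date_changed),
    length(value_changed) == length(date_changed)
  )
  ifelse(
    value_changed & date_changed, "real_update",
    ifelse(
      value_changed & !date_changed, "recode_or_recalc_flag",
      ifelse(
        !value_changed & date_changed, "resampled_stable",
        "stale_carryforward"
      )
    )
  )
}

# Enumerate every combination of the two flags and label it.
checks <- expand.grid(
  value_changed = c(FALSE, TRUE),
  date_changed  = c(FALSE, TRUE)
)
checks$change_type <- classify_change(checks$value_changed, checks$date_changed)
checks
```

Writing the rule out this way makes one design point explicit: a changed value with an unchanged source date is treated as a flag to investigate, not as evidence that the occupation itself changed.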
How This Differs from the Standard Task Approach
The canonical task-framework literature often avoids O*NET’s
version-over-version variation. Instead, it holds task scores fixed and
gets time variation from changing occupational employment shares (Autor, Levy, and Murnane
2003; Autor and Dorn 2013; Autor 2013). That approach is
standard because it is cleaner for many labor-market questions.
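The fixed-score approach can be sketched in a few lines: the aggregate task intensity in a year is the sum over occupations of that year's employment share times a task score that never changes. The occupations, scores, and employment counts below are invented for illustration and do not come from any of the cited papers.

```r
# Task scores held fixed at a single vintage (hypothetical values).
task_scores <- c(clerk = 0.8, technician = 0.5, manager = 0.2)

# Employment counts by year (rows) and occupation (columns); made-up numbers.
employment <- rbind(
  `2000` = c(clerk = 50, technician = 30, manager = 20),
  `2020` = c(clerk = 30, technician = 40, manager = 30)
)

# Convert counts to within-year employment shares (each row sums to 1).
shares <- employment / rowSums(employment)

# Aggregate intensity per year: sum over occupations of share * fixed score.
aggregate_intensity <- drop(shares %*% task_scores[colnames(shares)])
aggregate_intensity
```

With these toy numbers the aggregate falls from 0.59 to 0.50 even though no occupation's score moves. All of the change comes from employment shifting toward occupations with lower scores.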
onet2r enables a complementary route: directly comparing
archived O*NET content after marking taxonomy and survey-timing
problems.
Direct within-occupation change work is closer to what the longitudinal module does. Consoli et al. (2023) combine DOT and O*NET to study within-occupation task change over a long period. Chen et al. (2026) report a narrower O*NET version comparison as one stylized fact inside a paper whose main data source is online vacancies.
Official Resources to Check
- The O*NET Database Releases Archive is the source for release ZIP files (National Center for O*NET Development, n.d.b).
- The O*NET Data Dictionary documents table structure, date fields, source fields, and historical content changes (National Center for O*NET Development, n.d.a).
- The O*NET-SOC taxonomy files provide the official adjacent crosswalks used by the bridge layer (National Center for O*NET Development, n.d.c).
- BLS provides a methodological overview for mapping Employment Projections and O*NET data, including split, merge, and imputation issues that motivate future employment-weighted bridge options (U.S. Bureau of Labor Statistics 2021).
Adjacent Tools to Verify before Citing
Several tools map O*NET to other classification systems rather than
across O*NET’s own versions. They may be useful, but they are
intentionally not in inst/REFERENCES.bib until verified to
citation standard:
- Institute for Structural Research occupational task crosswalk files.
- occupationcross, reported as an R package for occupational reclassification.
- The Dais NOC-to-O*NET crosswalk for Canada.
- European Commission ESCO-O*NET crosswalks.