Methodology

How 1,802 Indian occupations are scored against the H₂ economy on six adjacency dimensions, with explicit caveats on sample size and reproducibility.

Last Updated: May 2026 · Dataset v1.4.3.0 · PLFS 2023-24

The six H₂ adjacency dimensions

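Each occupation in the National Classification of Occupations (NCO) is scored 0–10 on six dimensions. The mean of the six dimension scores is the occupation's overall H₂ adjacency score, which drives the atlas treemap colour bands and the focus view threshold (≥5). This overall score is distinct from the single "H₂ Adjacency" dimension listed first below.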

H₂ Adjacency: How directly the occupation's core tasks intersect with hydrogen production, transport, storage, or end-use. A process operator at an electrolyser plant scores high; a generalist accountant scores low even within an H₂ company. Source: anchor-language match against IEA Hydrogen Patents task descriptors and IRENA workforce profiles.
Transition Demand: Projected net new positions in this occupation under India's H₂ Mission scenarios (2030, 2035, 2047). Driven by the scenario engine, not by historical employment. Source: occupation × scenario coefficients in scenarios.json; see scoring code in score/score.py.
Skill Transferability: Overlap between the occupation's NCO skill descriptors and the canonical H₂-job skill set. High values mean a worker can move into H₂ roles with modest retraining; low values mean substantial reskilling. Source: cosine similarity over NCO skill anchor terms.
Digital / Automation Exposure: Probability that the occupation's task bundle is materially restructured by digital automation over the 2026–2035 window. Higher values increase the urgency of reskilling rather than signalling headcount expansion. Source: Frey-Osborne-style task-automation probability adapted to the Indian NCO.
Formalization Rate: Scored caveat indicator for likely contract and social-security coverage. It is not used to activate the gap KPI in this WHS build; the gap KPI uses PLFS subdivision supply coverage instead. Source: scored occupation model; not PLFS unit-level microdata.
Scarcity Risk: Composite signal combining current vacancy rates, training-pipeline thinness, and geographic concentration. Higher values flag occupations where demand expansion is most likely to outrun supply. Source: NCS vacancy data + AICTE training enrolment + state-level concentration HHI.
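
As an illustration of how the six dimension scores roll up, here is a minimal Python sketch. It assumes an unweighted arithmetic mean, and the dimension keys and function names are hypothetical labels chosen for this example; the authoritative logic is in score/score.py.

```python
from statistics import mean

# Hypothetical keys for the six dimensions listed above (not the repo's field names).
DIMENSIONS = (
    "h2_adjacency",
    "transition_demand",
    "skill_transferability",
    "automation_exposure",
    "formalization_rate",
    "scarcity_risk",
)
FOCUS_THRESHOLD = 5  # focus view threshold stated in the text (>= 5)


def composite_score(scores: dict) -> float:
    """Return the unweighted mean of the six 0-10 dimension scores."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for d in DIMENSIONS:
        if not 0 <= scores[d] <= 10:
            raise ValueError(f"{d} out of range 0-10: {scores[d]}")
    return mean(scores[d] for d in DIMENSIONS)


def in_focus_view(scores: dict) -> bool:
    """True when the composite score meets the focus view threshold."""
    return composite_score(scores) >= FOCUS_THRESHOLD


# Made-up scores for illustration only.
example = dict(zip(DIMENSIONS, (8, 6, 7, 4, 3, 5)))
print(composite_score(example))  # 5.5
print(in_focus_view(example))    # True
```

Because the mean is unweighted in this sketch, a low Formalization Rate pulls the composite down as much as a low H₂ Adjacency score does.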

How scores are produced

Per-dimension scores are produced by Claude (Anthropic) against the prompts checked into prompts/. Each prompt instructs the model to cite specific anchor language for the dimension. The founder reviews scores before merge and pins them at the commit hash they were generated against. Scores are not regenerated unless the prompt changes or new occupations are added; the policy is to pin, not refresh.

NCO extension policy

The base NCO 2015 covers 1,802 occupations. For occupations that exist in the H₂ economy but lack a standard NCO code (for example, electrolyser-plant operator), the atlas adds H₂-frontier codes prefixed by subsector: H2-MAR-001 for maritime, H2-RFNBO-001 for renewable-fuels-of-non-biological-origin. Each frontier occupation cites a per-occupation anchor source (IMO, IRENA, EU RED III, etc.). Frontier codes are clearly flagged in the data and only receive PLFS supply when a defensible NCO subdivision anchor exists.
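
The frontier-code convention can be checked mechanically. The sketch below infers the pattern from the two examples above (an H2 prefix, an uppercase subsector tag, and a three-digit serial); the exact grammar is an assumption, and parse_frontier_code is a hypothetical helper, not a function from the repository.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: H2-<UPPERCASE SUBSECTOR>-<3-digit serial>, inferred from
# "H2-MAR-001" and "H2-RFNBO-001". The real data may allow other shapes.
FRONTIER_CODE = re.compile(r"^H2-(?P<subsector>[A-Z]+)-(?P<serial>\d{3})$")


def parse_frontier_code(code: str) -> Optional[Tuple[str, int]]:
    """Return (subsector, serial) for a frontier code, or None otherwise."""
    match = FRONTIER_CODE.match(code)
    if match is None:
        return None
    return match.group("subsector"), int(match.group("serial"))


print(parse_frontier_code("H2-MAR-001"))    # ('MAR', 1)
print(parse_frontier_code("H2-RFNBO-001"))  # ('RFNBO', 1)
print(parse_frontier_code("7212"))          # None
```

A check like this makes it easy to separate frontier codes from standard NCO codes before attaching PLFS supply.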

Caveats and limitations

PLFS 2023-24 Annual Report Table 25 reports worker distribution by NCO-2015 2-digit subdivision. The atlas allocates each subdivision total across checked-in occupations using the same H₂ adjacency plus transition-demand weighting used by the scenario engine. Treat each resulting occupation-level figure as a subdivision-allocated estimate: it is indicative, not occupation-observed.
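
To make the allocation step concrete, the following sketch splits one subdivision total across its occupations in proportion to a weight. It assumes the weight is simply the sum of the H₂ Adjacency and Transition Demand scores; the scenario engine's actual weighting may differ, and the function name, the equal-split fallback, and the occupation codes are hypothetical.

```python
def allocate_subdivision(total_workers: float, occupations: dict) -> dict:
    """Split a subdivision total across occupations in proportion to weight.

    occupations maps an occupation code to (h2_adjacency, transition_demand).
    Assumption: weight = h2_adjacency + transition_demand.
    """
    if not occupations:
        return {}
    weights = {code: adj + dem for code, (adj, dem) in occupations.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        # No signal at all: fall back to an equal split (an assumption).
        share = total_workers / len(weights)
        return {code: share for code in weights}
    return {
        code: total_workers * weight / total_weight
        for code, weight in weights.items()
    }


# Made-up subdivision with two placeholder occupation codes.
print(allocate_subdivision(1000, {"OCC-A": (8, 6), "OCC-B": (2, 4)}))
# {'OCC-A': 700.0, 'OCC-B': 300.0}
```

The allocated figures always sum to the subdivision total, which is why they carry no occupation-level information beyond the scores used as weights.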

Score reproducibility depends on prompt and model version. The model and prompt hash for each scoring pass are recorded in the repository commit history. A future scoring pass that produces different numbers should be treated as a separate dataset version, not a correction of the prior one.
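
One way to tie a scoring pass to its exact prompt is to fingerprint the prompt text. The sketch below uses SHA-256 from Python's standard library; the hash algorithm, the record fields, and the function names are assumptions for illustration, since the text only says that the model and prompt hash are recorded in commit history.

```python
import hashlib


def prompt_fingerprint(prompt_text: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded prompt text."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def scoring_pass_record(model_version: str, prompt_text: str) -> dict:
    """Build a small provenance record for one scoring pass (hypothetical schema)."""
    return {
        "model_version": model_version,
        "prompt_sha256": prompt_fingerprint(prompt_text),
    }


# Placeholder values; not the repository's real model identifier or prompt.
record = scoring_pass_record("model-version-placeholder", "Score this occupation 0-10.")
print(record["prompt_sha256"][:12])
```

Any edit to the prompt, even whitespace, changes the digest, which matches the policy of treating a changed prompt as a new dataset version.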

The atlas does not cover informal-sector work that falls outside both the NCO and the H₂-frontier extension list. This is a known coverage gap.

Sample-size threshold

Supply estimates are not derived from unit-level PLFS microdata in this release. The checked-in model/plfs_supply.json uses the Table 25 rural+urban person share for each NCO subdivision and multiplies it by an indicative 2024 total-worker denominator. Gap mode is enabled only when H₂-ready occupations in the current view have subdivision supply coverage. Users basing policy decisions on these numbers should treat them as directional and consult the PLFS unit files when occupation-observed estimates are required.
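
The supply arithmetic and the gap-mode gate can be sketched as follows. This is an illustration under stated assumptions: shares are expressed as fractions between 0 and 1, gap mode requires every H₂-ready occupation in the view to have positive subdivision supply, and the field names (h2_ready, subdivision) are hypothetical rather than the schema of model/plfs_supply.json.

```python
def subdivision_supply(share: float, total_workers: float) -> float:
    """Estimated workers in a subdivision: share (0-1) times the denominator."""
    if not 0.0 <= share <= 1.0:
        raise ValueError(f"share must be a fraction between 0 and 1, got {share}")
    return share * total_workers


def gap_mode_enabled(view: list, supply_by_subdivision: dict) -> bool:
    """Enable gap mode only if every H2-ready occupation in the view is covered.

    Assumption: "covered" means the occupation's subdivision has supply > 0.
    """
    ready = [occ for occ in view if occ["h2_ready"]]
    if not ready:
        return False
    return all(supply_by_subdivision.get(occ["subdivision"], 0) > 0 for occ in ready)


# Made-up numbers for illustration only; not PLFS figures.
supply = {"71": subdivision_supply(0.25, 400_000_000)}
view = [
    {"code": "OCC-A", "subdivision": "71", "h2_ready": True},
    {"code": "OCC-B", "subdivision": "99", "h2_ready": False},
]
print(gap_mode_enabled(view, supply))  # True
```

Occupations that are not H₂-ready do not affect the gate in this sketch, so an uncovered subdivision matters only when an H₂-ready occupation sits in it.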

Reproducibility

All data, scoring code, prompts, and the build pipeline are in the public repository: github.com/e740554/india-h2-jobs. Running python build/build.py on a clean checkout reproduces the site exactly. The dataset version and PLFS round are stamped on every page in the freshness badge. If a number on the atlas appears wrong, file an issue with the occupation code and a screenshot; we triage against the pinned scoring commit.