methodology
How the data works — sources, processing, and limits.
Every number on this site is ingested from official U.S. government bulk files. No third-party aggregator is scraped or proxied. Here is exactly where the data comes from, how it’s processed, and where it falls short — so you can judge it yourself.
Sources
- DOL OFLC prevailing wage (FLAG) — the Level I–IV prevailing wage per occupation (SOC) and area.
- DOL OFLC disclosure files — LCA (ETA-9035), PERM (ETA-9089), PWD (ETA-9141), CW-1, H-2A, H-2B: every filing’s employer, wage, level, SOC, and worksite.
- USCIS H-1B Employer Data Hub — petition approvals/denials per employer per fiscal year (the denial-rate signal).
- BLS OEWS — the survey the prevailing-wage levels derive from.
- USGS GNIS — town → county mapping so a town search resolves to a wage area.
Current coverage and ingest dates are on the data status page.
How it’s processed
- Ingestion: each government file is parsed by column name (not position) and loaded into a local database; a run fails loudly if a file’s layout changes, so bad data isn’t silently imported.
- Employer identity: filings are matched to an employer by IRS FEIN when the source provides one (PERM/PWD), falling back to a normalized company name otherwise. This consolidates an employer’s filings that appear under slightly different name strings.
- De-duplication: filings are de-duped on their case number, so overlapping quarterly files don’t double-count.
- Figures: the prevailing-wage check compares your annualized offer to the Level I–IV floor; the money-trace models vendor-layer cuts; the confidence score is a transparent weighted blend of the factors it shows.
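As an illustration of the ingestion, employer-identity, and de-duplication steps above, here is a minimal sketch in Python. It is not the site's actual code. The column names (`CASE_NUMBER`, `EMPLOYER_NAME`, `EMPLOYER_FEIN`, `WAGE`), the suffix list, and the rule that a later file overwrites an earlier duplicate are all assumptions made for the example.

```python
import csv
import io
import re

# Hypothetical header names; real disclosure files use their own.
REQUIRED = {"CASE_NUMBER", "EMPLOYER_NAME", "WAGE"}
FEIN_COLUMN = "EMPLOYER_FEIN"  # only some sources provide it

# Hypothetical list of legal suffixes dropped during normalization.
SUFFIXES = {"INC", "LLC", "LLP", "LTD", "CORP", "CORPORATION", "CO", "COMPANY"}


def normalize_name(name: str) -> str:
    """Uppercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def employer_key(row: dict) -> str:
    """Prefer a 9-digit FEIN when present; otherwise fall back to the normalized name."""
    fein = re.sub(r"\D", "", row.get(FEIN_COLUMN) or "")
    if len(fein) == 9:
        return f"fein:{fein}"
    return f"name:{normalize_name(row['EMPLOYER_NAME'])}"


def ingest(files: list) -> dict:
    """Parse CSV texts by header name and keep one row per case number."""
    filings = {}
    for text in files:
        reader = csv.DictReader(io.StringIO(text))
        missing = REQUIRED - set(reader.fieldnames or [])
        if missing:
            # Fail loudly instead of importing misaligned data.
            raise ValueError(f"layout changed, missing columns: {sorted(missing)}")
        for row in reader:
            row["employer_key"] = employer_key(row)
            # Assumption: the later file wins when a case number repeats.
            filings[row["CASE_NUMBER"]] = row
    return filings
```

Because rows are read by header name, two quarterly files with different column orders load identically, and keying on the case number means a filing that appears in both is counted once. The name fallback is also where the fragment-or-merge risk comes from: any two strings that normalize to the same value share a key.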
Honest limitations
- Lag: government files trail reality by one to three months (sometimes more). Always verify against the linked source before acting.
- Name-based attribution: LCA and USCIS files carry no FEIN, so those are matched by company name — a large employer filing under several legal names can fragment, and two different companies sharing a normalized name can merge. We err toward showing the record neutrally, not guessing.
- Partial datasets: some programs are ingested for only certain periods (e.g., PERM is a recent slice); USCIS bulk data lags by a year or more. A “—” means “not in the ingested data,” not necessarily zero.
- No OPT dataset: there is no public per-employer OPT/STEM-OPT dataset; we never fabricate one and lean on the H-1B record as the forward-looking signal.
- Wage levels: Levels I–IV correspond approximately to the 17th, 34th, 50th, and 67th percentiles of the OEWS wage distribution under current DOL methodology; a pending proposal would raise them.
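The prevailing-wage check (annualize the offer, then compare it to the Level I–IV floors) can be sketched roughly as follows. This is an illustrative sketch, not the site's implementation. The 2,080-hour year, the pay-unit names, and the dollar floors in the usage note are assumptions chosen for the example, not real wage data.

```python
from typing import Optional

# Assumption: annualize hourly pay at 40 hours x 52 weeks.
HOURS_PER_YEAR = 2080

# Hypothetical pay-unit names mapped to pay periods per year.
UNIT_MULTIPLIER = {
    "year": 1,
    "month": 12,
    "biweekly": 26,
    "week": 52,
    "hour": HOURS_PER_YEAR,
}

LEVELS = ("I", "II", "III", "IV")


def annualize(amount: float, unit: str) -> float:
    """Convert an offer quoted per pay unit into an annual figure."""
    return amount * UNIT_MULTIPLIER[unit]


def highest_level_met(annual_offer: float, floors: dict) -> Optional[str]:
    """Return the highest level whose floor the offer meets, or None if below Level I."""
    met = None
    for level in LEVELS:
        if annual_offer < floors[level]:
            break
        met = level
    return met
```

With made-up floors of `{"I": 80000, "II": 100000, "III": 120000, "IV": 140000}`, an offer of 55.00 per hour annualizes to 114,400 and meets Level II, while 38.00 per hour annualizes to 79,040 and falls below the Level I floor.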
Corrections
If a figure looks wrong, it can almost always be reproduced from the source: check flag.dol.gov or the USCIS hub, then email [email protected] with the page in question.