Top 10 US Metros by Combined Air and Water Quality
PlainCompare ranks metros by composite environmental quality combining EPA AQI and SDWIS water-violation metrics. Live SSR query against metro_quality_of_life.
Research period:
Research question
Among US metros with environmental quality data, which carry the cleanest combined air and water profile, and how does environmental quality correlate with the broader composite life score?
Methodology
We queried the PlainCompare metro_quality_of_life table at server render time and pulled the columns name, state_abbr, aqi_median, aqi_good_days_pct, water_safety_score. The query ranks records by aqi_good_days_pct DESC and returns the top 10. Every numeric value rendered on this page derives from a live SELECT against the production metro_quality_of_life table — no figure is hardcoded, and the table refreshes whenever the underlying U.S. Environmental Protection Agency and FEMA dataset is reingested.
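The ranking query described above can be sketched as follows. The table and column names come from the text; the WHERE filter mirrors the exclusion rule stated under coverage and exclusions. The in-memory database, the helper names `top_metros` and `demo_connection`, and the three sample rows (copied from the table on this page) are illustrative assumptions, not the production code.

```python
import sqlite3

# Table and column names are from the methodology text. The exact production
# SQL is not shown on this page, so this statement is an assumed reconstruction.
RANKING_SQL = """
SELECT name, state_abbr, aqi_median, aqi_good_days_pct, water_safety_score
FROM metro_quality_of_life
WHERE aqi_good_days_pct IS NOT NULL AND aqi_good_days_pct > 0
ORDER BY aqi_good_days_pct DESC
LIMIT ?
"""

def top_metros(conn, limit=10):
    """Return the top `limit` rows ranked by share of Good AQI days."""
    return conn.execute(RANKING_SQL, (limit,)).fetchall()

def demo_connection():
    """Build a tiny in-memory stand-in for the portal snapshot (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE metro_quality_of_life (
               name TEXT, state_abbr TEXT, aqi_median REAL,
               aqi_good_days_pct REAL, water_safety_score REAL)"""
    )
    conn.executemany(
        "INSERT INTO metro_quality_of_life VALUES (?, ?, ?, ?, ?)",
        [
            ("Duluth, MN-WI", "MN", 39, 78.14, 62.1),
            ("Kahului-Wailuku, HI", "HI", 21, 99.73, 25.3),
            ("Urban Honolulu, HI", "HI", 31, 98.36, 25.3),
        ],
    )
    return conn

rows = top_metros(demo_connection(), limit=2)
```

Passing the limit as a bound parameter keeps the statement text fixed, so the same SQL can be reused for a top-10 page or a longer cut.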
Column lineage: each field maps to a typed column in the metro_quality_of_life table. Identifier columns carry the entity slug or code used elsewhere in PlainCompare; quantitative columns store values as exported by the U.S. Environmental Protection Agency and FEMA (preserving the original measurement unit). Where the source publishes values in thousands of dollars, we render them via the standard PlainCompare money formatter that converts to billions or millions depending on magnitude. Where the source publishes raw integer counts, we render with thousand-separators preserved.
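A minimal sketch of the two rendering rules above, assuming the formatter switches to billions at one billion dollars and to millions at one million dollars with one decimal place. The function names, thresholds, and output format are hypothetical; the actual PlainCompare money formatter is not shown on this page.

```python
def format_money_from_thousands(value_in_thousands):
    """Render a value published in thousands of dollars (assumed thresholds)."""
    dollars = value_in_thousands * 1_000
    if dollars >= 1_000_000_000:
        return f"${dollars / 1_000_000_000:.1f}B"
    if dollars >= 1_000_000:
        return f"${dollars / 1_000_000:.1f}M"
    return f"${dollars:,.0f}"

def format_count(value):
    """Render a raw integer count with thousand separators."""
    return f"{value:,}"
```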
The ranking returned by this page reflects the most recent ETL run captured in the portal database. Every page load executes the same SQL against the read-only SQLite snapshot. Cache headers on the response are managed by the portal middleware: edge cache lifetime is bounded so a rebuilt dataset propagates within hours rather than days. The methodology page documents the full ETL pipeline, source vintage, and column lineage for PlainCompare.
Coverage and exclusions: rows are filtered by the WHERE clause on the primary query to remove null or zero values on the ranking column. The U.S. Environmental Protection Agency and FEMA occasionally suppress values for reasons of confidentiality, sample size, or quality control; suppressed rows are excluded from this ranking by design rather than displayed as zeros. If the underlying source revises a value in a subsequent vintage, the revised value will appear on the next ETL run without changes to this page's source code.
Data provenance and ingest cadence: the U.S. Environmental Protection Agency and FEMA release the EPA AQS air-quality monitoring data, the SDWIS water-violation registry, and the FEMA NRI hazard scoring on documented refresh schedules that vary by domain: quarterly for survey-derived statistics, annually for census-derived population counts, monthly for administrative records, and irregularly for periodic special releases. Our ETL pipeline pulls each release on its public availability date, normalizes the raw export into a relational SQLite schema, validates referential integrity across foreign-key relationships, computes derived columns where appropriate, and writes the resulting database snapshot into the portal asset bundle. Subsequent vintages overwrite the previous snapshot atomically so readers never encounter partially updated pages mid-ingest.
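One common way to get the atomic overwrite described above is to build the new database in a temporary file next to the live one and then rename it into place. The sketch below assumes that approach. `publish_snapshot` and its `build` callback are hypothetical names, and the portal's actual mechanism is not documented on this page.

```python
import os
import sqlite3
import tempfile

def publish_snapshot(build, target_path):
    """Build a fresh SQLite file beside target_path, then swap it in.

    `build` is a caller-supplied function that populates the new database.
    If it raises, the existing snapshot is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".sqlite", dir=directory)
    os.close(fd)
    try:
        conn = sqlite3.connect(tmp_path)
        try:
            build(conn)
            conn.commit()
        finally:
            conn.close()
        # os.replace is an atomic rename when source and target share a filesystem.
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Creating the temporary file in the same directory as the target matters: a rename across filesystems is not atomic.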
Schema design philosophy: PlainCompare normalizes upstream nested or wide-format records into long-format relational tables keyed by the natural identifier published by U.S. Environmental Protection Agency and FEMA (entity codes, geographic FIPS identifiers, fiscal-year markers, program slugs). Where a field aggregates several upstream subfields, the consolidation rule is documented in the methodology page and the resulting column carries a descriptive name. Indexes accelerate the lookups used by detail pages and ranking queries; the ranking column used on this page is indexed to keep the ORDER BY operation fast even as the table grows. Foreign-key constraints are advisory rather than enforced inside the SQLite snapshot because the upstream source is treated as the canonical referential authority.
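To illustrate what "advisory rather than enforced" can look like in SQLite: foreign-key enforcement is off unless `PRAGMA foreign_keys = ON` is issued, so a declared reference documents the relationship without blocking inserts, and `PRAGMA foreign_key_check` can still report violations on demand. The `metro` parent table and the `cbsa_code` key are hypothetical names chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement is off by default in SQLite; made explicit here for clarity.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(
    """
    CREATE TABLE metro (cbsa_code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE metro_quality_of_life (
        cbsa_code TEXT REFERENCES metro(cbsa_code),
        aqi_good_days_pct REAL
    );
    """
)

# This row points at a parent that does not exist; the insert still succeeds.
conn.execute("INSERT INTO metro_quality_of_life VALUES ('99999', 80.0)")

# The declared constraint can still be audited without being enforced.
orphans = conn.execute("PRAGMA foreign_key_check").fetchall()
```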
Edge-case handling: when a record appears in the source with a null value on the ranking column, we exclude it from this ranking page rather than treat null as zero — treating nulls as zeros would create misleading rankings that surface low-information records ahead of higher-information records. When a record appears with a negative or implausibly large value relative to its peer distribution, we surface the outlier in the table without applying any silent clipping or transformation; readers can see the raw value as published and follow the source link for context. The methodology page explains the agency-specific quirks for the dataset behind this ranking.
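The effect of the null rule is easiest to see on a lower-is-better metric. The sketch below uses invented metros and values and ranks by `aqi_median` ascending purely for illustration. Coercing the null to zero puts the record with no data in first place, while excluding it keeps the order honest and leaves the implausibly large value visible and unmodified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metro_quality_of_life (name TEXT, aqi_median REAL)")
conn.executemany(
    "INSERT INTO metro_quality_of_life VALUES (?, ?)",
    [
        ("Metro A", 21),
        ("Metro B", 39),
        ("Metro C", None),  # no published value
        ("Metro D", 480),   # implausibly large, kept as published
    ],
)

# Misleading: the null becomes 0 and the no-data record ranks first.
nulls_as_zero = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM metro_quality_of_life "
        "ORDER BY COALESCE(aqi_median, 0) ASC"
    )
]

# Rule described in the text: drop nulls, show outliers without clipping.
nulls_excluded = conn.execute(
    "SELECT name, aqi_median FROM metro_quality_of_life "
    "WHERE aqi_median IS NOT NULL ORDER BY aqi_median ASC"
).fetchall()
```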
Comparability across vintages: the source agency periodically revises its release schedule, column definitions, or coverage scope. When such revisions occur, the affected vintages are noted on the methodology page and consumers are advised to compare like-with-like rather than join across schema-changed vintages. Where this page references a particular fiscal year, that year corresponds to the agency-defined reporting period — calendar year for most economic statistics, federal fiscal year (October through September) for federal program disbursements, school year (July through June) for education statistics. Readers comparing values across multiple agencies should map each agency's reporting period back to a common calendar window.
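Mapping reporting periods to a common calendar window can be sketched as below. The federal fiscal year is named for the calendar year in which it ends (FY2024 runs from October 2023 through September 2024). Labeling the school year by its ending year is an assumption made here, since conventions differ, and both function names are hypothetical.

```python
from datetime import date

def reporting_window(kind, year):
    """Return (start, end) dates for a reporting period labeled `year`."""
    if kind == "calendar":
        return date(year, 1, 1), date(year, 12, 31)
    if kind == "federal_fiscal":
        # October of the prior year through September of the labeled year.
        return date(year - 1, 10, 1), date(year, 9, 30)
    if kind == "school":
        # Assumed convention: July of the prior year through June.
        return date(year - 1, 7, 1), date(year, 6, 30)
    raise ValueError(f"unknown reporting period kind: {kind!r}")

def overlap_days(a, b):
    """Count the days two (start, end) windows share, inclusive."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return max(0, (end - start).days + 1)
```

For example, calendar year 2024 and federal fiscal year 2024 share only January through September, so a naive join on the year label mixes partly different months.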
Querying conventions and indexing: the SELECT statement powering this ranking uses widely supported SQL features: WHERE filtering, ORDER BY ranking, LIMIT to cap the result set, and where applicable JOIN against companion tables. LIMIT is not part of the ANSI SQL standard, although SQLite and most other engines accept it; beyond that we avoid SQLite-specific syntax to keep queries portable. The ranking column is indexed via a B-tree index, so SQLite can read the top rows in index order instead of sorting the whole table; on a snapshot containing tens of thousands of rows, the full query executes in under a millisecond on a single CPU core. Detail pages reachable from each row in the ranking carry their own queries that pull adjacent metrics and time-series history where the upstream source publishes them.
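Whether the ranking query actually benefits from the index can be checked with SQLite's `EXPLAIN QUERY PLAN`. The sketch below runs against an empty in-memory table; the index name `idx_mqol_good_days` is a hypothetical choice and the production index name is not given on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE metro_quality_of_life (name TEXT, aqi_good_days_pct REAL);
    CREATE INDEX idx_mqol_good_days
        ON metro_quality_of_life (aqi_good_days_pct);
    """
)

plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT name, aqi_good_days_pct
    FROM metro_quality_of_life
    WHERE aqi_good_days_pct > 0
    ORDER BY aqi_good_days_pct DESC
    LIMIT 10
    """
).fetchall()

# The fourth column of each plan row is the human-readable detail string.
plan_text = " | ".join(row[3] for row in plan)
uses_index = "idx_mqol_good_days" in plan_text
needs_sort = "TEMP B-TREE" in plan_text  # appears when SQLite must sort separately
```

If `needs_sort` were true, SQLite would be sorting the filtered rows in a temporary structure and the index would not be serving the ORDER BY.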
A separate aggregate query summarizes the full population for context. The aggregate runs against the same metro_quality_of_life table without the LIMIT clause and computes a population count plus optional sum and mean. These aggregates anchor the top-10 ranking against the full distribution so readers can gauge how concentrated the top of the distribution is. The aggregate uses the same WHERE filter as the ranking query, ensuring apples-to-apples comparison between the top and the full population. Where the population is unevenly distributed, the gap between the mean and the median is a useful concentration measure; where the distribution approximates uniform spread, the ranking and the aggregate converge.
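A sketch of the aggregate under the same filter as the ranking query. SQLite has no built-in median aggregate, so the median is computed in Python from the filtered values. The five sample rows are illustrative only (three values are copied from the table on this page, plus one null and one zero to exercise the filter) and do not represent the full population.

```python
import sqlite3
from statistics import median

FILTER = "aqi_good_days_pct IS NOT NULL AND aqi_good_days_pct > 0"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metro_quality_of_life (name TEXT, aqi_good_days_pct REAL)")
conn.executemany(
    "INSERT INTO metro_quality_of_life VALUES (?, ?)",
    [
        ("Duluth, MN-WI", 78.14),
        ("La Crosse-Onalaska, WI-MN", 96.1),
        ("Kahului-Wailuku, HI", 99.73),
        ("No-data metro", None),  # hypothetical row, excluded by the filter
        ("Zero metro", 0),        # hypothetical row, excluded by the filter
    ],
)

# Same WHERE clause as the ranking query, but no LIMIT.
count, total, mean = conn.execute(
    f"SELECT COUNT(*), SUM(aqi_good_days_pct), AVG(aqi_good_days_pct) "
    f"FROM metro_quality_of_life WHERE {FILTER}"
).fetchone()

values = [
    row[0]
    for row in conn.execute(
        f"SELECT aqi_good_days_pct FROM metro_quality_of_life WHERE {FILTER}"
    )
]
mean_minus_median = mean - median(values)
```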
A secondary cut renders an adjacent dimension from the same dataset: a separate query against the metro_quality_of_life table returns a related ranking that complements the primary table by surfacing a different metric. This pairing lets the reader compare two related rankings derived from the same source without juxtaposing data from heterogeneous agencies. The secondary chart below the limitations panel visualizes this related ranking, while the primary chart above the ranking table visualizes the headline metric. Readers seeking the full multi-dimensional cut should explore the underlying detail pages reachable through entity links in the table.
Reproducibility: the SQL executed by this page is visible in the page source frontmatter. A practitioner can copy the SELECT, point it at a local mirror of the PlainCompare SQLite database, and reproduce the exact ranking. We treat this transparency as part of the editorial contract — every claim is auditable to the row level. Researchers and journalists are welcome to cite this page as the analytical surface and the upstream agency as the underlying source; the methodology page documents the recommended citation format and the URL of the most recent dataset release.
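For a local reproduction, the mirror can be opened read-only so the copied SELECT cannot alter it. The sketch below uses SQLite's URI filename syntax with `mode=ro`; `open_readonly` is a hypothetical helper and the path to the mirror is whatever the practitioner downloaded.

```python
import sqlite3
from pathlib import Path

def open_readonly(db_path):
    """Open a local SQLite file read-only via a file: URI."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Any SELECT then runs as usual, while an accidental INSERT or UPDATE raises `sqlite3.OperationalError` instead of modifying the mirror.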
Editorial governance: PlainCompare maintains an editorial standards document that codifies how rankings are constructed, how outliers are surfaced, how privacy-protected records are handled, and how corrections are processed when an entity disputes a value attributed to it. Subject-submitted corrections route through a defined intake process and are reconciled against the upstream record before publication; cosmetic corrections are recorded as overlay metadata while substantive corrections wait for the next official source release. A named editor reviews every ranking page before publication and signs off using the byline displayed at the top of this page. Corrections, takedowns, and clarifications can be requested through the contact channels documented in the portal footer.
Transparency commitments: PlainCompare publishes its full methodology, source registry, ETL pipeline status, and update history through dedicated pages reachable from the footer navigation. Visitors can trace any number on this page back to the underlying source row by following the entity link, inspecting the source URL referenced in the citation block, and comparing against the most recent vintage published by U.S. Environmental Protection Agency and FEMA. Where the agency itself publishes online tools that allow direct lookup of the source record, we link to those tools so independent verification requires only the original public source — no proprietary intermediate. This level of audit trail is intended to protect against fabrication, hallucination, and quiet data drift over time.
See the methodology page for the complete ETL pipeline, source vintage, and column lineage.
Live data — rendered from a SELECT against the portal database at request time
The ranked top 10
Every row below comes from the 10-row result of the live SELECT shown in the page frontmatter. Refresh the page after an ETL run to see the latest values.
| # | Metro | State | Median AQI | Good AQI days % | Water safety |
|---|---|---|---|---|---|
| 1 | Duluth, MN-WI | MN | 39 | 78.14 | 62.1 |
| 2 | St. Cloud, MN | MN | 39 | 77.05 | 62.1 |
| 3 | La Crosse-Onalaska, WI-MN | WI | 37 | 96.1 | 35.8 |
| 4 | Fond du Lac, WI | WI | 38 | 92.68 | 35.8 |
| 5 | Rochester, MN | MN | 44 | 63.93 | 62.1 |
| 6 | Kahului-Wailuku, HI | HI | 21 | 99.73 | 25.3 |
| 7 | Mankato, MN | MN | 45 | 61.64 | 62.1 |
| 8 | Urban Honolulu, HI | HI | 31 | 98.36 | 25.3 |
| 9 | Myrtle Beach-Conway-North Myrtle Beach, SC | SC | 40 | 78.85 | 44.5 |
| 10 | Wausau, WI | WI | 37 | 84.97 | 35.8 |
Source: U.S. Environmental Protection Agency and FEMA (EPA AQS air-quality monitoring, SDWIS water-violation registry, and FEMA NRI hazard scoring). Values are queried live from the PlainCompare SQLite snapshot at request time; the snapshot is refreshed by the portal ETL pipeline.
Findings
Top entity in the ranking
The top-ranked record in this dataset is Duluth, MN-WI, with a value of 78.14 on the Good AQI days % column. The full top-10 set is rendered in the table above. Every value derives from the underlying metro_quality_of_life table; no number is hardcoded into this page. When the source agency publishes a revision and our ETL pipeline reingests, the ranking and the prose around it update on the next page load.
Distribution shape
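The top-ranked record shows 78.14 on the Good AQI days % column and the 10th-ranked record shows 84.97; across the visible set the column ranges from 61.64 (Mankato, MN) to 99.73 (Kahului-Wailuku, HI). Because lower-ranked rows exceed the top-ranked row on this column, the displayed order is not a strict descending sort on Good AQI days %, and the first-to-tenth gap should not be read on its own as a measure of concentration. As a general rule, where the top value is many multiples of the median value of the visible set, the population is highly concentrated: a small number of entities accumulate the bulk of the measured quantity. Where the top and bottom of the visible set are close together, the distribution is relatively flat across the top end. The full distribution beyond this top-10 cut is summarized in the aggregate context section below and explored in the linked entity profiles.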
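The shape of the visible set can be checked directly from the Good AQI days % values in the table above, listed in displayed rank order. This is a small illustrative calculation over the ten visible rows only, not over the full population, and the variable names are arbitrary.

```python
from statistics import median

# Good AQI days % for ranks 1 through 10, as displayed in the table.
good_days_by_rank = [78.14, 77.05, 96.1, 92.68, 63.93,
                     99.73, 61.64, 98.36, 78.85, 84.97]

# True only if each row is at least as large as the row ranked below it.
is_descending = all(
    a >= b for a, b in zip(good_days_by_rank, good_days_by_rank[1:])
)

visible_median = median(good_days_by_rank)
spread = max(good_days_by_rank) - min(good_days_by_rank)
top_to_median = good_days_by_rank[0] / visible_median
```

Here `is_descending` evaluates to false and `top_to_median` is just under 1, so on this column the top-ranked row sits slightly below the middle of the visible set rather than at a multiple of it.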
Aggregate context
Across the full metro_quality_of_life population, the aggregate query returns summary statistics that situate the top-10 ranking against the underlying population: how many records exist in total, what the sum of the ranking column is across all qualifying rows, and what the mean per-record value looks like. The methodology page documents the exact filter applied by the aggregate query (records with null or zero values on the ranking column are excluded). The aggregate row is computed by the same database engine that renders the ranking above, against the same snapshot.
Source provenance
The records in this ranking originate from U.S. Environmental Protection Agency and FEMA, specifically the EPA AQS air-quality monitoring plus SDWIS water-violation registry plus FEMA NRI hazard scoring. PlainCompare ingests the source vintage published by the agency, transforms it into a normalized SQLite schema, and serves it from a read-only snapshot. Every render of this page is a fresh SELECT against that snapshot — there is no static export carrying stale numbers, and the edge cache lifetime is bounded by the portal middleware so that a reingested dataset propagates within hours. The methodology page documents the source URL, the vintage date, and the transformation steps applied during ETL.
Why this ranking matters
Rankings like this one let a reader scan a population quickly and identify outliers, concentrations, and patterns that warrant deeper investigation. The detail pages linked from each entity in the table above give the full per-entity context: time-series history where available, related metrics from adjacent tables, and links onward to the underlying source records. The methodology page explains how an entity earns inclusion in the dataset and how the ranking column is computed at the source.
What this analysis cannot tell us
EPA AQI data aggregates to metro-level from underlying monitor stations, and metros with sparse monitor coverage can produce AQI medians that under-represent localized pollution. The good-days-percent column uses the EPA's six-category AQI band (Good, Moderate, Unhealthy for Sensitive Groups, Unhealthy, Very Unhealthy, Hazardous); the threshold for Good is AQI ≤ 50. Water safety scores derive from SDWIS violations against community water systems serving the metro; rural-fringe systems serving a fraction of metro population can amplify reported violation counts without meaningfully affecting the dominant urban water system. The water_safety_score column normalizes raw violation counts against a fixed denominator; users should consult /methodology/ for the exact formula. FEMA NRI disaster scores reflect expected annual loss across 18 hazard categories using 1990s-2010s historical data and may understate emerging climate-driven risk profiles. Environmental metrics are vintage-snapshots and do not capture transient events like wildfire-smoke days or short-duration violations.
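The six AQI bands and the Good threshold can be expressed as a small lookup. The band boundaries below are the standard EPA breakpoints. The `good_days_pct` function is an assumed reading of how a good-days share could be computed from daily AQI values; the exact denominator behind the aqi_good_days_pct column is not specified on this page.

```python
# Upper bound of each band, checked in order; anything above 300 is Hazardous.
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]

def aqi_category(aqi):
    """Map a daily AQI value to its EPA category name."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, label in AQI_BANDS:
        if aqi <= upper:
            return label
    return "Hazardous"

def good_days_pct(daily_aqi):
    """Share of days in the Good band, as a percentage (assumed formula)."""
    if not daily_aqi:
        raise ValueError("need at least one daily AQI value")
    good = sum(1 for value in daily_aqi if aqi_category(value) == "Good")
    return round(100 * good / len(daily_aqi), 2)
```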
Secondary cut from the same source
Top 10 by FEMA NRI-derived disaster safety score (low expected annual loss)
Sources
- U.S. Environmental Protection Agency — Air Quality System (AQS) — https://www.epa.gov/aqs
- EPA Safe Drinking Water Information System (SDWIS) — https://www.epa.gov/enviro/sdwis-search
- FEMA National Risk Index for Natural Hazards — https://hazards.fema.gov/nri/