Project Team: Alan Kolok, Lucas Sheneman, Chantal Vella
The long-term goal of this program is to model the associations among metal contamination (as a consequence of mining), watershed geography and adverse human health impacts across the Rocky Mountains. In this pilot project, we will focus on generating a predictive classifier model that includes data from Oregon, Washington, Idaho and western Montana. Our central hypothesis is that geospatial models that incorporate the occurrence of metal contamination in large watersheds can predict adverse health outcomes, including birth defects, pediatric cancers and cardiovascular disease. To test this hypothesis, we will address the following aims.
Aim 1: Derive a collection of interoperable digital map layers of the northwestern United States that integrate adverse health outcomes with hydrologic units (watersheds).
Aim 2: Apply supervised machine learning to the data layers derived in Aim 1 to build and train a spatially explicit classifier model that assigns each Mountain West hydrologic region a discrete category of estimated relative health risk, based on the correlation between adverse health outcomes and the identified hydrologic units.
We will comprehensively evaluate the data available from public health departments in Idaho, Oregon, Washington and Montana. We will also acquire data on premature mortality from the National Vital Statistics System via the publicly available CDC WONDER database. Data on the prevalence of pediatric cancer and birth defects will be gathered from state registries, where available. Spatially nested hydrologic unit (watershed) maps at varying scales (regions, sub-regions, accounting units, cataloging units, etc.) will be obtained from the publicly available USGS Watershed Boundary Dataset (WBD). The combined watershed and public health source data will be centrally stored, catalogued, transformed, and managed in collaboration with the Northwest Knowledge Network (NKN) at UI.
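The spatial nesting of hydrologic units is encoded directly in the USGS hydrologic unit codes: each level appends two digits to the code of the unit that contains it (2 digits for a region, 4 for a sub-region, 6 for an accounting unit, 8 for a cataloging unit). The minimal Python sketch below shows how health records tallied at the cataloging-unit level could be rolled up to coarser scales by truncating the code. The function names, the example codes and the case counts are illustrative assumptions, not project data or project software.

```python
from collections import defaultdict

# Code length (in digits) for the four levels named in the text.
HUC_LEVELS = {2: "region", 4: "sub-region", 6: "accounting unit", 8: "cataloging unit"}


def parent_codes(huc: str) -> dict:
    """Return the code of the enclosing unit at each level up to the code's own length."""
    if not huc.isdigit() or len(huc) % 2 != 0 or not 2 <= len(huc) <= 12:
        raise ValueError(f"not a valid hydrologic unit code: {huc!r}")
    return {name: huc[:n] for n, name in HUC_LEVELS.items() if n <= len(huc)}


def roll_up(counts_by_huc: dict, digits: int) -> dict:
    """Sum per-unit counts into the coarser units identified by the first `digits` digits."""
    totals = defaultdict(int)
    for huc, count in counts_by_huc.items():
        totals[huc[:digits]] += count
    return dict(totals)


# Hypothetical case counts keyed by 8-digit cataloging unit (invented values).
cases = {"17060306": 4, "17060305": 1, "17010303": 2}

print(parent_codes("17060306"))
print(roll_up(cases, 4))  # totals per 4-digit sub-region
print(roll_up(cases, 2))  # total for the 2-digit region
```

Because the hierarchy is carried by the code prefix, the same tallies can be re-aggregated at any scale without a second spatial join.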
A discrete classifier system will be built in Esri ArcGIS and/or R using the spatially transformed health data from Aim 1. The trained classifier will assign each uniquely identified hydrologic unit, at multiple scales and following USGS HUC naming conventions, a relative health-risk label on a gradient from low to high. Applying the trained classifier across the full input dataset will yield a geospatial data layer that estimates and discretely labels overall relative human health risk within the identified watershed boundaries.
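As a rough illustration of the labeling step, the Python sketch below trains on a few hydrologic units with known risk labels and assigns labels to unlabeled units. It uses a k-nearest-neighbor vote purely as a stand-in: the project does not name a specific algorithm and plans to work in ArcGIS and/or R. The two features (a metal contamination index and a mine-site density, both scaled to 0-1), the unit codes and all values are invented for the example.

```python
import math
from collections import Counter

# Hypothetical training units: HUC code -> ((metal_index, mine_density), risk label).
TRAINING_UNITS = {
    "17010303": ((0.9, 0.8), "high"),
    "17010304": ((0.8, 0.9), "high"),
    "17060306": ((0.5, 0.4), "medium"),
    "17060305": ((0.4, 0.5), "medium"),
    "17090003": ((0.1, 0.1), "low"),
    "17090004": ((0.2, 0.1), "low"),
}


def knn_predict(training, features, k=3):
    """Label a feature vector by majority vote among its k nearest training units."""
    nearest = sorted(training.values(), key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, the label seen first (the closest neighbor's) is returned.
    return votes.most_common(1)[0][0]


def label_units(training, unlabeled, k=3):
    """Return a HUC code -> risk label mapping for every unlabeled unit."""
    return {huc: knn_predict(training, feats, k) for huc, feats in unlabeled.items()}


# Hypothetical unlabeled units to be classified.
unlabeled = {
    "17010305": (0.85, 0.75),
    "17060307": (0.45, 0.45),
    "17090005": (0.15, 0.20),
}

risk_layer = label_units(TRAINING_UNITS, unlabeled)
print(risk_layer)
```

The resulting mapping from hydrologic unit code to label is the tabular form of the risk layer; joining it to the watershed boundary polygons on the code would give the map layer itself.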