[publication]

Work performed as a Data Science Intern at Lawrence Livermore National Lab.
Paper published in Water Resources Research (WRR), a journal of the American Geophysical Union (AGU).
Abstract:

Groundwater ages provide insight into recharge rates, flow velocities, and vulnerability to contaminants. Predicting groundwater ages from more accessible parameters via Machine Learning (ML) would improve our ability to guide sustainable management of groundwater resources. In this study, ML models were trained and tested on a large dataset of tritium concentrations (n = 2410) and tritium-helium groundwater ages (n = 1157) from the California Central Valley, a large groundwater basin with complex land use, irrigation, and water management practices. The ML models were trained on 63 features, including location, well construction information, landscape characteristics, climate variables, water chemistry, and stable isotopes. The Bagging Regressor ML method can accurately classify groundwater samples as either modern or pre-modern (F1-score = 0.91), whereas ML prediction of continuous tritium-helium groundwater ages is limited, explaining only ~30% of the variability in this dataset. In general, ML groundwater age prediction relies mostly on features related to (1) the source of groundwater recharge, (2) contaminant history, (3) aquifer materials, (4) well construction, and (5) geochemical reactions along flow paths.
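The abstract reports two kinds of results: a binary modern/pre-modern classification scored by the F1-score, and a continuous age regression whose skill is the fraction of variance explained (commonly reported as R²). The stdlib-only Python sketch below illustrates bagging (bootstrap aggregation, here of one-split regression stumps) together with both metrics, run on made-up synthetic data. It is not the study's code, data, or model: the stump base learner, the helper names `fit_stump` and `fit_bagging`, the synthetic features, and the `CUTOFF` used to label samples "modern" are all hypothetical, and obtaining classes by thresholding a regressor's output is an assumption made only for illustration. The printed numbers reflect the synthetic data, not the paper's results.

```python
import random


def fit_stump(X, y):
    """Fit a one-split regression stump: pick the (feature, threshold) pair whose
    two leaf means give the smallest sum of squared errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:
        # No valid split (all rows identical): fall back to predicting the mean.
        m = sum(y) / len(y)
        return lambda row: m
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample (drawn with replacement)
    and average the stumps' predictions."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(model(row) for model in models) / len(models)


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(1 for a, p in zip(y_true, y_pred) if a and p)
    fp = sum(1 for a, p in zip(y_true, y_pred) if not a and p)
    fn = sum(1 for a, p in zip(y_true, y_pred) if a and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def r_squared(y_true, y_pred):
    """Fraction of the variance in y_true explained by y_pred (coefficient of determination)."""
    m = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - m) ** 2 for a in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical synthetic data, NOT the study's dataset: the made-up "age"
    # depends on the first feature plus Gaussian noise; the second is irrelevant.
    X = [[rng.uniform(0, 100), rng.uniform(0, 1)] for _ in range(120)]
    ages = [0.5 * row[0] + rng.gauss(0, 10) for row in X]
    X_train, y_train, X_test, y_test = X[:90], ages[:90], X[90:], ages[90:]

    model = fit_bagging(X_train, y_train)
    preds = [model(row) for row in X_test]
    print("R^2 on held-out samples:", round(r_squared(y_test, preds), 2))

    # Hypothetical cutoff: samples younger than CUTOFF count as "modern".
    CUTOFF = 25.0
    true_modern = [a < CUTOFF for a in y_test]
    pred_modern = [p < CUTOFF for p in preds]
    print("F1 for the modern class:", round(f1_score(true_modern, pred_modern), 2))
```

A real analysis would more likely use a tested library implementation, such as scikit-learn's `BaggingRegressor` (whose default base learner is a decision tree), and evaluate with cross-validation rather than a single train/test split.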