Environmental Data Science Lunch - Autumn 2020

James Evans, Knowledge Lab, UChicago
October 22, 2020

Ziwei Wang, Geophysical Sciences, UChicago
October 29, 2020

Severe convective storms are among the most socio-economically devastating weather events in the populated mid-latitudes. One of the most common metrics for these events is convective available potential energy (CAPE), defined as the integrated buoyancy of near-surface air. CAPE is expected to increase under future, warmer conditions, and quantifying that increase is an important question in atmospheric science. In this talk, I will start with a showcase of the occurrence and accumulated damage of convective events under the current climate. I will then use high-resolution model output to show how much, and why, CAPE changes as the climate warms, and illustrate how this change can be understood in terms of a climatological shift of surface and upper-tropospheric thermodynamic conditions.
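For reference, the standard textbook definition of CAPE (not taken from the talk itself) makes the "integrated buoyancy" explicit: it is the vertical integral of parcel buoyancy between the level of free convection (LFC) and the equilibrium level (EL),

```latex
\mathrm{CAPE} \;=\; \int_{z_{\mathrm{LFC}}}^{z_{\mathrm{EL}}}
g \, \frac{T_{v,\mathrm{parcel}} - T_{v,\mathrm{env}}}{T_{v,\mathrm{env}}} \, dz ,
```

where $T_{v,\mathrm{parcel}}$ and $T_{v,\mathrm{env}}$ are the virtual temperatures of the lifted near-surface parcel and the environment, and $g$ is gravitational acceleration. Warming that increases near-surface moisture tends to increase the parcel buoyancy in the integrand, which is the qualitative basis for the expected CAPE increase.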

Karina Acosta Ordonez, Center for Spatial Data Science, UChicago
November 5, 2020

Many public policies in underdeveloped and developing countries are “spatially blind”. Among the drivers of this blindness is the lack of an up-to-date geographical breakdown of data, which limits the ability of policymakers to develop policies to tackle poverty at a granular subnational scale and to assess their impacts. In this context, this study aims to contribute to the estimation and mapping of multidimensional poverty in Cambodia at a subnational scale for the years 2000, 2005, 2010, and 2014. The methodological framework for overcoming data limitations encompasses geostatistical models, which allow us to combine information from surveys with spatial gridded data to estimate poverty indicators that are not formally targeted by national surveys.

Julie Bessac, Assistant Computational Statistician, Argonne National Laboratory 
November 19, 2020

In the first part of the talk, we present a statistical model for the sub-grid scale variability of air-sea fluxes driven by wind speed. In physics-based models, sub-grid scale variability arises from fine-scale processes that are not resolved at the given model resolution. Quantifying the influence of these sub-grid scales on the resolved scales is needed to better represent the entire system. In this work, we model the difference between the true fluxes and those calculated using area-averaged wind speeds. This discrepancy is modelled in space and time, conditioned on the resolved fields, via a locally stationary space-time Gaussian process, with a view to developing a stochastic wind-flux parameterization. Additionally, the Gaussian process is formulated in a scale-aware fashion, meaning that both the mean and the space-time correlation depend on the model resolution under consideration. The scale-aware capability enables us to derive a stochastic parameterization of sub-grid variability at any resolution and to characterize statistically the space-time structure of the discrepancy process across scales.
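The core machinery here (Gaussian-process regression of a discrepancy, then drawing stochastic realizations from the posterior) can be sketched in a few lines of numpy. The following toy example is purely illustrative: the one-dimensional time coordinate, squared-exponential kernel, and synthetic "discrepancy" data are assumptions for demonstration, far simpler than the locally stationary, scale-aware space-time model described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the flux discrepancy: true flux minus the flux
# computed from area-averaged wind speed (purely illustrative numbers).
t = np.linspace(0.0, 10.0, 60)                  # time coordinate
resolved_wind = 5.0 + np.sin(t)                 # resolved (area-averaged) field
discrepancy = (0.05 * (resolved_wind - 5.0)     # depends on the resolved field
               + 0.3 * np.cos(2.0 * t)
               + 0.05 * rng.standard_normal(t.size))

def sq_exp_kernel(x1, x2, variance=0.1, length=1.0):
    """Squared-exponential covariance between coordinate vectors x1 and x2."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

# Gaussian-process regression: condition on the observed discrepancies and
# predict the discrepancy (posterior mean and covariance) at finer times.
noise_var = 0.05 ** 2
K = sq_exp_kernel(t, t) + noise_var * np.eye(t.size)
t_new = np.linspace(0.0, 10.0, 200)
K_star = sq_exp_kernel(t_new, t)
mean_pred = K_star @ np.linalg.solve(K, discrepancy)   # posterior mean
cov_pred = (sq_exp_kernel(t_new, t_new)
            - K_star @ np.linalg.solve(K, K_star.T))   # posterior covariance

# One stochastic realization: this is the kind of random term a stochastic
# parameterization would add on top of the bulk-formula flux.
sample = rng.multivariate_normal(mean_pred, cov_pred + 1e-9 * np.eye(t_new.size))
```

A scale-aware version would make `variance` and `length` (and the mean) functions of the model resolution, which is what allows the same fitted model to serve any target grid spacing.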

The second part of the talk focuses on the construction of a stochastic weather generator for the bulk and tails of temperature, to be used in long-term planning of power-grid systems. Stochastic weather generators are commonly used to overcome the lack of observational data or the computational burden of numerical weather models; they enable us to simulate realistic features of weather variables in an inexpensive, data-driven fashion. Temperature and its extremes (cold and hot) affect energy generation and demand, as well as infrastructure. Mathematical models are typically designed and optimized to study the long-term planning of power-grid systems, in particular to improve their economic efficiency and resilience to high-impact and extreme events. Since energy demand depends on the hour and day of the year as well as on regional factors, we propose to generate temperature scenarios at an hourly level over the Midwest, which serve as inputs to power-grid mathematical models. Since high-impact events in power grids are not restricted to extreme temperatures, we base our model on a newly proposed probability distribution that bridges the bulk and both tails of a distribution in a single comprehensive model.

Marynia Kolak, Haowen Shang, and Lorenz Menendez
November 19, 2020

Measurements of Aerosol Optical Depth (AOD) from NASA’s Terra satellite provide a remote sensing solution to pollution monitoring worldwide. Specifically, AOD allows researchers to proxy PM2.5 concentrations after controlling for a variety of spatial and temporal predictors. The MAIAC algorithm uses advanced image processing techniques to improve the spatial resolution of the Terra AOD product from 10km to 1km, allowing for high resolution PM2.5 modeling. For large metropolitan areas, incorporating a hybrid-sensor approach may provide an alternative for pollution surveillance to using sparse ground sensors alone.

We implement a three-stage neural network model to predict monthly PM2.5 concentrations in the Chicagoland area (across 21 counties) between 2014 and 2018. Spatial predictors include population density, developed land cover, and elevation. Temporal predictors include a variety of meteorological variables (temperature, pressure, wind velocity, wind direction, and visibility), as well as NDVI and a point-source emissions inventory. Dummy variables for each month are also included to account for seasonal variation in the temporal data. All data wrangling code and model outputs were released in a public repository using R.
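The month-dummy encoding and the basic spatial-plus-temporal feature setup can be sketched as follows. This is a toy single-hidden-layer network on synthetic data, not the authors' three-stage model: the predictor values, target, network size, and training settings are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-ins for the predictors described above (all synthetic):
n = 400
pop_density = rng.uniform(0.0, 1.0, n)   # spatial predictor
elevation   = rng.uniform(0.0, 1.0, n)   # spatial predictor
temperature = rng.uniform(0.0, 1.0, n)   # temporal predictor
month       = rng.integers(0, 12, n)

# One-hot ("dummy") encoding of month captures seasonal variation.
month_dummies = np.eye(12)[month]
X = np.column_stack([pop_density, elevation, temperature, month_dummies])
# Synthetic "PM2.5" target with a seasonal cycle plus noise.
y = (0.5 * pop_density - 0.3 * elevation
     + 0.2 * np.sin(2.0 * np.pi * month / 12)
     + 0.05 * rng.standard_normal(n))

# Minimal one-hidden-layer network trained by full-batch gradient descent
# on mean squared error.
h = 16
W1 = 0.1 * rng.standard_normal((X.shape[1], h)); b1 = np.zeros(h)
W2 = 0.1 * rng.standard_normal(h);               b2 = 0.0
lr = 0.05
for _ in range(2000):
    a = np.tanh(X @ W1 + b1)                 # hidden activations
    pred = a @ W2 + b2                       # network prediction
    err = pred - y
    gW2 = a.T @ err / n; gb2 = err.mean()    # gradients, output layer
    ga = np.outer(err, W2) * (1.0 - a ** 2)  # backprop through tanh
    gW1 = X.T @ ga / n; gb1 = ga.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# In-sample R-squared of the toy fit (the paper reports out-of-sample 0.59).
r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
```

The month dummies let a purely feed-forward model absorb the seasonal cycle without any explicit time-series structure, which is the same role they play in the model described above.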

The model achieved an out-of-sample R-squared of 0.59 and was able to locate high-pollution areas such as major airports, expressways, and some industrial areas. Pollution levels decreased from 2014 to 2017 before increasing in 2018, with peak PM2.5 levels in late summer. While performance was not as high as that of models covering larger regions, the model generated more detail than publicly available data resources. Because the data preparation and results are open source, the model can continue to improve and be updated over time. High-spatial-resolution data have the potential to provide regional policymakers and communities with better insights into the relationship between PM2.5 pollution and health outcomes.

Paul Roebber, Atmospheric Science, University of Wisconsin-Milwaukee
December 3, 2020

Interest and work in data science has been growing rapidly across many disciplines for the past few years. Atmospheric science is no exception, but initial interest was perhaps slower than in some fields owing to the substantial success of simpler techniques such as multiple linear regression. Nonetheless, atmospheric science as a field has some attributes that make it attractive for data science applications. First, many forecast problems have significant nonlinearities, and these can be more readily addressed using a variety of data science methods (e.g., artificial neural networks). Second, there are large, longitudinal datasets which can help alleviate some of the problems associated with training nonlinear systems, which require substantial training and validation data both to learn the problem and to avoid overfitting. Third, computational models can be accelerated with neural network or other modules that replace frequently accessed sub-programs in large systems (such as a radiation scheme in a weather prediction or climate model). In this talk, I will highlight some approaches I have used in my research, as well as some techniques for forecast verification, and contrast these with other, more standard techniques. No prior experience in machine learning or other such methods is expected or required.

Zoheyr Doctor, Institute for Fundamental Science, University of Oregon
December 10, 2020

In the last five years, we have detected tens of signals from colliding pairs of black holes using gravitational-wave interferometry. In this talk, I will describe some of the basic astrophysics of black hole mergers, and then give a tour of some of the data science and computational techniques used to understand these exotic