Environmental Data Science Lunch - Winter & Spring 2020

Parmanand Sinha, Computational Scientist, Research Computing Center, Office of Research and National Laboratories, UChicago
January 16, 2020

Gridded human population data provide a spatial denominator to identify populations at risk, quantify burdens, and inform our understanding of human-environment systems. When modeling gridded population, the information used for training the model may differ in spatial resolution from what the model prediction produces. This case arises when approaching population modeling from a top-down, dasymetric approach in which one redistributes coarse administrative unit level population data (i.e., source unit) to a finer scale (i.e., target unit). However, issues associated with differing variance across scales, spatial autocorrelation, and bias in sampling techniques are often overlooked.

In this study, we examine the effects of intentionally biasing our sampling from the source to target scale within the context of a weighted, dasymetric mapping approach. The weighted component is based on a Random Forest estimator, which is a non-parametric ensemble-based prediction model. We investigate issues of autocorrelation and heterogeneity in the training data using 18 different types of samples to show the variations in training, census-level (i.e., source) and output, grid-level (i.e., target) predictions. We compare results to simple random sampling and geographically stratified random sampling.
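The redistribution step of a weighted dasymetric approach can be sketched as follows. This is a minimal illustration, not the study's implementation: the function name and the example weights are hypothetical, and the weights stand in for whatever a fitted model (such as a Random Forest) would predict for each grid cell.

```python
def dasymetric_redistribute(source_total, cell_weights):
    """Split one source-unit population total across its target grid cells.

    cell_weights are hypothetical non-negative weights (for example,
    model-predicted densities). They are normalized so that the cell
    values sum back to source_total.
    """
    if not cell_weights:
        return []
    if any(w < 0 for w in cell_weights):
        raise ValueError("weights must be non-negative")
    weight_sum = sum(cell_weights)
    if weight_sum == 0:
        # No weight information: fall back to an even split across cells.
        return [source_total / len(cell_weights)] * len(cell_weights)
    return [source_total * w / weight_sum for w in cell_weights]


# One census unit with 1000 people and four grid cells (made-up weights).
cells = dasymetric_redistribute(1000.0, [0.5, 1.5, 3.0, 0.0])
```

Because the weights are normalized within each source unit, the census total is preserved no matter how well the model fits.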

Results indicate that the Random Forest model is sensitive to the spatial autocorrelation inherent in the training data, which leads to an increase in the variance of the residuals. Sample training datasets that are at a spatial scale representative of the true population produced the best fitting models. However, the true representative dataset varied in autocorrelation for both scales. More attention is needed with ensemble-based learning and spatially heterogeneous data, as underlying issues of spatial autocorrelation influence results for both the census-level and grid-level estimations.
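Spatial autocorrelation of the kind discussed here is commonly summarized with Moran's I. The sketch below computes it in plain Python for a toy one-dimensional chain of cells with binary adjacency weights. The helper names and data are illustrative assumptions, not taken from the study.

```python
def chain_neighbors(n):
    """Adjacency for n cells in a row: each cell neighbors the cells beside it."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def morans_i(values, neighbors):
    """Moran's I with binary weights (w_ij = 1 if j is a neighbor of i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nbrs) for nbrs in neighbors.values())  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)


w = chain_neighbors(6)
i_clustered = morans_i([1, 1, 1, 5, 5, 5], w)    # similar values adjacent
i_alternating = morans_i([1, 5, 1, 5, 1, 5], w)  # dissimilar values adjacent
```

The clustered pattern yields a positive value and the alternating pattern a negative one, which is the sense in which training samples can differ in autocorrelation.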

Link to paper

Carlo Graziani, Computational Scientist, Argonne National Labs
January 23, 2020

Probabilistic forecasts are fundamental tools for making decisions under uncertainty in a wide variety of fields, including weather, energy use, and finance. I will present a scheme whereby a poorly calibrated base probabilistic forecasting system may be recalibrated by incorporating past performance information to produce a new forecasting system that is demonstrably superior to the original one, in that it can consistently win wagers against the original system at rates that are predictable in advance. The recalibration scheme is formulated in a framework that exploits the deep connections between information theory, forecasting, and betting.
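The link between forecasting, information theory, and betting can be illustrated with a standard Kelly-style calculation. This is not the recalibration scheme presented in the talk. It is a sketch under stated assumptions: a "house" sets fair odds from its own forecast, a bettor splits wealth across outcomes according to a competing forecast, and the example probabilities are made up.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_log_growth(truth, house, bettor):
    """Expected log wealth growth per wager.

    The house pays odds 1 / house[i] on outcome i; the bettor stakes the
    fraction bettor[i] of wealth on outcome i, so wealth is multiplied by
    bettor[i] / house[i] when outcome i occurs.
    """
    return sum(
        r * math.log(q / h) for r, h, q in zip(truth, house, bettor) if r > 0
    )


truth = [0.7, 0.3]   # hypothetical true outcome frequencies
house = [0.5, 0.5]   # poorly calibrated base forecast
growth_calibrated = expected_log_growth(truth, house, truth)
growth_same = expected_log_growth(truth, house, house)
```

A bettor whose forecast matches the true frequencies gains wealth at a rate equal to KL(truth || house), while a bettor who simply copies the house forecast gains nothing, so the advantage of a better-calibrated forecast is quantifiable in advance.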

January 30

Charlotte Haley, Mathematics/Statistics, Argonne National Labs

Angela Li & Emily Padston
February 6, 2020

Varada Shevade, Postdoc at the University of Maryland's Department of Geographical Sciences
February 13, 2020

Habitat loss and fragmentation threaten biodiversity globally. Agricultural expansion is a dominant driver of deforestation in the tropics. In particular, Peninsular Malaysia has a long history of deforestation and land cover land use changes. The Malayan tiger conservation plan relies on maintaining linkages between forest fragments. In this talk, I will focus on the land cover and land use change in Peninsular Malaysia in the context of the tiger conservation landscape. I will present the forest loss and conversions to plantations in the past two decades and the impacts on forest connectivity and provide an overview of the datasets and methods used for the analysis. If time permits, I will explore some recent work on mapping population vulnerability to malaria in Southeast Asia.

Diego Rybski, Potsdam Institute for Climate Impact Research
Sustainability, development, urbanization - problems and perspectives
February 20, 2020

The sustainable development goals (SDGs) combine classical human development with the quest for sustainability. At the same time, development - e.g. measured by gross domestic product per capita - is correlated with urbanization rates (on the country scale). Relating sustainability and development as well as development and urbanization leads to the question of how the third pair, sustainability and urbanization, is related. We identify conceptual hurdles that emerge when applying the concept of sustainability to cities and discuss which insights can be gained from urban scaling in this context. As an exploratory study, urban scaling is employed to investigate indicators of SDG11 "Sustainable Cities and Communities" of various countries. We argue that such an analysis can provide additional insights and that, for many SDG11 indicators, urban scaling represents a measure of inequality. The analysis is complemented by results for urban CO2 emissions, which pose problems similar to those of sustainability more generally.
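Urban scaling usually means fitting a power law Y = a * N^beta between a city indicator Y and population N, typically by least squares on the logarithms. The sketch below shows that fit in plain Python. The function name is hypothetical and the data are synthetic, generated from a known exponent purely to show the mechanics.

```python
import math


def fit_scaling_exponent(populations, indicator):
    """Least-squares fit of log(Y) = log(a) + beta * log(N).

    Returns (beta, a). beta > 1 indicates superlinear scaling (larger cities
    have more of the indicator per capita), beta < 1 sublinear scaling.
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in indicator]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    a = math.exp(mean_y - beta * mean_x)
    return beta, a


# Synthetic cities following Y = 2 * N^1.15 exactly (illustrative only).
populations = [1e4, 5e4, 2e5, 1e6, 5e6]
indicator = [2.0 * n ** 1.15 for n in populations]
beta, prefactor = fit_scaling_exponent(populations, indicator)
```

An exponent different from 1 means the indicator is distributed unevenly across city sizes, which is the sense in which scaling can be read as a measure of inequality.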

Jake Roth, Argonne National Labs and University of Chicago Department of Statistics
February 27, 2020

Following its introduction in the 1950s, the conjugate gradient (CG) method was viewed as a direct method for solving linear systems. A decade or so later, extensions were developed for interpreting it as an iterative procedure for solving more general nonlinear systems, including function minimization. CG is a foundational procedure in numerical linear algebra, but given the decade-long gap between the linear and nonlinear interpretations, its derivation may be murky without appropriate context. In this talk, we will review and motivate the method from a geometric perspective and demonstrate an example of its applicability to an environmental problem.
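For reference, the linear form of the method can be sketched in a few lines of plain Python. This is the textbook CG iteration for a symmetric positive-definite system Ax = b, not material from the talk. The function names and the small test system are illustrative.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x = 0
    p = list(r)          # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)              # exact step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                   # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)  # exact solution is [1/11, 7/11]
```

In exact arithmetic the iteration terminates in at most n steps, which is why CG was first viewed as a direct method; in practice it is stopped once the residual is small enough, as an iterative method would be.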

Tamma Carleton, Climate Impact Lab
March 5, 2020

A key unknown parameter for the design of climate policy is the global marginal economic damage caused by emitting a single ton of carbon dioxide (CO2), or its equivalent. Current approaches to estimate this number rely on spatially-coarse theoretical-numerical models that are not tightly linked to data. We develop an architecture that integrates best-available global datasets, econometric analyses, and climate science to estimate local, data-based, probabilistic damages with global coverage alongside simultaneous estimates of aggregated global marginal damages. Importantly, our modular approach is designed to use empirical relationships in each sub-sector of the global economy to account for adaptations to a changing climate and projected economic development. Here, we apply this architecture to construct the first global empirical estimates of the impact of climate change on total non-transport energy consumption, one of the most uncertain impacts in current models, accounting for electricity and all other fuels in residential, commercial, industrial and agricultural end-uses. In 2100, we project global electricity consumption to rise roughly 3.6 EJ for each 1°C increase in global mean temperature, reflecting increased cooling demand, while consumption of other fuels declines 9.0 EJ per 1°C, reflecting reduced heating. Together, these estimates indicate that emission of 1 ton of CO2 today produces a global net savings in future aggregate energy consumption of about $1 in net present value (3% discount rate). This result contrasts strongly with estimates derived from alternative approaches; for example, the numerical-theoretical FUND IAM estimates increases in energy consumption amounting to $8 per ton of CO2 (high emissions scenario, 3% discount rate). The costs and savings we project under climate change are unequally distributed, with many emerging economies increasing electricity consumption dramatically, while wealthy economies benefit from heating reductions. Impacts in hot poor countries are generally constrained by low overall energy usage, even with projected future economic development.

Daniel Arribas-Bel
April 2, 2020

This workshop will introduce the nascent field of Geographic(/Spatial) Data Science through the vision, structure, and features of PySAL. PySAL is a Python (federation of) package(s) for geocomputation and spatial/geographic data science. Against the backdrop of Open Science and reproducibility, we will first briefly review the library’s history and how it connects with that of data science. This will highlight the original goals of providing a shared and open platform to develop algorithms that treat space and geographical context as first-class citizens. To consider what Geographic/Spatial Data Science is, we will highlight PySAL’s evolution over the last ten years, and review the domains currently covered in its feature set, including exploratory analysis of several types of spatial data (e.g. lattice, points, networks), modelling (e.g. spatial econometrics, geographically weighted regression, spatial hierarchical models), and visualization. We will also learn how PySAL is integrated into the greater ecosystem of Python tools for data science, such as (geo)pandas, matplotlib, or scikit-learn. Throughout this whirlwind tour, participants with a connected device able to run a modern browser will be able to follow along interactively and get a taste of what is possible with PySAL.

April 16

Hannes Taubenböck, German Aerospace Center

Presentation title TBA

April 23

Julie Bessac, Assistant Computational Statistician, Argonne National Laboratory

Presentation title TBA

May 14

Victoria Romeo Aznar, Mansueto Institute Postdoctoral Fellow & Evolution and Ecology Postdoctoral Scholar

Presentation title TBA

May 21

Amanda Lenzi, Postdoctoral Scholar, Argonne National Labs

Presentation title TBA

May 28

Amanda Stathopoulos, Assistant Professor of Civil and Environmental Engineering, Northwestern University

Presentation title TBA