CSDS Study Group Presentations: 2020

Winter Quarter 2020

Qinyun Lin
January 14, 2020

Education studies rarely have constant intervention effects on independent individuals. It is well documented that schooling is a complex process because teachers, students, and administrators interact with each other in a diverse set of social contexts. As such, considering potential bias due to unobserved or uncontrolled spillover and heterogeneity is important when making inferences with policy implications. Additionally, since the ultimate goal of education research is to inform decision making in the allocation of educational resources regarding curricula, pedagogy, practices, or school organization, education research must be accessible to practitioners. Consequently, a sensitivity framework that can account for all potential sources of bias, including spillover and heterogeneity, is required to allow all stakeholders to conceptualize the quality of evidence independently so that the debate over future policy manipulations can take place in a more transparent, effective, and equitable way.

This presentation will first review the case replacement approach by Frank, Maroulis, Duong, and Kelcey (2013). The authors utilized Rubin’s causal model to interpret the bias necessary to invalidate an inference in terms of sample replacement. (See https://jmichaelrosenberg.shinyapps.io/konfound-it/ for a Shiny interactive web application and https://jrosen48.github.io/konfound/ for an R package for sensitivity analysis.) Then I will present how the non-parametric case replacement approach can be extended to quantify the robustness of inference in multisite randomized controlled trials and value-added measures for teacher effectiveness, accounting for spillover and heterogeneity. Throughout, the Tennessee class size experiment (Project STAR) is used to demonstrate the case replacement approach.
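
The basic calculation behind case replacement can be sketched as follows. This is a minimal illustration, assuming the simplest setting in which an inference is invalidated once the estimate falls below the threshold for statistical significance, and in which replaced cases have zero effect. The function names and the numbers are hypothetical; they are not taken from Project STAR or from the konfound package.

```python
def share_of_estimate_due_to_bias(estimate, std_error, critical_value=1.96):
    """Share of the estimate that would have to be bias to invalidate the inference.

    The threshold is the smallest estimate that would still be statistically
    significant, i.e. critical_value * std_error.
    """
    threshold = critical_value * std_error
    if abs(estimate) <= threshold:
        raise ValueError("estimate is not significant at this critical value")
    return 1.0 - threshold / abs(estimate)


def cases_to_replace(estimate, std_error, n_cases, critical_value=1.96):
    """Approximate number of cases to replace with zero-effect cases."""
    share = share_of_estimate_due_to_bias(estimate, std_error, critical_value)
    return round(share * n_cases)


# Hypothetical inputs, chosen only to illustrate the arithmetic.
share = share_of_estimate_due_to_bias(estimate=2.0, std_error=0.5)
count = cases_to_replace(estimate=2.0, std_error=0.5, n_cases=100)
print(f"{share:.0%} of the estimate would have to be due to bias")
print(f"about {count} of 100 cases would have to be replaced")
```

Expressing robustness as a number of cases rather than as a correlation or a p-value is what makes the result accessible to practitioners: the question becomes how many observations would have to behave differently for the conclusion to change.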

Robert Manduca
January 21, 2020

Income inequality has risen dramatically in the United States over the last 40 years. This has been accompanied by a growing concern that many people are being economically "left behind." Yet standard measures of the proportion of people who are left out of the economy, such as the absolute and relative poverty rates, have not increased during this time. I argue that this discrepancy arises because standard poverty measures are poorly equipped to identify the full extent of economic exclusion generated by today's economy. In particular, they are not able to identify the economic exclusion generated by two major economic trends of recent decades: the concentration of purchasing power among the very rich and the growth of geographic income disparities. I propose a measure of economic exclusion, benchmarked to the income at which the median dollar is earned rather than that of the median person, that is sensitive to these developments. Trends in economic exclusion as defined by this measure diverge sharply from trends in poverty over the last 40 years, showing much larger increases in economic exclusion over time and higher rates of exclusion in high income regions.
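
The difference between the two benchmarks can be illustrated with a short sketch. It assumes that the "median dollar" income is the income level at which half of all dollars are earned by people at or below it, and that a person counts as excluded when their income falls below some fraction of the benchmark. The fraction of one half and the toy incomes are assumptions made for illustration, not values from the talk.

```python
import statistics


def median_dollar_income(incomes):
    """Income at which half of all dollars are earned at or below that level."""
    ordered = sorted(incomes)
    total = sum(ordered)
    running = 0
    for income in ordered:
        running += income
        if total > 0 and running >= total / 2:
            return income
    raise ValueError("incomes must be non-empty with a positive total")


def exclusion_rate(incomes, benchmark, fraction=0.5):
    """Share of people with income below fraction * benchmark (fraction is assumed)."""
    cutoff = fraction * benchmark
    return sum(1 for income in incomes if income < cutoff) / len(incomes)


# Toy income distribution with one very high earner (hypothetical values).
incomes = [10, 20, 30, 40, 400]
person_benchmark = statistics.median(incomes)
dollar_benchmark = median_dollar_income(incomes)
print("median person:", person_benchmark, "median dollar:", dollar_benchmark)
print("relative poverty rate:", exclusion_rate(incomes, person_benchmark))
print("economic exclusion rate:", exclusion_rate(incomes, dollar_benchmark))
```

Because the median dollar moves with the top of the distribution while the median person does not, a benchmark tied to it responds to the concentration of purchasing power among the very rich.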

Pedro Amaral
January 28, 2020

The Brazilian Constitution of 1988 established the Brazilian Public Health System (SUS). The main objective of this health system is to guarantee free access at the point of delivery to health services for all Brazilian citizens, with broad coverage of their needs, and equal care and treatment to people with equal needs, i.e. horizontal equity. Over its 30 years of existence, SUS has had a significant impact on reducing health inequalities in Brazil. However, several imbalances and equity issues still remain. The More Doctors Program (PMM), a strategy adopted by the Brazilian Ministry of Health in 2013 to provide doctors in vulnerable regions and cities, made a significant contribution in this regard, until the termination of new contracts by the program in 2019. The program’s results show, fundamentally, increased accessibility, reduction of regional health inequalities, greater equality in the distribution of doctors, as well as reduction in hospitalizations for primary health care sensitive causes and increased prenatal and postpartum consultations, among other outcomes.

In this context, in this talk I will present research that I have developed with other colleagues to evaluate the spatial distribution of health care services in Brazil, including research on the spatial distribution of health facilities, equipment, and staff. I will also present some results on the allocation of new investment in health equipment and health teams. Additionally, I will show some recent research developed on regional aspects of the recent Zika virus spread and microcephaly outbreak in Brazil.

Nick Feamster
February 4, 2020

Broadband Internet access networks play a uniquely important role in connecting people to the Internet. Access network technologies—wireless and cellular access networks, as well as fixed-line broadband access networks—continue to develop rapidly. As new technologies proliferate, access networks are getting faster, at least on paper and in the lab. But, fast access network technology is far from the end of the story. To ultimately have positive effects, access networks must perform well in deployment, from the basic speed they deliver to the quality of the applications that they can support. The lack of high-speed Internet access in certain neighborhoods can create gaps in opportunity—one example of this phenomenon being what the Federal Communications Commission calls the “Homework Gap”. Spatial data about the state of high-speed Internet access (and application performance on these networks) across neighborhoods and regions ultimately affects decisions about infrastructure investment and policy. Unfortunately, today’s spatial data on broadband Internet access is inaccurate, outdated and coarse-grained, leading to a poor understanding of the status of deployment. In this talk, I will discuss our technical results on measuring Internet access networks in hundreds of homes in more than 30 countries over the past decade, including a more recent project with the Wall Street Journal to understand application performance in access networks. I will then present a grand challenge: using and deploying the tools we have developed to map the state of broadband Internet access—speed, application performance, and price—on a much finer spatial and temporal granularity than exists today.

Chris Graziul and Ziwen Chen
February 11, 2020

Most spatial analysis is conducted under some level of uncertainty regarding key characteristics of relevant features. However, this uncertainty is constrained in ways we often take for granted. For example, administrative boundaries are routinely employed as part of spatial research despite the fact that (a) they may not align with the social processes under study nor the scales on which these processes unfold and (b) these boundaries can/do change over time (e.g. Census data). Yet there is a certain level of comfort with making reasonable assumptions, or otherwise estimating the potential impact of data errors, that leads to some level of confidence that analysis involving administrative boundaries is "grounded" in reality.

What if the "ground truth" of these features were truly unknown? That is, what if we were told to study friendship formation in the City of Chicago using an assortment of spatial, demographic, and survey data, but had no concept of what a city is or does, what its inhabitants want or need, or what "friendship" means? Our project has been focused on conducting just such a study using limited data provided from an unknown data generating process. Told that we can learn more about this nameless, closed urban world through research requests filtered through this data generating process, we have been tasked with illuminating different aspects of social processes without the benefit of data foundational to conducting social scientific research. In fact, our world has no such concept of race, nor religion, nor even, to our knowledge, healthcare. How do you efficiently and effectively analyze individuals in an urban environment when this environment is clearly unrealistic yet supports real social structures that evolve over time? We will present this unique problem, our challenges, and our remaining questions as a useful heuristic tool for approaching spatial analysis of social processes "in the dark" under radical uncertainty about what drives individual motivations and behaviors.

For more information about DARPA's Ground Truth program please visit: https://www.darpa.mil/program/ground-truth

Thomas Coleman
February 18 and October 13, 2020

John Snow, the London doctor often considered the father of modern epidemiology, recognized that two competing water companies in 1850s South London, serving an area with over 400,000 individuals, provided a “Grand Experiment” for testing the effect of clean versus dirty water in the transmission of cholera. Snow exploited the randomization and applied a rudimentary form of Difference-in-Differences (DiD) to compare dirty-versus-clean and before-versus-after. This paper extends Snow's analysis using modern statistical tools. The nature of Snow's design allows us to examine and exploit within-sample variability both across and within regions. The conclusion is stark: a naive calculation that ignores the within-sample variation (“overdispersion” relative to a Binomial or Poisson error process) substantially over-states statistical significance of the estimated treatment effect. In the end Snow's claim for the “influence which the nature of the water supply exerted over the mortality” survives, but is less overwhelming than we would naively think. This re-analysis shows the importance of careful error analysis, and provides a valuable example of using non-experimental data.
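
The role of overdispersion can be illustrated with a standard Pearson dispersion check. This is a generic sketch, not the paper's method: it assumes counts of deaths across sub-districts that share one pooled rate under a Poisson model, and it applies the usual quasi-Poisson correction of scaling the naive standard error by the square root of the dispersion. The counts below are hypothetical and are not Snow's data.

```python
import math


def pearson_dispersion(deaths, populations):
    """Pearson chi-square divided by degrees of freedom under a pooled Poisson rate.

    Values well above 1 indicate more variation across units than a Poisson
    error process would produce.
    """
    if len(deaths) != len(populations) or len(deaths) < 2:
        raise ValueError("need at least two units with matching populations")
    pooled_rate = sum(deaths) / sum(populations)
    chi_square = 0.0
    for observed, population in zip(deaths, populations):
        expected = pooled_rate * population
        chi_square += (observed - expected) ** 2 / expected
    return chi_square / (len(deaths) - 1)


def inflated_std_error(naive_std_error, dispersion):
    """Quasi-Poisson style correction; never shrinks the naive standard error."""
    return naive_std_error * math.sqrt(max(dispersion, 1.0))


# Hypothetical sub-district counts, chosen only to show the mechanics.
deaths = [5, 15, 10, 30]
populations = [1000, 1000, 1000, 1000]
dispersion = pearson_dispersion(deaths, populations)
print("dispersion:", round(dispersion, 2))
print("corrected standard error:", inflated_std_error(1.0, dispersion))
```

When the dispersion is far above one, a test statistic computed from the naive standard error overstates significance by roughly the square root of that factor.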

Robert Roth
February 25, 2020

Advances in personal computing and information technologies have fundamentally transformed how maps are produced and consumed, as many maps now are highly interactive and delivered online or through mobile applications. Today, professionals and students alike must be able to both encode geographic information based on sound cartographic principles as well as code a useful and usable interface for exploring the resulting maps. While much of the empirical research in cartography over the past half century has addressed representation design, evaluating the graphic symbols employed to communicate meaning in geographic information, relatively few empirical studies in cartography approach interaction design, research that is needed to understand how to successfully create digital mapping interfaces that meet user needs and promote geographic understanding.

My research seeks principles and techniques for designing better interactive maps. In this presentation, I first make the case for cartography specifically and design thinking broadly in the ever-widening discipline of spatial data science by highlighting a series of recent developments within the profession. I then provide an overview of my theoretical contributions to interactive cartography, introducing a composite taxonomy of interaction primitives or the basic building blocks of interaction design that mirror the visual variables in representation design. I then break down the user-centered design and development process we follow for interactive mapping projects at the University of Wisconsin Cartography Lab. I conclude with discussion of case studies in environmental justice, climate change communication, and paleoecology to demonstrate how we put theory into practice to design better interactive maps.

Jamie Saxon
March 3, 2020

City planners have a professional and ethical responsibility to provide public goods equitably. Parks improve mental and physical health by nurturing social cohesion and enabling physical activity. So who gets parks? Park access has traditionally been evaluated using constructed variables of potential access: distance buffers or gravity models. These models have major limitations: they ignore commutes and other more intricate mobility behaviors. To address these issues, I propose a nationally scalable, empirical measure of realized use. Using a dataset of smartphone locations, I identify visits to parks in the twenty largest American cities. I use these data to calibrate existing models, and then contrast the models with realized use. The traditional models are not simply imprecise; they systematically over-estimate realized access by minority populations. In other words, they understate inequity. On the other hand, the new data come with substantial challenges. They are a convenience sample, biased towards wealthier, whiter populations. While these biases appear to be moderate, continued work with these and similar data will require continued attention to the sample frame.
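
The two traditional measures of potential access can be sketched in a few lines. This is a generic illustration, assuming each park is described by an area and a distance from a given neighborhood, and assuming a power-law distance decay for the gravity model. The decay exponent, the buffer radius, and the example parks are hypothetical, not the calibrated values from the talk.

```python
def buffer_access(parks, radius):
    """Total park area within a fixed distance of the neighborhood."""
    return sum(area for area, distance in parks if distance <= radius)


def gravity_access(parks, decay=2.0):
    """Sum of park areas, each discounted by distance raised to a decay exponent."""
    return sum(area / distance ** decay for area, distance in parks)


# Hypothetical parks seen from one neighborhood: (area, distance).
parks = [(10.0, 1.0), (40.0, 2.0)]
print("buffer access within 1.5:", buffer_access(parks, radius=1.5))
print("gravity access:", gravity_access(parks, decay=2.0))
```

Both functions depend only on where parks are, not on whether residents actually go to them, which is the gap a measure of realized use is meant to close.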

Daniel Arribas-Bel
April 2, 2020

This workshop will introduce the nascent field of Geographic(/Spatial) Data Science through the vision, structure, and features of PySAL. PySAL is a Python (federation of) package(s) for geocomputation and spatial/geographic data science. Against the backdrop of Open Science and reproducibility, we will first briefly review the library’s history and how it connects with that of data science. This will highlight the original goals of providing a shared and open platform to develop algorithms that treat space and geographical context as first class citizens. To consider what Geographic/Spatial Data Science is, we will highlight PySAL’s evolution over the last ten years, and review the domains currently covered in its feature set, including exploratory analysis of several types of spatial data (e.g. lattice, points, networks), modelling (e.g. spatial econometrics, geographically weighted regression, spatial hierarchical models), and visualization. We will also learn about how PySAL is integrated in the greater ecosystem of Python tools for data science, such as (geo)pandas, matplotlib, or scikit-learn. Throughout this whirlwind tour, participants with a connected device able to run a modern browser will be able to follow along interactively and get a taste of what is possible to do with PySAL.
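
As a taste of the kind of exploratory statistic covered for lattice data, the following sketch computes global Moran's I, a standard measure of spatial autocorrelation, in plain Python. It is a hand-written illustration rather than PySAL code; in PySAL the equivalent statistic is provided by the esda package. The four-unit layout and its values are hypothetical.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights.

    `neighbors` maps each unit index to the indices of its neighbors, so the
    weight w_ij is 1 for neighbors and 0 otherwise.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [value - mean for value in values]
    # Cross-products of deviations over all neighbor pairs (both directions).
    cross = sum(deviations[i] * deviations[j]
                for i, linked in neighbors.items() for j in linked)
    total_weight = sum(len(linked) for linked in neighbors.values())
    variance_sum = sum(dev ** 2 for dev in deviations)
    return (n / total_weight) * (cross / variance_sum)


# Four units in a row, each adjacent to the next (hypothetical example).
values = [1.0, 2.0, 3.0, 4.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print("Moran's I:", morans_i(values, neighbors))
```

A positive value indicates that neighboring units tend to have similar values, which is the pattern produced here by values that rise steadily along the row.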

Autumn Quarter 2020

Niall Atkinson and Carmen Caswell
October 20, 2020

At a time before the advent of a rationalized system of numbered addresses, people in cities understood the places in which they lived their lives as a network of integrated spatial and social relationships between streets, people, institutions, and activities. This was no less true in the case of the first “modern” tax census carried out in Florence, in 1427. Known as the catasto, this massive experiment in developing a demographic portrait of the city required each household to declare where they stood, literally, in relation to the state and their immediate neighbor. By processing these relational stems of address, digital technologies now allow us to build a social map of every Florentine household in the city at precisely the moment when the city was transforming, experimenting with, and inventing forms of cultural production, economic innovation, and political practices that have had lasting effects on the history of the West. The visualization of such a map will help us to understand the way in which Florentines understood their collective identity, who they were, as a function of where they were: where they lived, where they worked, where they prayed, and even where they died.

Ken Frank
November 10, 2020

We compare the model of social network influence with the spatial model. In particular, we compare the agency of the actor to choose specific ties in network data with the implied distances dictated by location in spatial data. This has implications for controlling for the selection process in estimating effects. It also has implications for accelerating polarization when influence and selection reinforce one another.

Austin Wright and Konstantin Sonin
November 17, 2020

Classic and modern theories of asymmetric warfare emphasize the role of combat tactics rebels employ against better equipped government forces. In our model, rebels acquire information about government vulnerabilities and calibrate the timing of their attacks. We test implications of the model using highly detailed data about Afghan rebel attacks and U.S.-led counterinsurgent operations as well as previously unreleased military information about rebel-led spy networks. Leveraging quasi-random variation in revenue from the opium trade, we find a robust link between local economic shocks and the patterns of rebel attacks, which is significantly enhanced in areas where rebels spy on and infiltrate military bases. Shortages of rebel fighters and increases in government surveillance operations reduce attack clustering. Finally, we present the first evidence that the clustered timing of rebel attacks undermines soldier efficiency, leading to an increase in bomb-related casualties to government troops.

Milena Almagro
December 1, 2020

This paper argues that the endogeneity of amenities plays a crucial role in the welfare distribution of a city's residents by reinforcing location sorting. We quantify this channel by leveraging spatial variation in tourism flows and the entry of home-sharing platforms, such as Airbnb, as shifters of location characteristics to estimate a dynamic model of residential choice. In our model, consumption amenities in each location are the equilibrium outcome of a market for services, which are supplied by firms and demanded by heterogeneous households. We estimate the model using detailed Dutch microdata, which allows us to track the universe of Amsterdam's residents over time and the evolution of a rich set of neighborhood amenities. Our results indicate significant heterogeneity across households in their valuation of different amenities, as well as in the response of amenities to demographic composition. We show that allowing for this endogenous response increases inequality between demographic groups whose preferences are closely aligned, but decreases it if substantially misaligned, suggesting heterogeneity in the two-way mapping between households and amenities plays a crucial distributive role. Finally, we highlight the distributional implications of our estimates by evaluating currently debated policies, such as zoning, as well as price and quantity regulations in housing markets.