The Internet as Quantitative Social Science Platform

Klaus Ackermann
October 31, 2017

With the large-scale penetration of the internet, for the first time, humanity has become linked by a single, open communications platform. Harnessing this fact, we report insights arising from a unified internet activity and location dataset of an unparalleled scope and accuracy drawn from over a trillion (1.5x10^12) observations of end-user internet connections, with temporal resolution of just 15 minutes over 2006-2012. We first apply this dataset to the expansion of the internet itself over 1,647 urban agglomerations globally. We find that unique IP per capita counts reach saturation at approximately one IP per three people, and take, on average, 16.1 years to achieve; eclipsing the estimated 100- and 60-year saturation times for steam power and electrification respectively. Next, we use intra-diurnal internet activity features to up-scale traditional over-night sleep observations, producing the first global estimate of over-night sleep duration in 645 cities over 7 years. We find statistically significant variation between continental, national and regional sleep durations including some evidence of global sleep duration convergence. Finally, we estimate the relationship between internet concentration and economic outcomes in 411 OECD regions and find that the internet’s expansion is associated with negative or positive productivity gains, depending strongly on sectoral considerations. To our knowledge, our study is the first of its kind to use online/offline activity of the entire internet to infer social science insights, demonstrating the unparalleled potential of the internet as a social data-science platform.

Read the paper here


Regression Analysis for Misaligned Data

Guillaume Pouliot
November 7, 2017

What is the impact of environmental variables such as rainfall, soil quality, and pollution on economic outcomes such as employment, income, and education? Research on this question is often stymied by the misalignment problem: the locations of the environmental observations do not generally coincide with those of the economic observations. In this article, I study a class of regression problems with spatially correlated variables. This includes regression analysis with misaligned data. I introduce a quasi-maximum likelihood estimator as well as more robust companion methods which do not require specification of the regression error covariance. For both, I obtain new central limit theorems for spatial statistics, which are of independent interest. I propose computational strategies and investigate their performance. Simulations show that the methods I recommend, along with the asymptotic distribution theory I derive, yield more reliable estimates and confidence intervals than previously recommended approaches. In the reanalysis of two data sets, I find that these methods yield conclusions that differ quantitatively and qualitatively from published results.

Read the paper here


Spatial Constraints on Gerrymandering: A Practical Comparison of Methods

Jamie Saxon
November 14, 2017

In the half-century since the US Supreme Court asserted its dominion over legislative districting, it has recognized the harm of political gerrymandering but failed to provide relief for it. An earlier, extended period of successful Congressional action suggests reviving the legislative remedy. Historically, Congress required equipopulous, contiguous, and compact districts, but the formal definition of compactness has proven contentious. Does it matter? This paper presents a credible, quantitative framework for evaluating the implications of spatial constraints in districting reform. By implementing a flexible automated districting procedure with eighteen different definitions of compactness, I evaluate their practical implications: the seat shares that they imply for the two parties and for racial and ethnic minorities. On these grounds, the definitions are markedly consistent. The choice among compactness definitions need not be contentious.
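As a rough illustration of what a formal compactness definition looks like, the sketch below computes the Polsby-Popper score, a widely used measure equal to 4πA/P² for a district of area A and perimeter P. Whether this particular measure is among the eighteen definitions evaluated in the paper is an assumption, and the function names and toy polygons are hypothetical. Coordinates are assumed to be planar (already projected).

```python
import math

def polygon_area_perimeter(vertices):
    """Return (area, perimeter) of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x0 * y1 - x1 * y0  # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter

def polsby_popper(vertices):
    """Polsby-Popper compactness: 1.0 for a circle, near 0 for a thin sliver."""
    area, perimeter = polygon_area_perimeter(vertices)
    return 4.0 * math.pi * area / perimeter ** 2

# Hypothetical toy districts with equal area but very different shapes.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
sliver = [(0, 0), (10, 0), (10, 0.1), (0, 0.1)]

print(polsby_popper(square))
print(polsby_popper(sliver))
```

The two toy shapes have the same area, so the score differs only because the sliver has a much longer boundary; other definitions weight shape features differently, which is why their practical agreement is an empirical question.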

Read the paper here
View the interactive map here


Analyzing credit card transaction data to improve financial inclusion

Michelle Thompson
November 28, 2017

Traditionally, social and economic mobility for low-income individuals and families has been tied to the financial institutions located in or near their neighborhoods. Advances in e-commerce may provide increased flexibility in mobility and access, but that flexibility is constrained by credit scores, card limits, and debt burden.
The Financial Inclusion & Citizen Participation Project (FI & CPP) is a unique collaboration between the MasterCard Center for Inclusive Growth (MCIG) and New America’s Public Interest Technology team to examine public data and private sector anonymized and aggregated transaction data, and conduct field research to better understand patterns in credit card usage, debt levels, and accessibility in marginalized communities. FI & CPP will build unique profiles for financial inclusion in low-income communities by examining data patterns in: Chicago (IL), New Orleans (LA), St. Louis (MO), and Bronx County (NY).
Individual city profiles will yield recommendations that encourage policymakers, community advocates, and businesses to address financial inclusion while engaging the public and advancing new partnerships.


Population density, climate variables, and poverty synergistically structure spatial risk in urban malaria in India

Mauricio Santos Vega
December 5, 2017

The world is rapidly becoming urban, with the global population living in cities projected to double by 2050. This increase in urbanization poses new challenges for the spread and control of communicable diseases such as malaria. In particular, urban settings create highly heterogeneous socio-economic and environmental conditions that can affect the transmission of vector-borne diseases dependent on human water storage and waste water management. Interestingly, India, as opposed to Africa, harbours a mosquito vector, Anopheles stephensi, which thrives in the man-made environments of cities and acts as the vector for both Plasmodium vivax and Plasmodium falciparum, making the malaria problem a truly urban phenomenon. Here we address the role and determinants of within-city spatial heterogeneity in the incidence patterns of malaria. Both statistical analyses and a phenomenological transmission model are applied to an extensive spatio-temporal dataset on malaria cases in the city of Ahmedabad (Gujarat, India). A spatial pattern in malaria incidence is described that is largely stationary in time and consistent for the two parasites. Malaria risk is then shown to be associated with socioeconomic indicators and environmental parameters, temperature and humidity. From a more dynamical perspective, an inhomogeneous Markov chain model is used to predict malaria risk. Models that account for climate factors, socioeconomic level and population size exhibit the highest predictive skill. Our results show that climate forcing and socio-economic heterogeneity act synergistically at local scales on the population dynamics of urban malaria in this city. The stationarity of malaria risk patterns provides a basis for more targeted intervention, such as vector control, based on transmission ‘hotspots’.
