This is part 3 of a 3-part series: Intro to Statistics and Evidence by Eli Hymson. The aim of these articles is to equip debaters with some debate-applicable knowledge from the field of statistics. The list of subjects is nowhere near comprehensive but reflects a grab-bag of areas which have the potential to improve the quality of in-round evidence comparison and out-of-round research practices.
- Experimental Design and Modeling
Much of the science of statistics has been developed to analyze results of carefully designed controlled experiments in which researchers manipulate experimental conditions, treatment combinations, and the selection of experimental units to examine causal relationships without other noisy factors involved. Debaters rarely have access to, or need for, this type of research. The majority of quantitative work used by debaters comes from observational studies where the data was collected from a process over which the researcher had no control. The best we can do in these circumstances is try to use variation in observed features to explain variation in some other variable of interest. Enter regression analysis.
Regression models are ubiquitous in fields utilizing quantitative research. I will focus on the elements of regression modeling which might be important for debaters, but there is a rich background of theory involved I encourage you to learn about too. Let’s discuss multivariate regression, coefficients and standard errors, statistical significance, p-values, and reading regression tables.
- Multivariate regression
The goal of single-variable regression is to draw a line through a cloud of data points on the X-Y plane which best represents the linear relationship between the explanatory variable, X, and the outcome variable, Y. X is called a regressor. The slope and intercept of this line are chosen so as to minimize the sum of squared errors, i.e. the line tries to keep the squared vertical distances by which it misses the observed points as small as possible in total. The slope and intercept are sample estimates of the population’s parameters. What we want is some measure of how closely these estimates from the data represent the entire population, without observing the entire population.
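To make "minimize the sum of squared errors" concrete, here is a minimal sketch of fitting a line by least squares. The data points are made up for illustration, and the function names are my own rather than anything from a particular statistics package.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of X and Y divided by the variation in X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Total squared vertical distance between the line and the points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, purely illustrative.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, b0, b1)
# Any other line, such as one with a slightly steeper slope, misses by more.
nudged_sse = sum_squared_errors(xs, ys, b0, b1 + 0.1)
print(b0, b1, best_sse, nudged_sse)
```

Nudging the slope away from the fitted value makes the sum of squared errors larger, which is exactly what it means for the fitted line to be the "best" one.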
A simple linear regression estimates an equation of the form Yi = B0 + B1*Xi, where Y is the outcome variable, B0 is the intercept of the line, and B1 is the slope coefficient describing the expected change in Y associated with a one-unit change in X. You can add many regressors which might be related to Y. The linear regression equation then looks like Y = B0 + B1*X1 + B2*X2 + … + Bp*Xp. Each B can be interpreted as follows: a one-unit increase in its X is associated with an increase in Y of magnitude B, holding all other regressors constant. Even though I’ve described it as an increase in Y, B can be positive or negative depending on the association between Y and X. Note that these coefficients are NOT correlation coefficients. Correlation coefficients range from -1 to 1, only measure straight-line relationships between two variables, and cannot control for the effects of other variables. Correlation analysis is a weak technique relative to regression analysis.
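The "holding all other regressors constant" interpretation can be sketched in a few lines. This is a bare-bones illustration, not how real statistical software is implemented: it solves the least-squares equations directly, and the data are made up so that Y is exactly 1 + 2*X1 - 3*X2. All function names here are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, ys):
    """Least-squares fit. rows holds the regressors; returns [B0, B1, ..., Bp]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, regressors):
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], regressors))


# Made-up data generated from Y = 1 + 2*X1 - 3*X2 (no noise, for clarity).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]

coefs = fit_ols(rows, ys)
# Raise X1 by one unit while keeping X2 fixed at 4: the prediction moves by B1.
effect_of_x1 = predict(coefs, (4, 4)) - predict(coefs, (3, 4))
print(coefs, effect_of_x1)
```

The difference between the two predictions equals the coefficient on X1, and unlike a correlation coefficient it is measured in units of Y and is not confined to the range -1 to 1.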
When working with multivariate regression, it is important to keep in mind what the researcher wants to accomplish. The researcher often wants to conduct inference about a regressor and its coefficient, meaning they want to draw statistically valid conclusions about X’s relationship with Y for the population. There are many assumptions which a regression model must satisfy in order for any inference to be considered valid. Some problems researchers often run into which you might read about in a study’s methodology section include multicollinearity/high variance inflation factors, patterns in residuals, and endogeneity (think confounding variables). You can read more about the Gauss-Markov assumptions and their violations elsewhere if you are so inclined.
Let’s talk about the ingredients of a research paper’s description of its regression analysis. The following table is taken from Kouba and Mysicka, “Should and Does Compulsory Voting Reduce Inequality?”, available here.
Alright, lots of numbers here. What if there’s some useful information in here, but the researchers did not bother to write a concise, easy-to-understand paragraph summarizing their findings in words for the debaters of the world to use in their cases? Let’s explain and make sense of what’s happening here, focusing only on Model I.
Let’s first look at some coefficients, specifically the coefficient next to Gini index. There is one number, -51.27, with some stars next to it, and then there’s another number below it (7.02) surrounded by parentheses. What do these mean? Since the Gini coefficient ranges from 0 to 1, a full one-unit increase is not a realistic change, so it is more useful to scale the coefficient down: the number -51.27 means that for every .01 increase in a country’s Gini coefficient, we expect the percentage of invalid votes to decrease by about 0.5127, or half a percentage point. This is not a causal relationship, mind; this is only a measure of association. Generally, in the data set, districts with higher income inequality have lower levels of invalid voting, quantified by this coefficient and holding the effects of other variables constant. The number in parentheses, 7.02, is something called a standard error, which quantifies how much we expect that -51.27 estimate to vary if we ran this same regression on many different samples.
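The arithmetic behind that interpretation is just the coefficient multiplied by the size of the change in the regressor. A quick sketch, using the -51.27 estimate quoted above (the function name and the 0.05 example are my own illustrations):

```python
def expected_change_in_outcome(coefficient, change_in_regressor):
    """Expected change in Y for a given change in X, other regressors held constant."""
    return coefficient * change_in_regressor


gini_coefficient_estimate = -51.27  # Model I estimate quoted in the text

# A 0.01 increase in the Gini index:
change_for_one_hundredth = expected_change_in_outcome(gini_coefficient_estimate, 0.01)
# A hypothetical larger gap of 0.05 between two observations:
change_for_five_hundredths = expected_change_in_outcome(gini_coefficient_estimate, 0.05)

print(round(change_for_one_hundredth, 4))    # -0.5127
print(round(change_for_five_hundredths, 4))  # -2.5635
```

Both results are in percentage points of invalid votes, and both describe an association in the data rather than a causal effect.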
What about the stars? These help us determine whether the -51.27 coefficient is actually different from zero. In other words, we want to determine whether the relationship between invalid voting and the Gini index is statistically significant. In analyzing regression model output, we test the hypothesis that the coefficient is equal to zero. These tests are conducted separately for each regressor. Rejecting the null hypothesis means our data suggests the parameter truly does not equal zero for the population, i.e. the parameter is statistically significantly different from zero.
Statistical significance is tricky business with complicated theory, so stick with me. If the true coefficient were zero, the estimated coefficient divided by its standard error would approximately follow the standard normal distribution for large samples. The standard normal distribution has a well-known feature that about 95% of its mass is located within two standard deviations of the mean, which equals zero. That means only about the 5% most extreme outcomes live more than two standard deviations away from zero. This is where p-values come into play. The p-value summarizes how far out into the extreme tails of the distribution our observed outcome falls: it is the probability, if the true coefficient were zero, of seeing an estimate at least as far from zero as the one we observed. You may have noticed the tiny font in the bottom-left corner of the table. The first part says that p is less than .05 if a coefficient has one star next to it. Translation: if your coefficient has one star, then that value is further from zero than at least 95% of the values we would expect to see if the true parameter were zero. If your coefficient has three stars, that value is further away from zero than 99.9% of those values! So how far away from zero do we need our estimated parameter to be before we decide it truly isn’t zero? Researchers usually decide this in advance and typically use p-value thresholds of .05 and .01. Any p-value lower than the chosen threshold indicates we should reject the null hypothesis and conclude our data provides evidence that the population parameter is not zero.
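Here is a rough sketch of that calculation for the Gini coefficient, using the large-sample normal approximation described above. The star thresholds are partly an assumption on my part: the text confirms one star for p < .05 and implies three stars for p < .001, while two stars for p < .01 is the common convention rather than something read off the table. Real software usually uses a t-distribution, so its p-values can differ slightly.

```python
import math


def z_statistic(coefficient, standard_error):
    """How many standard errors the estimate sits away from zero."""
    return coefficient / standard_error


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. the mass in both tails."""
    return math.erfc(abs(z) / math.sqrt(2))


def stars(p_value):
    """Assumed star convention: * p < .05, ** p < .01, *** p < .001."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Estimate and standard error for the Gini index quoted in the text.
z = z_statistic(-51.27, 7.02)
p = two_sided_p_value(z)
print(round(z, 2), p, stars(p))

# Sanity check on the "two standard deviations" rule of thumb:
# a z of about 1.96 corresponds to a two-sided p-value of about .05.
p_at_cutoff = two_sided_p_value(1.96)
print(round(p_at_cutoff, 3))
```

The Gini estimate is more than seven standard errors away from zero, which is why it clears even the strictest threshold comfortably.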
Phew, you survived!
One last thing: standardized coefficients. All that means is that instead of measuring the expected change in Y for a one-unit change in X, we’re measuring the expected change for a one standard deviation change in X. Some people find this interpretation more useful.
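Following the definition above, converting an ordinary coefficient into a per-standard-deviation one only requires multiplying by the standard deviation of X. The numbers below are made up, and the function name is my own. Note that some software goes a step further and also rescales Y by its standard deviation; this sketch sticks to the version described in the text.

```python
import statistics


def per_standard_deviation_effect(coefficient, x_values):
    """Expected change in Y for a one-standard-deviation change in X."""
    return coefficient * statistics.stdev(x_values)


# Made-up regressor values and a made-up coefficient of 2.0 per unit of X.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
unit_coefficient = 2.0

sd_effect = per_standard_deviation_effect(unit_coefficient, x_values)
print(round(sd_effect, 4))  # 3.1623
```

This is handy when regressors are measured on very different scales, since a "one-unit change" in one variable may be tiny while in another it may be enormous.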
When reviewing regression analyses for debate purposes, make sure you review not only the coefficient estimate, but also whether it is statistically significant and at what threshold. Also keep in mind whether the model uses appropriate control variables to avoid lumping the effects of other factors in with the regressor the researcher is actually interested in.
This article does not, by any means, cover all of statistics, let alone all of statistics that debaters might find useful. The explanations of many terms and ideas will also not satisfy more curious debaters who may already have taken statistics classes. However, it covers an assortment of topics which might improve the quality of evidence comparison, cross-examination, and out-of-round research. Rather than resorting to flowery language and doomsday rhetoric in weighing impacts, statistical research allows debaters to precisely quantify and rigorously support impact scenarios, which I guarantee judges will appreciate and be impressed by. Though this type of research may seem inaccessible and arcane at first glance, any debater who uses or encounters quantitative studies should understand how they function, which parts can be used most effectively, and where intimidating numbers and figures might gloss over fundamental errors in reasoning and design. Many topics will have a vast literature of empirical research supporting either side. Debaters should be equipped with the tools to intelligently discuss and defend their quantitative evidence against contrary positions just as they would any ethical philosophy text.
Eli holds a Master of Science degree in Statistics from Texas A&M University. He competed in Lincoln-Douglas debate for Stoneman Douglas HS in Parkland, Florida, reaching the TOC his senior year.