A Statistical Analysis of Side-Bias on the 2019 January-February Lincoln-Douglas Debate Topic

by Sachin Shah


Due to the strong interest in the September-October and November-December side-bias studies, a subsequent analysis of the 2019 January-February Lincoln-Douglas topic is merited to ascertain whether the pattern of negative side-bias continues to hold. Not only will this study avoid the potential technical pitfalls of statistical studies outlined in the previous side-bias articles, it will also use a new metric to account for debater skill disparities.


2019 January-February Side-Bias Analysis

Affirmative and negative ballots were gathered via tabroom.com from 18 Tournament of Champions bid-distributing tournaments on the January-February topic across the country: Blake, Strake Jesuit, College Prep, Newark, Arizona State University, University of Puget Sound, University of Houston, Winston Churchill, Peninsula, Harvard-Westlake, Lexington, Durham Academy, Lewis & Clark, Emory, Columbia, Colleyville Heritage, Golden Desert, and University of Pennsylvania. These tournaments range from octofinal-bid to final-bid qualifiers. This data set has a large sample size of 4,505 rounds and represents fairly diverse debating styles. These tournaments span the country from the west coast, where utilitarian rounds are more predominant, to the east coast, where philosophy rounds are more prevalent. A variety of judging styles is reflected among the tournaments.

When all posted ballots on the January-February topic are analyzed, the negative won 53.04% of ballots. To test whether this result is statistically significant, the null hypothesis was set to p = 0.5, where p is the proportion of negative wins, and the alternative hypothesis was set to p > 0.5. A one-proportion z-test was used to calculate the p-value. As in the previous articles, alpha is set at 0.01. The z-test rejected the null hypothesis in favor of the alternative hypothesis (p-value < 0.0001). This implies there is less than a 0.01% chance that a proportion of negative wins at least this high would occur if rounds were unbiased, which is strong evidence of a negative side-bias.
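The test described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the study: it treats the reported 4,505 rounds as the number of ballots (an assumption, since elimination rounds can carry several ballots), and the function name is hypothetical.

```python
import math

def one_proportion_z_test(p_hat, n, p0=0.5):
    """One-sided z-test of H0: p = p0 against Ha: p > p0."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Assumption: n = 4505 is used as the ballot count.
z, p_value = one_proportion_z_test(p_hat=0.5304, n=4505)
print(f"z = {z:.2f}, p-value = {p_value:.6f}")
print("reject null at alpha = 0.01:", p_value < 0.01)
```

Under these assumptions the p-value falls below 0.0001, in line with the result reported above.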


A More Robust Model

We can further characterize the side bias by taking into account the difference in the skill of each debater. The previous analysis assumes that each debater has an equal chance of winning; the following analysis develops a more robust model that estimates the probability that each debater wins based on their respective skill levels, since rounds in which the affirmative debater is stronger are more likely to result in affirmative wins. Previous analyses attempted to control for skill disparity between the two debaters by using rounds from octofinal and quarterfinal bid tournaments only, because those tournaments tend to attract more skilled debaters. Of the 1,853 ballots from the Blake, College Prep, Harvard-Westlake, Lexington, and Emory tournaments, the negative won a statistically significant 54.24% of ballots (p-value < 0.0002). However, this approach arbitrarily sets the cutoff at the quarterfinal level and still includes debater skill disparities during preliminary rounds. Another commonly proposed method is to use elimination round data only. Even then, early elimination rounds include high 6-0 debaters hitting low 4-2 debaters, which means there exists an inherent skew. Including only later elimination rounds would both require selecting an arbitrary cutoff and result in sample sizes that are too small.

For a more robust accounting of debater skill differences, this study implemented an Elo rating system. Each debater starts with a rating of 1500; then, as they win or lose rounds, their rating changes depending on the difficulty of the round. For example, if a 1500 rated debater loses to a 2000 rated debater, their rating would drop 1.597 points, while if they won, their rating would rise 28.403 points. This makes sense because debaters should be rewarded more for beating good debaters than for beating worse debaters. Each debater’s Elo is updated after every round they debate. For the purposes of calculating Elo, rounds were gathered from 93 TOC bid-distributing tournaments from 2017-2019 (YTD) with round results posted on tabroom.com [1].
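The rating update can be illustrated with the standard Elo formula. The article does not state its update constant; the sketch below assumes the conventional 400-point scale and a K-factor of 30, chosen only because those values reproduce the 1.597 and 28.403 point changes in the example above. The function names are hypothetical.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo expected score (win probability) on a 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))

def elo_change(rating, opponent_rating, won, k=30.0):
    """Rating change after one round. k = 30 is an inferred assumption."""
    actual_score = 1.0 if won else 0.0
    return k * (actual_score - expected_score(rating, opponent_rating))

# The example from the text: a 1500 rated debater facing a 2000 rated debater.
print(f"loss: {elo_change(1500, 2000, won=False):+.3f}")
print(f"win:  {elo_change(1500, 2000, won=True):+.3f}")
```

Because the expected score of the weaker debater is small, a loss costs very little while a win earns nearly the full K points.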

A variety of metrics can now be used to quantify the side-bias. The most straightforward method is to use a technique called logistic regression. In this analysis, a function of the form

f(x) = 1/(1+e^(a(x-b)))

is found such that f(Elo_difference) is approximately the probability that the affirmative will win, where Elo_difference is the affirmative debater’s Elo minus the negative debater’s Elo. The parameters of this function were chosen so that the function best fit the data set described above. The best parameters were a = -0.0112 and b = 12.35. The fact that the “offset” parameter b is 12.35 means that when the negative is rated 12.35 points below the affirmative, the round is an even matchup: the probability that either debater wins is 50%. The model with an offset of 12.35 is e^54 times as likely as a model with an offset of 0. This offset indicates a negative side-bias, because the negative is more likely to win even when the affirmative is the slightly higher rated debater.
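To make the fitted curve concrete, the function can be evaluated at a few rating differences. This sketch only plugs the reported parameters into the formula above; it does not refit the model, and the function name is hypothetical.

```python
import math

A = -0.0112  # slope parameter reported in the article
B = 12.35    # offset parameter reported in the article

def affirmative_win_probability(elo_difference, a=A, b=B):
    """f(x) = 1 / (1 + e^(a(x - b))), with x = affirmative Elo minus negative Elo."""
    return 1.0 / (1.0 + math.exp(a * (elo_difference - b)))

for difference in (-100, 0, 12.35, 100):
    probability = affirmative_win_probability(difference)
    print(f"Elo difference {difference:>7}: P(affirmative wins) = {probability:.3f}")
```

With these parameters, two equally rated debaters (a difference of 0) give the affirmative a win probability of roughly 46.5%, and the probability reaches 50% only at a difference of 12.35.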

Another way to quantify the side-bias is to examine only rounds where the debater with the lower Elo rating won, indicating an upset occurred. In theory, the upsets should be equally distributed between upset affirmative wins and upset negative wins. In the 1,399 upset rounds across tournaments in the data set, the negative won a statistically significant 55.11% of those rounds (p-value < 0.0001). This percentage demonstrates that the negative overcomes the disparity when the affirmative is favored more often than the affirmative overcomes the disparity when the negative is favored. Thus negating is easier, because negative debaters can overcome a skill disparity more often, meaning the side-bias exists even after accounting for this confounding variable.
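As a cross-check on the upset figures, the tail probability can be computed exactly from the binomial distribution. The article does not say which test produced its p-value, so this exact binomial calculation is a substitute technique, not a reproduction. The count of negative upsets is back-calculated from the reported 55.11% of 1,399 rounds and is therefore an assumption.

```python
from math import comb

def binomial_upper_tail(successes, trials):
    """Exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    # Python integers are arbitrary precision, so this division is exact
    # up to the final conversion to a float.
    return favorable / 2 ** trials

upset_rounds = 1399
# Assumption: rounding 55.11% of 1,399 gives the number of negative upsets.
negative_upsets = round(0.5511 * upset_rounds)

p_value = binomial_upper_tail(negative_upsets, upset_rounds)
print(f"negative upsets: {negative_upsets} of {upset_rounds}")
print(f"exact one-sided p-value: {p_value:.6f}")
```

If upsets were split evenly between the sides, a negative share this large would be very unlikely, which agrees with the significance reported above.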

To further quantify the side-bias, the proportion of negative wins when the affirmative was favored (p1) can be compared with the proportion of affirmative wins when the negative was favored (p2). Ideally the difference between the proportions would be 0; however, p1 = 34.84% while p2 = 28.77%, a difference of 6.07 percentage points. To determine whether this difference is statistically significant, a two-proportion z-test was used. The null hypothesis is p1 - p2 = 0, because that means both sides are able to overcome the debating level skew equally. The alternative hypothesis is p1 - p2 > 0, meaning the negative is able to overcome the skew more often than the affirmative, demonstrating a side-bias. This two-proportion z-test rejected the null hypothesis in favor of the alternative (p-value < 0.0001), so there is sufficient evidence that the negative overcomes the skew more often than the affirmative. Put differently, there is a less than 0.01% chance of observing a gap this large if there were no side-bias. In short, the negative has a greater ability to win difficult rounds than the affirmative does, which indicates there exists a skew in the negative’s favor.
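A pooled two-proportion z-test of this kind can be sketched as follows. The article reports only the percentages, not the group sizes, so the counts below are hypothetical: they are back-calculated to be consistent with p1 = 34.84%, p2 = 28.77%, and 1,399 total upsets, and should not be read as the study’s actual data. The function name is also hypothetical.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided pooled z-test of H0: p1 - p2 = 0 against Ha: p1 - p2 > 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts consistent with the reported percentages (assumption):
negative_upsets, affirmative_favored_rounds = 771, 2213  # about 34.84%
affirmative_upsets, negative_favored_rounds = 628, 2183  # about 28.77%

z, p_value = two_proportion_z_test(
    negative_upsets, affirmative_favored_rounds,
    affirmative_upsets, negative_favored_rounds,
)
print(f"z = {z:.2f}, p-value = {p_value:.6f}")
```

Under these assumed counts the p-value is below 0.0001, matching the conclusion above.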

This analysis is statistically rigorous and relevant in several respects: (A) The p-value is less than the alpha. (B) The data is on the current January-February topic, meaning it is relevant to rounds debated during these months [2]. (C) The data represents a diversity of debating and judging styles across the country. (D) This analysis accounts for disparities in debating skill level. (E) The risk of Type I error was reduced by choosing a small alpha level. The combination of these points validates this analysis.

As a final note, it is also interesting to look at the trend over multiple topics. In the rounds from 93 TOC bid-distributing tournaments (2017-2019 YTD), the negative won 52.99% of ballots (p-value < 0.0001) and 54.63% of upset rounds (p-value < 0.0001). This suggests the bias might be structural rather than topic-specific, as this data spans six different topics.

Therefore, this analysis confirms that affirming is once again harder on the 2019 January-February topic [3]. So don’t lose the flip!



Sachin Shah is a senior at Lake Highland Preparatory School in Orlando, FL, who is currently qualified to the 2019 Tournament of Champions. Outside of debate, he participates in robotics and lab research. He often enjoys solving Rubik’s cubes and programming challenges in his spare time.

[1] Tournaments were only excluded if round results were not posted on tabroom.com at the time of writing this article. Here is a full list of tournaments during the 2017-2018 season: Grapevine (TX), Loyola (CA), University of Kentucky (KY), Yale (CT), Greenhill (TX), Valley (IA), Holy Cross (LA), CSU Long Beach (CA), Presentation (CA), Bronx Science (NY), St. Mark’s (TX), Saint James (AL), Heritage Hall (OK), Beltway/Capitol Classic (MD), Meadows (NV), Notre Dame (CA), Apple Valley (MN), Scarsdale (NY), Crestian Tradition (FL), Middleton (WI), Central Valley (WA), Glenbrooks (IL), Alta (UT), Princeton (NJ), University of Texas (TX), Isidore Newman (LA), Dowling Catholic (IA), Blake (MN), Strake Jesuit (TX), College Prep (CA), Newark (NJ), University of Puget Sound (WA), Arizona State University (AZ), Winston Churchill (TX), Columbia (NY), Harvard-Westlake (CA), University of Houston (TX), Lexington (MA), Peninsula (CA), Emory (GA), Colleyville Heritage (TX), Golden Desert (NV), University of Pennsylvania (PA), Stanford (CA), Millard North (NE), Berkeley (CA), Harvard (MA), University of Southern California (CA), Tournament of Champions.

Here is a list of tournaments from the 2018-2019 season thus far: Grapevine (TX), Loyola (CA), University of Kentucky (KY), Greenhill (TX), Yale (CT), Valley (IA), CSU Long Beach (CA), Holy Cross (LA), Presentation (CA), Bronx Science (NY), St. Mark’s (TX), Heritage Hall (OK), Meadows (NV), University of Southern California (CA), Apple Valley (MN), Notre Dame (CA), Cypress Bay (FL), Middleton (WI), Scarsdale (NY), Glenbrooks (IL), Alta (UT), Princeton (NJ), University of Texas (TX), Dowling Catholic (IA), Ridge (NJ), Isidore Newman (LA), Blake (MN), Strake Jesuit (TX), College Prep (CA), Newark (NJ), Arizona State University (AZ), University of Puget Sound (WA), University of Houston (TX), Winston Churchill (TX), Peninsula (CA), Harvard-Westlake (CA), Lexington (MA), Durham Academy (NC), Lewis & Clark (OR), Columbia (NY), Emory (GA), Colleyville Heritage (TX), Golden Desert (NV), University of Pennsylvania (PA).

[2] It is important to note that numbers presented in this article that use the January-February data set should only be used within the context of the 2019 January-February topic; debaters who attempt to extrapolate this data to future topics would be misrepresenting the intent of this article. The data set that uses the 2017-2019 tournaments could be extrapolated to future topics, as it suggests a trend.

[3] The data analysis presented in this article has been reviewed and authenticated by two AP Statistics instructors.

Grant Brown