Over the course of the 2018-2019 season, a pattern of negative side-bias was statistically observed across that year’s topics. The 2020 January-February topic provides an opportunity to ascertain if the negative side-bias continues to exist in debate.
2020 January-February Data Set
Affirmative and negative ballots were gathered from the 18 Tournament of Champions bid-distributing tournaments on the January-February topic with results posted on tabroom.com as of writing this article: Blake, College Prep, Strake Jesuit, Newark, Peninsula, University of Houston, University of Puget Sound, Arizona State University, Sunvitational, Winston Churchill, Harvard-Westlake, Lexington, Durham Academy, Lewis & Clark, Emory, Columbia, Golden Desert, and Colleyville Heritage. These qualifier tournaments range from octo-final to final bid level. This data set has a sample size of 4,900 ballots representing fairly diverse debating and judging styles.
When all posted ballots on the January-February topic are analyzed, the negative won 52.37% of ballots. The question is whether the difference between the observed proportion (52.37%) and the expected proportion (50%) is statistically significant or due to chance. To calculate a p-value, a one-proportion z-test was used. The null hypothesis was p = 0.5 (where p is the proportion of negative wins), since, barring any bias, the affirmative and the negative would be expected to win equally often. The alternative hypothesis was p > 0.5, and the alpha was set at 0.05. The z-test rejects the null hypothesis (p-value < 0.001, 95% confidence interval [51.0%, 53.8%]). In other words, if rounds were unbiased, there would be less than a 0.1% chance of observing a proportion of negative wins at least this high. This indicates a negative side-bias.
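The one-proportion z-test described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the article: it assumes the sample is exactly 4,900 ballots, reconstructs the count of negative wins from the reported 52.37%, and assumes a normal-approximation (Wald) confidence interval, which the article does not specify. The function name is hypothetical.

```python
import math

def one_proportion_z_test(wins, n, p0=0.5):
    """One-sided z-test of H0: p = p0 against H1: p > p0."""
    p_hat = wins / n
    # Standard error under the null hypothesis (uses p0, not p_hat).
    se_null = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_null
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    # 95% Wald interval (assumed interval type), built around p_hat.
    se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = 1.959964 * se_hat
    return z, p_value, (p_hat - margin, p_hat + margin)

# Assumption: exactly 4,900 ballots, with the negative win count
# reconstructed from the reported 52.37%.
n = 4900
neg_wins = round(0.5237 * n)

z, p_value, ci = one_proportion_z_test(neg_wins, n)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.5f}")
print(f"95% CI = [{ci[0]:.1%}, {ci[1]:.1%}]")
```

Under these assumptions the p-value falls below 0.001 and the interval rounds to the [51.0%, 53.8%] reported above.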
Adjusting for Skill Differentials
We can further characterize the side bias by taking into account the difference in skill between the two debaters in each round. The previous analysis assumes that each debater has an equal chance of winning. The following analysis instead estimates the probability that each debater wins based on their respective skill levels, since rounds in which the affirmative debater is stronger are more likely to result in affirmative wins than negative wins. To account for these skill differences, this study implemented an Elo rating system. This system rewards debaters more for defeating higher-rated debaters than for defeating lower-rated debaters. Each debater starts with a rating of 1500; as they win or lose rounds, their rating changes depending on the difficulty of each round. For example, if a 1500-rated debater loses to a 2000-rated debater, their rating would drop 2 points, while if they won, their rating would rise 28 points. Each debater’s Elo rating is updated after every round they debate. For the purposes of calculating Elo ratings for every debater, rounds were gathered from 142 TOC bid-distributing tournaments from 2017-2020 (YTD) with round results posted on tabroom.com.
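A minimal sketch of the standard Elo update is shown below. The article does not state its parameters, so the K-factor of 30 and the 400-point logistic scale are assumptions; they were chosen because they reproduce the 2-point and 28-point example above. The function names are hypothetical.

```python
def expected_score(rating, opponent_rating):
    """Probability that `rating` beats `opponent_rating` under the
    standard Elo logistic curve (400-point scale is an assumption)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))

def updated_rating(rating, opponent_rating, won, k=30):
    """New rating after one round. k=30 is an assumed K-factor,
    inferred from the example in the text (2 + 28 = 30)."""
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opponent_rating))

# The example from the text: a 1500-rated debater faces a 2000-rated one.
after_loss = updated_rating(1500, 2000, won=False)
after_win = updated_rating(1500, 2000, won=True)
print(f"change after a loss: {after_loss - 1500:+.1f}")
print(f"change after a win:  {after_win - 1500:+.1f}")
```

With these assumed parameters the underdog loses about 1.6 points for a loss and gains about 28.4 for a win, which round to the 2 and 28 quoted above.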
To further quantify the 2020 January-February side-bias, the proportion of negative wins when the affirmative was favored (p1) can be compared with the proportion of affirmative wins when the negative was favored (p2). Each proportion measures how often a side beats a higher-rated debater. A larger proportion on one side would indicate a skew, because debaters on that side overcome the disadvantage of facing a better opponent at a higher rate. Ideally, the difference between the proportions would be 0, indicating no bias; however, p1 = 33.39% while p2 = 29.19%, a difference of 4.2 percentage points. To determine whether this difference is statistically significant, a two-proportion z-test was used. The null hypothesis is p1 - p2 = 0, meaning both sides are equally able to overcome the skill skew. The alternative hypothesis is p1 - p2 > 0, meaning the negative is able to overcome the skew more often than the affirmative, demonstrating a side-bias. This two-proportion z-test rejected the null hypothesis (p-value < 0.01). In other words, if the negative and the affirmative were equally able to overcome the skew produced by skill differences, there would be less than a 1% chance of observing a difference at least this large. There is sufficient evidence that the negative is able to overcome this skew more often than the affirmative. This further indicates negative side bias.
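The two-proportion z-test can be sketched as follows. The article reports only the two percentages, not the number of rounds behind each, so the counts below are hypothetical placeholders chosen to roughly match 33.4% and 29.2%; the printed result is therefore illustrative and is not the article's computation. The pooled standard error is also an assumption about how the test was run.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided z-test of H0: p1 - p2 = 0 against H1: p1 - p2 > 0,
    using the pooled proportion for the standard error."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# HYPOTHETICAL counts (not from the article):
# negative upsets in rounds where the affirmative was favored,
# affirmative upsets in rounds where the negative was favored.
neg_upsets, aff_favored_rounds = 668, 2000
aff_upsets, neg_favored_rounds = 584, 2000

z, p_value = two_proportion_z_test(
    neg_upsets, aff_favored_rounds, aff_upsets, neg_favored_rounds
)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

Substituting the actual round counts from the data set would be needed to reproduce the reported p-value.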
This analysis is statistically rigorous and relevant in several respects: (A) The p-value is less than the alpha. (B) The data is from the current January-February topic, meaning it is relevant to rounds debated during these months. (C) The data represents a diversity of debating and judging styles across the country. (D) This analysis accounts for disparities in debating skill level. (E) Multiple tests validate the results.
It is also interesting to look at the trend over multiple topics. In the rounds from 142 TOC bid-distributing tournaments (September 2017 – 2020 YTD), the negative won 52.75% of ballots (p-value < 0.0001, 95% confidence interval [52.3%, 53.2%]). Because this data spans nine different topics, it suggests the bias might be structural rather than topic-specific.
Given a structural advantage for the negative, granting the affirmative a substantive advantage to compensate for the structural skew may be justified. This could take various forms, such as granting the affirmative presumption ground, tiny plans, or framework choice. Whatever form is chosen should be tested to ensure the skew is not unintentionally reversed.
Therefore, this analysis confirms that affirming is in fact harder again on the 2020 January-February topic. So, once again, don’t lose the flip!
Footnotes
0.05 is a common alpha level for studies, with 0.1 being the upper limit. Lavrakas, P. J. (2008). Encyclopedia of survey research methods. Thousand Oaks, CA: Sage Publications, Inc. doi: 10.4135/9781412963947
It is important to note that numbers presented in this article that use the 2020 January-February data set should only be used within the context of the 2020 January-February topic; debaters who attempt to extrapolate this data to future topics would be misrepresenting the intent of this article.
The data set and analyses that utilize 2017-2020 tournaments could be extrapolated to future topics, as they suggest a trend. If the activity changed structurally, then the data should not be extrapolated. For example, if the 1AR got an extra minute, then this data would not be indicative of those rounds because the underlying nature of a round would have changed.