A Statistical Analysis of Side-Bias on the 2018 September-October Lincoln-Douglas Debate Topic by Sachin Shah




“Affirming is harder because of the negative side bias from the 2015 TOC or late elim rounds.” Countless affs use this argument as justification for pretty much anything and everything. Unfortunately, such statistics are easily defeated with a little statistical knowledge: a debater could point out that the cited statistic has no p-value, or that the data is simply outdated and irrelevant. Either point sufficiently beats back the side-bias claim. This article aims to show that affirming on the 2018 September-October topic is indeed harder than negating by presenting a statistically sound analysis of the negative side bias.


Typical Problems With The Use of Side Bias Statistics

The first, and arguably the most intuitive, problem deals with time period. Most commonly cited statistics are from 2015 or earlier, yet debaters do not explain why that data is still relevant given how drastically debating trends have changed in recent years. For example, Ks are more prevalent now, and disclosure is a newly widespread norm. These factors can affect how rounds play out, making older numbers not necessarily reflective of the current side bias.

The second problem is sample size. Looking at only a single tournament or a small set of rounds does not establish a bias that can be generalized with any confidence, as there are often lurking variables. For example, local tournaments might show an affirmative bias because lay judges remember more of the 2AR than the 2NR. Unless the data includes a large sample spanning a variety of debating styles, it is unlikely to accurately reflect an overall side bias.

The third problem is that many of these statistics do not include a probability value (p-value). The p-value, a number between 0 and 1, gives the probability of observing a result at least as extreme as the sample's, assuming the null hypothesis is true. The lower the p-value, the stronger the evidence for rejecting the null hypothesis. However, simply including a p-value does not solve this problem. The p-value must also be less than a chosen significance level, alpha (in most fields, alpha is set to 0.05). Any p-value above alpha is considered not statistically significant. For the purposes of this article, alpha is set to 0.01 to strengthen the significance of the analysis.
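The decision rule described above can be sketched in a few lines. This is an illustrative snippet, not part of the original analysis; the function name and example p-values are hypothetical.

```python
def reject_null(p_value, alpha=0.01):
    """Reject the null hypothesis only when the p-value falls below alpha."""
    return p_value < alpha

# A p-value of 0.03 would pass the common 0.05 threshold,
# but not this article's stricter alpha of 0.01.
print(reject_null(0.03))    # → False
print(reject_null(0.0001))  # → True
```

Note that the same p-value can be "significant" or not depending on the alpha chosen in advance, which is why this article fixes alpha at 0.01 before running any tests.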


2018 September-October Side Bias Analysis

The number of affirmative and negative ballots was gathered via tabroom.com from five TOC-qualifier tournaments across the country: the Loyola Invitational, Grapevine Classic, Greenhill Fall Classic, Yale Invitational, and Mid America Cup. All five are quarterfinal- or octofinal-bid-level tournaments. There were 2,293 ballots available on tabroom.com after these tournaments concluded, a sufficiently large sample in which, absent any bias, a roughly 50-50 distribution of affirmative and negative ballots would be expected.

However, when the total number of available ballots is analyzed, the negative won 55.17% of ballots. This data set has a large sample size and represents fairly diverse debating styles. The question is whether the difference between 55.17% and the expected 50% is statistically significant or due to chance. To calculate a p-value, a one-proportion z-test was used. The null hypothesis was p = .5, since, barring any bias, the affirmative would be expected to win just as often as the negative. The alternative hypothesis was p ≠ .5, where p is the proportion of negative ballot wins. The z-test rejected the null hypothesis in favor of the alternative (p-value < 0.0001). In other words, if the two sides were truly evenly matched, a split at least this lopsided would occur by chance less than 0.01% of the time.
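The one-proportion z-test above is straightforward to reproduce. A minimal sketch follows, using only the standard library; the negative ballot count of 1,265 is reconstructed from the reported 55.17% of 2,293 ballots and is an approximation, not a figure from the original tab data.

```python
from math import sqrt, erf

def one_prop_ztest(successes, n, p0=0.5):
    """Two-sided one-proportion z-test against the null hypothesis p = p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)        # standard error under the null
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal CDF, via the error function.
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - phi)

# Overall data: 2,293 ballots, negative won 55.17% (about 1,265 ballots).
z, p = one_prop_ztest(1265, 2293)
print(f"z = {z:.2f}, p-value = {p:.2e}")  # p-value well below 0.0001
```

Running the same function on the non-preset subset (1,963 ballots, 56.29% negative) or the elimination-round subset (555 ballots, 61.08% negative) likewise yields p-values below 0.0001, matching the results reported in this article.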

Although this data seems to answer most of the common arguments against side bias statistics, there is a potentially important confounder: the data set includes the quality disparity that occurs during prelim rounds. Some prelim rounds are not “tech” focused, meaning there are confounding variables such as a judge’s better recollection of the 2AR or the resources available to debaters.

To address this concern, preset rounds can be removed from the analysis. Considering just the 1,963 non-preset rounds in the data set, the negative won 56.29% of ballots (p-value < 0.0001). Although this shows a slightly stronger bias, considering only elimination rounds, and not ANY prelim rounds, further ameliorates the potential factors described above. In the 555 elimination rounds across tournaments in the data, the negative won a whopping, statistically significant 61.08% of ballots (p-value < 0.0001). This percentage demonstrates that side bias indeed exists, in spite of criticisms levied at other studies lacking statistical rigor in their analysis.

This analysis is statistically rigorous and relevant in several respects: (A) the p-value is less than the alpha. (B) It is on the current September-October topic, meaning it’s relevant to rounds this month.[1] (C) It includes a diversity of debating styles. (D) It accounts for the debating level of the participants. The combination of these points demonstrates the validity of this analysis.

These graphs show the extent of neg side bias. The first graph illustrates that neg side bias is pervasive across all rounds at each tournament, ranging from just under 2% to almost 8% variance from an unbiased distribution. The second graph illustrates that the problem is exacerbated in elimination rounds; it shows the heightened neg side bias in elimination rounds across tournaments, ranging from just over 6% to just over 16%. This is a startling variance from an unbiased distribution.

As a side note, there is the case of the non-preset rounds at one tournament (Greenhill), in which the negative won exactly 50% of the ballots. Although this example could be interpreted as a counterexample, it in fact illustrates the pitfalls of cherry-picking data: this single result could be explained by the small number of relatively homogeneous ballots at one tournament.

Therefore, this analysis confirms that affirming is, in fact, much harder on the September-October topic this year.[2] So don’t lose the flip!


Sachin Shah is a senior at Lake Highland Preparatory School in Orlando, FL, who is currently qualified to the 2019 Tournament of Champions. Outside of debate, he participates in robotics and lab research. He often enjoys solving Rubik’s cubes and programming challenges in his spare time.

[1] It is important to note that these numbers should only be used within the context of the 2018 September-October topic; debaters who attempt to extrapolate this data to future topics would be misrepresenting the intent and conclusions of this article.

[2] The data analysis presented in this article has been reviewed and authenticated by two AP Statistics instructors.
