A Statistical Analysis of Side-Bias on the 2019 November-December Lincoln Douglas Debate Topic by Sachin Shah

Over the course of last year, a pattern of negative side-bias was statistically observed across that season's topics [1]. The first topic of this season showed no significant bias, a change from last year [2]. The 2019 November-December topic provides an opportunity to ascertain whether the negative side-bias has truly been eliminated or whether it continues to lurk in debate.

2019 November-December Data Set

Affirmative and negative ballots were gathered via tabroom.com from the 6 Tournament of Champions bid-distributing tournaments on the November-December topic with results posted as of writing this article: Florida Blue Key, Notre Dame, Apple Valley, Scarsdale, Cypress Bay, and Middleton. These qualifier tournaments range from octo-final to final bid level. This data set has a sample size of 1,840 ballots, representing fairly diverse debating and judging styles.

One-Proportion z-test

When all posted ballots on the November-December topic are analyzed, the negative won 52.45% of ballots. The question is whether the difference between the observed proportion (52.45%) and the expected proportion (50%) is statistically significant or attributable to chance. To calculate a p-value, a one-proportion z-test was used. The null hypothesis was set to p = 0.5 (where p is the proportion of negative wins) since it is expected, barring any bias, that the affirmative and the negative would win the same number of times. The alternative hypothesis was p > 0.5. The alpha was set at 0.05 [3]. The z-test rejects the null hypothesis (p-value < 0.02, 95% confidence interval [50.2%, 54.7%]). This implies there is less than a 2% chance of observing a proportion of negative wins at least this large if the rounds are unbiased, which indicates a negative side-bias.
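The test above can be reproduced with a short sketch. The article reports only the sample size (1,840) and the percentage (52.45%), so the count of 965 negative ballots is an assumption inferred by rounding, and the function name is purely illustrative. The interval shown is a standard Wald interval, which may or may not be the exact method the author used.

```python
import math

def one_proportion_z_test(successes, n, p0=0.5):
    """One-sided (p > p0) one-proportion z-test plus a 95% Wald interval."""
    p_hat = successes / n
    # Standard error under the null hypothesis p = p0
    se_null = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_null
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    # Two-sided 95% interval built from the observed proportion
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return z, p_value, (p_hat - margin, p_hat + margin)

# ASSUMPTION: 965 negative ballots, inferred from 52.45% of 1,840
z, p_value, ci = one_proportion_z_test(965, 1840)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print(f"95% CI = [{ci[0]:.1%}, {ci[1]:.1%}]")
```

Under that assumed count, the p-value falls below 0.02 and the interval rounds to the reported [50.2%, 54.7%].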

Adjusting for Skill Differentials

We can further characterize the side-bias by taking into account the difference in skill between debaters. The previous analysis assumes that each debater has an equal chance of winning; a more robust model estimates the probability that each debater wins based on their respective skill level, since rounds in which the affirmative debater is stronger are more likely to result in affirmative wins. A commonly proposed method to address this concern is to use a limited sample set, such as only elimination rounds or only octo-final and quarterfinal bid tournaments. In the 471 elimination ballots (triple-octo-finals through finals), the negative won 54.99% of ballots (p-value < 0.02). In the 2 octo-final / quarterfinal bid tournaments in the data set, the negative won 53.36% of ballots (p-value < 0.015). Both of these subsets indicate a negative side-bias. Because there have only been 2 octo-final / quarterfinal bid tournaments on this topic so far, the sample size might be considered too small for useful conclusions. However, because this subset excludes more “lay” tournaments, it could more accurately represent varsity debate, as it reduces confounding variables such as judge expertise. Because this topic generally has fewer tournaments than the other topics, the smaller sample size is expected.

A More Robust Model

For a more robust account of debater skill differences, this study implemented an Elo rating system. This system rewards debaters more for defeating strong debaters than for defeating weaker ones. Each debater starts with a rating of 1500; then, as they win or lose rounds, their rating changes depending on the difficulty of the round. For example, if a 1500 rated debater loses to a 2000 rated debater, their rating would drop 1.597 points, while if they won, their rating would rise 28.403 points. Each debater’s Elo rating is updated after every round they debate. For the purposes of calculating Elo ratings for every debater, rounds were gathered from 117 TOC bid-distributing tournaments from 2017-2019 (YTD) with round results posted on tabroom.com.
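The worked example above is consistent with the standard Elo update rule, sketched below. The article does not state its parameters: the K-factor of 30 and the 400-point scale are assumptions inferred from the fact that the two example changes (1.597 and 28.403) sum to 30. The function names are illustrative only.

```python
K_FACTOR = 30   # ASSUMPTION: inferred from the example, not stated in the article
SCALE = 400     # ASSUMPTION: the conventional Elo scale

def expected_score(rating, opponent_rating):
    """Probability of winning implied by the rating gap."""
    return 1 / (1 + 10 ** ((opponent_rating - rating) / SCALE))

def rating_change(rating, opponent_rating, won):
    """Points gained (positive) or lost (negative) after one round."""
    actual = 1.0 if won else 0.0
    return K_FACTOR * (actual - expected_score(rating, opponent_rating))

# The article's example: a 1500 rated debater facing a 2000 rated debater
print(round(rating_change(1500, 2000, won=False), 3))  # -1.597
print(round(rating_change(1500, 2000, won=True), 3))   # 28.403
```

Because the expected score of the lower rated debater is only about 5%, a loss costs little and an upset win is rewarded heavily.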

To further quantify the 2019 November-December side-bias, the proportion of negative wins when the affirmative was favored (p1) can be compared with the proportion of affirmative wins when the negative was favored (p2). These proportions demonstrate a particular side’s ability to beat a higher ranked debater. The larger proportion would demonstrate a skew, because a debater overcomes the disadvantage of facing a better debater at a higher rate on one side than on the other. Ideally, the difference between the proportions would be 0, indicating no bias; however, p1 = 37.2% while p2 = 33.42%: a 3.78 percentage point difference. In order to determine whether this difference is statistically significant, a two-proportion z-test was used. The null hypothesis is p1 - p2 = 0, because that means both sides are able to overcome the debating level skew equally. The alternative hypothesis is then p1 - p2 > 0, meaning the negative is able to overcome the skew more than the affirmative, demonstrating a side-bias. This two-proportion z-test rejected the null hypothesis at an alpha level of 0.1 (p-value < 0.065). This implies there is less than a 6.5% chance of observing a difference at least this large if the negative and the affirmative are equally able to overcome the skew produced by debating level differences. There is sufficient evidence that the negative is able to overcome this skew more often than the affirmative. This further indicates a negative side-bias.
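A two-proportion z-test of this kind can be sketched as follows, using the common pooled-variance form. The article reports only the two proportions, not the number of rounds behind each, so the counts in the example call are hypothetical placeholders and are not the article's data. The function name is likewise illustrative.

```python
import math

def two_proportion_z_test(wins1, n1, wins2, n2):
    """One-sided (p1 > p2) two-proportion z-test with a pooled standard error."""
    p1 = wins1 / n1
    p2 = wins2 / n2
    # Pooled proportion under the null hypothesis p1 - p2 = 0
    pooled = (wins1 + wins2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# HYPOTHETICAL counts for illustration only (not taken from the article):
# 40 upsets in 100 rounds on one side versus 30 upsets in 100 on the other
z, p_value = two_proportion_z_test(40, 100, 30, 100)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Substituting the actual counts of rounds in which each side was favored would be needed to check the reported p-value.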

Conclusion

This analysis is statistically rigorous and relevant in several aspects: (A) The p-value is less than the alpha. (B) The data is on the current November-December topic, meaning it’s relevant to rounds these months [4]. (C) The data represents a diversity of debating and judging styles across the country. (D) This analysis accounts for disparities in debating skill level. (E) Multiple tests confirm the results.

As a final note, it is also interesting to look at the trend over multiple topics. In the rounds from 117 TOC bid-distributing tournaments (September 2017 – 2019 YTD), the negative won 52.88% of ballots (p-value < 0.0001). This suggests the bias might be structural, and not topic specific, as this data spans eight different topics [5].

Therefore, this analysis confirms that affirming is in fact harder again on the 2019 November-December topic. So, once again, don’t lose the flip!

Footnotes

[1] Last year’s side-bias studies

Shah, Sachin. “A Statistical Analysis of Side-Bias on the 2018 September-October Lincoln-Douglas Debate Topic by Sachin Shah.” NSD Update. October 11, 2018. http://nsdupdate.com/2018/a-statistical-analysis-of-side-bias-on-the-2018-september-october-lincoln-douglas-debate-topic-by-sachin-shah/.

Shah, Sachin. “A Statistical Analysis Of Side-Bias On The 2018 November-December Lincoln-Douglas Debate Topic.” NSD Update. November 16, 2018.

Shah, Sachin. “A Statistical Analysis Of Side-Bias On The 2019 January-February Lincoln-Douglas Debate Topic.” NSD Update. February 16, 2019. http://nsdupdate.com/2019/a-statistical-analysis-of-side-bias-on-the-2019-january-february-lincoln-douglas-debate-topic/.

[2] 2019 September-October side-bias study

Shah, Sachin. “A Statistical Analysis of Side-Bias on the 2019 September-October Lincoln-Douglas Debate Topic by Sachin Shah.” NSD Update. October 10, 2019. http://nsdupdate.com/2019/a-statistical-analysis-of-side-bias-on-the-2019-september-october-lincoln-douglas-debate-topic-by-sachin-shah/

[3] 0.05 is a common alpha level for studies, with 0.1 being the upper limit.

Lavrakas, P. J. (2008). Encyclopedia of survey research methods. Thousand Oaks, CA: Sage Publications, Inc. doi: 10.4135/9781412963947

[4] It is important to note that numbers presented in this article that use the 2019 November-December data set should only be used within the context of the 2019 November-December topic; debaters who attempt to extrapolate this data to future topics would be misrepresenting the intent of this article.

[5] The data set and analysis that utilizes 2017-2019 tournaments could be extrapolated to future topics as it suggests a trend.

Grant Brown