Science or Fiction Prediction: Getting a Statistical Edge

People sometimes say that if you're not sure what the answer is on a multiple choice question, you should guess c.  I've always wondered if such a system could be applied to the Science or Fiction section on The Skeptic's guide to the Universe (SGU) podcast.

What a multiple choice test may look like

What a multiple choice test may look like

Quick background for non-listeners:  The Skeptics Guide to the Universe is a super great science podcast that you should listen to.  Each episode, they play a game called Science or Fiction, where one host (usually Steve) reads science news items or facts, one of which is completely made up.  The others then try their best to determine which one is the fiction.

While it isn't practical to examine all of the multiple choice tests that have ever existed to determine if c is more likely to be correct, we can actually take a look at each round of Science or Fiction.  It turns out that they keep good show notes on the SGU's website, including each science or fiction item and whether or not it's true.

As of this post, there are 480 episodes, so it's not practical to get the data by hand, but since each episode's page is neatly organized on the website it only took a couple minutes to whip up a little scraping script with python and Beautiful Soup to get the data. (Interestingly enough, scraping through all of the pages I found a tiny mistake: Item #1 of episode 247 is missing a "1".  This broke my scraper the first time through.)

I only collected information about episodes where there were three science or fiction items (which is most of them), so that we can make a meaningful comparison:

Item 1 Item 2 Item 3
Frequency 128 119 133
Probability of Fiction 33.7% 31.3% 35.0%

So it appears that item 2 is fiction less often than items 1 and 3.  The question is, is it a "real" difference, or is it just part of the expected statistical background noise?  Basically, we're trying to empirically determine if Steve uses some sort of random number generator to determine which item will be the fiction each week. Doing a chi squared test tells us that there's a 67% chance of observing such a difference.

In other words: the frequencies are consistent with a uniform distribution, and you can't get a significant edge based on the item ordering.  Steve outsmarts us again!

I did the data collection and analysis with ipython, and you can check out the code here.