Monty Python: Kids in the Hall

Sorry, this is not a post about British or Canadian sketch comedy shows.  It's about a brute force simulation of the classic Monty Hall problem using python.  There are already a bunch of these online, but I was so disappointed in the lack of title puns that I decided to make my own.  Many of the existing simulations are quite good and fancy, so I like to classify mine as being more "back of the envelope" python.

Of all the problems that demonstrate the breakdown of statistical intuition, the Monty Hall problem is certainly one of the most famous.  There is already a ton of information about it available online, so I won't go into much detail here.  For completeness, I'll shamelessly copy a simple variation of the problem statement from the Wikipedia page (which I recommend you take a look at):

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host (Monty Hall), who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

If this is your first time hearing the problem, think about it before you move on to the solution!  The answer follows this large picture of a goat which serves as a spoiler blocker:

[A large picture of a goat]

Solution:

You should switch!  Switching gives you a 2/3 chance of getting the car.  This may not have been (i.e., most definitely probably wasn't) your first intuitive conclusion, but it's the correct one.  Again, the Wikipedia page provides many different ways to understand the solution, but the easiest way for me to think about it is this:  You should switch because you were more likely to have picked a goat (2/3) with your initial guess.  If that first pick was a goat, Monty is forced to reveal the other goat, so switching lands you on the car; switching only loses in the 1/3 of cases where you picked the car first.

So, let's play this game n times in python and see if this conclusion holds.

 1  import random
 2
 3  # number of games to play
 4  n = 100000
 5  # number of correct guesses on the first pick
 6  correct = 0
 7
 8  # run simulations
 9  for i in range(n):
10      # place the car behind a random door
11      doors = ['car', 'goat', 'goat']
12      random.shuffle(doors)
13
14      # contestant guesses
15      guess_door = random.randint(0, 2)
16      guess = doors.pop(guess_door)
17
18      # Monty Hall reveals one of the doors that he knows contains a goat
19      doors.remove('goat')
20
21      # last unopened door
22      remaining_door = doors[0]
23
24      # check whether the first guess was correct
25      if guess == 'car':
26          correct += 1
27
28  # percentage correct when sticking with the original door
29  per = str(100.0 * correct / n)
30  print('You are correct %s%% of the time when sticking with your first guess\n' % per)

After 100,000 games:

You are correct 33.309% of the time when sticking with your first guess

A 33.33...% chance is 1/3, which means you have 1 - 1/3 = 2/3 odds of getting the car when switching.

A sprinkle of print statements in the code (here) can help convince you that it's actually doing what the problem describes.
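
The instrumented loop looks roughly like this (a sketch; the version linked above may differ in the details):

import random

n = 5
correct = 0

for i in range(n):
    doors = ['car', 'goat', 'goat']
    random.shuffle(doors)
    print('cycle %d' % i)
    print('doors: %s' % doors)

    guess_door = random.randint(0, 2)
    guess = doors.pop(guess_door)
    print('first guess: %d (%s)' % (guess_door, guess))
    print('remaining doors after guess: %s' % doors)

    doors.remove('goat')
    print('remaining door after revealing goat: %s' % doors)

    if guess == 'car':
        correct += 1
        print('Correct!\n')
    else:
        print('Wrong!\n')

print('You are correct %s%% of the time when sticking with your first guess' % str(100.0 * correct / n))

Here's a run of 5 games, printing out things at each step: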

cycle 0
doors: ['car', 'goat', 'goat']
first guess: 2 (goat)
remaining doors after guess: ['car', 'goat']
remaining door after revealing goat: ['car']
Wrong!

cycle 1
doors: ['goat', 'car', 'goat']
first guess: 2 (goat)
remaining doors after guess: ['goat', 'car']
remaining door after revealing goat: ['car']
Wrong!

cycle 2
doors: ['car', 'goat', 'goat']
first guess: 0 (car)
remaining doors after guess: ['goat', 'goat']
remaining door after revealing goat: ['goat']
Correct!

cycle 3
doors: ['car', 'goat', 'goat']
first guess: 1 (goat)
remaining doors after guess: ['car', 'goat']
remaining door after revealing goat: ['car']
Wrong!

cycle 4
doors: ['goat', 'goat', 'car']
first guess: 2 (car)
remaining doors after guess: ['goat', 'goat']
remaining door after revealing goat: ['goat']
Correct!

You are correct 40% of the time when sticking with your first guess

Aside:  40% isn't very close to 1/3, but that's because I only ran the simulation for 5 cycles in this case.  When the code is run with a larger n (like above, with 100,000), the result is much closer to 1/3.

Now on to the good stuff - with the problem explicitly laid out in python code, it's now very easy to see that lines 18-22 are completely irrelevant to the final calculation.  This is surprising, but true: given the problem details, the fact that Monty Hall eliminates a goat door does nothing to change the fact that you had only a 33.33...% chance of getting the car on your first guess.  Since you were more likely to have guessed a goat on your first pick, the other remaining door more likely contains the car, and so you should switch.
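
If you want to see the 2/3 fall out directly, the same loop can score both strategies at once.  Here's a quick sketch in the spirit of the code above (the stay/switch bookkeeping is the only new part):

import random

n = 100000
stay_wins = 0
switch_wins = 0

for i in range(n):
    doors = ['car', 'goat', 'goat']
    random.shuffle(doors)

    # contestant's first pick
    guess = doors.pop(random.randint(0, 2))

    # Monty reveals a goat; one unopened door remains
    doors.remove('goat')
    remaining_door = doors[0]

    if guess == 'car':
        stay_wins += 1      # staying wins only if the first pick was the car
    if remaining_door == 'car':
        switch_wins += 1    # switching wins whenever the first pick was a goat

print('stay: %.1f%%  switch: %.1f%%' % (100.0 * stay_wins / n, 100.0 * switch_wins / n))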

wools++: Major update!

I've totally revamped wools++.  Major changes include:

I have ideas for plenty of other features as well, and I'd be happy to take further suggestions.

Relatedly, Overcast Network recently updated their stats system as well, providing statistics breakdowns by game mode and playing time, which is really cool (except, oh god, now I can easily see how much I play each day).  I plan on incorporating some of this new information into wools++ in the future.

What do skateboarders talk about?

It turns out, mostly skateboarding, videos, teams, and people named Mike.

To find out, I analyzed over 50 interviews from the wonderful skateboarding blog the chrome ball incident.  I did this by scraping the content from the site (with python, of course) and then filtering out some common words (stop words) to get meaningful results.  I then used Wordle (which is really, really cool) to make tag clouds of the 150 most frequently used words in all of the interviews.
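
Roughly, the pipeline looks like this.  This is a minimal sketch: the URLs and the tiny stop-word set below are placeholders, not the actual interview links or my full stop-word list.

from collections import Counter
import re

import requests
from bs4 import BeautifulSoup

# placeholder URLs: substitute the actual interview posts
urls = ['http://example.com/interview-1', 'http://example.com/interview-2']

# a tiny stand-in stop-word set; the real list was much longer
stop_words = {'the', 'and', 'a', 'to', 'of', 'i', 'it', 'was', 'that', 'you'}

counts = Counter()
for url in urls:
    html = requests.get(url).text
    text = BeautifulSoup(html, 'html.parser').get_text()
    words = re.findall(r"[a-z']+", text.lower())
    counts.update(w for w in words if w not in stop_words)

# the 150 most frequent words, ready to paste into Wordle
for word, count in counts.most_common(150):
    print(word, count)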

The total word count was a whopping 229,199 words (for comparison).  The first round of filtering was pretty modest - I only removed the 186 most commonly used English words.  Here are the results (click the photo to embiggen):

[Word cloud of the 150 most frequent words, after removing common English words]
Including a slightly larger list of stop words reveals a bit more character:

[Word cloud of the 150 most frequent words, with the expanded stop-word list]

Amongst the expletives and common skate slang ('dude', 'stoked', 'rad', 'spot', etc.), the word 'street' (as in street skating) is mentioned three times as frequently as 'vert' (as in vert skating) - something we've seen before.

You can see the raw data here and here for each round of stop-word filtering.  Please play with it!

New project: wools++

I've taken on another minecraft related project in an attempt to learn some more python as well as some basic web development.

 

Introducing wools++

Centered on the Overcast Network (formerly Project Ares) collection of minecraft servers, wools++ collects data from a user's profile a few times per day and produces more sophisticated statistics (such as rolling values, sketched below) and even some time series plots.  Here are a couple from my profile:

[Time series plots from my profile: KD and rolling KD (RKD7)]

With:

  • KD = kills/deaths
  • RK7 = rolling kills (kills over the last 7 days)
  • RD7 = rolling deaths (deaths over the last 7 days)
  • RKD7 = rolling KD (RK7/RD7)

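Computing the rolling values from the snapshots is simple: subtract the cumulative totals from seven days ago from today's totals.  Here's a rough sketch with made-up snapshot data standing in for what the app actually stores:

from datetime import date, timedelta

# made-up daily snapshots of cumulative (kills, deaths), keyed by date;
# the real app stores a few snapshots per day
snapshots = {
    date(2013, 6, 1) + timedelta(days=i): (1000 + 12 * i, 800 + 9 * i)
    for i in range(10)
}

def rolling_stats(snaps, day, window=7):
    """Kills/deaths over the last `window` days, from cumulative totals."""
    now_k, now_d = snaps[day]
    then_k, then_d = snaps[day - timedelta(days=window)]
    rk, rd = now_k - then_k, now_d - then_d
    rkd = float(rk) / rd if rd else float('inf')
    return rk, rd, rkd

rk7, rd7, rkd7 = rolling_stats(snapshots, date(2013, 6, 10))
print('RK7 = %d, RD7 = %d, RKD7 = %.2f' % (rk7, rd7, rkd7))
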
The name itself comes from a capture-the-flag type game often played on the servers, where the flag is replaced by a minecraft wool block.  If you successfully capture a wool block, your "wools" count is incremented.

The project is hosted on Google's App Engine for a couple of reasons:

  • It can be free (if your app is small enough)
  • Built-in datastore (no need to worry about setting up my own SQL database or anything; see the sketch after this list)
  • Python support.  This is great because python is generally a super useful language to know, particularly in the computational sciences.
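
To give a flavor of the datastore point: defining and querying a model takes only a few lines with ndb.  This is a hypothetical sketch, not the actual wools++ model or field names:

from google.appengine.ext import ndb

# hypothetical model for one stats snapshot (illustrative names only)
class StatsSnapshot(ndb.Model):
    username = ndb.StringProperty(required=True)
    kills = ndb.IntegerProperty()
    deaths = ndb.IntegerProperty()
    wools = ndb.IntegerProperty()
    fetched = ndb.DateTimeProperty(auto_now_add=True)

# store a snapshot
StatsSnapshot(username='some_player', kills=1234, deaths=987, wools=56).put()

# pull the most recent snapshots for a user
# (an equality filter plus a sort on another property needs a composite index in index.yaml)
recent = (StatsSnapshot.query(StatsSnapshot.username == 'some_player')
          .order(-StatsSnapshot.fetched)
          .fetch(14))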

In the future I'll definitely spend some time discussing the process I went through building this app, because in some cases Google's documentation was a little light on the details.  For now, check out the about page if you'd like more details, and you can find the source here.