Sorry, this is not a post about British or Canadian sketch comedy shows. It's about a brute force simulation of the classic Monty Hall problem using python. There are already a bunch of these online, but I was so disappointed in the lack of title puns that I decided to make my own. Many of the existing simulations are quite good and fancy, so I like to classify mine as being more "back of the envelope" python.
Of all the problems that demonstrate the breakdown of statistical intution, the Monty Hall problem is certainly one of the most famous examples. There is already tons of information about it available online, so I won't go into much detail here. For completeness, I'll shamelessly copy a simple variation of the problem statement from the wikipedia page (which I recommend you take a look at):
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host (Monty Hall), who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?
If this is your first time hearing the problem, think about it before you move on to the solution! The answer follows this large picture of a goat which serves as a spoiler blocker:
Solution:
You should switch! Switching gives you a 2/3 chance of getting the car. This may (i.e. was most definitely probably) not your first intuitive conclusion, but it's the correct one. Again, the wikipedia page provides many different ways to understand the solution, but the easiest way for me to think about it is this: You should switch because you were more likely to have picked a goat (2/3) in your initial guess.
So, let's play this game n times in python and see if this conclusion holds.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | import random #number of games to play n = 100000 #number of correct guesses on first guess correct = 0 # run simulations for i in range(n): # place car behind random door doors = ['car','goat','goat'] random.shuffle(doors) # contestant guesses guess_door = random.randint(0,2) guess = doors.pop(guess_door) # Monty Hall reveals one of the doors that he knows contains a goat. doors.remove('goat') # last unopened door remaining_door = doors[0] # check if you are correct on your first guess if guess == 'car': correct += 1 # correct percentage from staying on original door per = str(100.0*correct/n) print 'You are correct %s%% of the time when sticking with your first guess\n' % per |
After 100,000 games:
You are correct 33.309% of the time when sticking with your first guess
A 33.33..% chance is 1/3, and so that means you have 1-1/3 = 2/3 odds of getting the car when switching.
A sprinkle of print statements in the code (here) can help convince you that it's actually doing what the problem describes. Here's a run of 5 games, printing out things at each step:
cycle 0 doors: ['car', 'goat', 'goat'] first guess: 2 (goat) remaining doors after guess: ['car', 'goat'] remaining door after revealing goat: ['car'] Wrong! cycle 1 doors: ['goat', 'car', 'goat'] first guess: 2 (goat) remaining doors after guess: ['goat', 'car'] remaining door after revealing goat: ['car'] Wrong! cycle 2 doors: ['car', 'goat', 'goat'] first guess: 0 (car) remaining doors after guess: ['goat', 'goat'] remaining door after revealing goat: ['goat'] Correct! cycle 3 doors: ['car', 'goat', 'goat'] first guess: 1 (goat) remaining doors after guess: ['car', 'goat'] remaining door after revealing goat: ['car'] Wrong! cycle 4 doors: ['goat', 'goat', 'car'] first guess: 2 (car) remaining doors after guess: ['goat', 'goat'] remaining door after revealing goat: ['goat'] Correct! You are correct 40% of the time when sticking with your first guess
Aside: 40% isn't very close to 1/3, but that's because I only ran the simulation for 5 cycles in this case. When the code is run with a larger n (like above with 100,000), the result is much closer 1/3.
Now on to the good stuff - with the problem explicitly laid out via python code, it's now very easy to see that lines 18-22 are completely irrelevant to the final calculation. This is surprising, but true: Given the problem details, the fact that Monty Hall eliminates a goat door does nothing to change the fact that you had only a 33.33...% chance of getting the car on your first guess. Since you were more likely to have guessed a goat on your first guess, the other remaing door more likely contains the car, and so you should switch.