The fine folks at Thrasher liked my post about their magazine covers, so they sent me some sweet gear:

I'm definitely pumped right now. Thanks guys!

I recently became slightly annoyed with the information displayed by PBS's *qstat* command. My main issue was that a plain *qstat* tends to cut off job names, which are very important if you're running multiple jobs with long, similar names that can't be distinguished when trimmed. The other extreme, *qstat -f*, prints far more information than I can navigate efficiently.

There's probably an option flag that's midway between the two, but it seemed like a fun idea to write a simple intercepting script that only printed a couple things I found useful.

First, here are the first few lines of one job from the output of *qstat -f* to give you an idea of what the script is working with:

    Job Id: 54314.master.localdomain
        Job_Name = df-AC6hex-N2-h2-HSE1PBE-opt-gdv
        Job_Owner = bw@master.localdomain
        resources_used.cput = 113:03:48
        resources_used.mem = 3177372kb
        resources_used.vmem = 4856612kb
        resources_used.walltime = 118:20:42
        job_state = R
        queue = verylong
        ...

In the output, each job is separated by a blank line. So, here's a python script that strips away some of the unneeded info, while printing the full job name:

    #!/usr/bin/env python3
    import subprocess

    # get user name
    user = subprocess.check_output(['whoami'], universal_newlines=True).strip()

    # get all jobs data (decoded to text so it can be split on newlines)
    out = subprocess.check_output(['qstat', '-f'], universal_newlines=True)
    lines = out.split('\n')

    # build list of jobs, each job is a dictionary
    jobs = []
    job = None
    for line in lines:
        if "Job Id:" in line:  # new job
            job = {}
            s = line.split(":")
            job_id = s[1].split('.')[0].strip()
            job[s[0].strip()] = job_id
        elif '=' in line and job is not None:
            s = line.split("=", 1)  # split on the first '=' only
            job[s[0].strip()] = s[1].strip()
        elif line == '' and job is not None:
            jobs.append(job)
            job = None  # don't append the same job twice

    # print out useful information about user's jobs
    print("\n " + user + "'s jobs:\n")
    for job in jobs:
        if job['Job_Owner'].split('@')[0] == user:
            print(" " + job['Job_Name'])
            print(" Id: " + job['Job Id'])
            # jobs that haven't started yet may not report a wall time
            print(" Wall time: " + job.get('resources_used.walltime', 'n/a'))
            print(" State: " + job['job_state'])
            print()

Snippet of example output:

     bw's jobs:

     df-AC6hex-N2-h2-HSE1PBE-opt-gdv
     Id: 54314
     Wall time: 118:20:42
     State: R

     df-AC6hex-N2-h1b-HSE1PBE-opt-gdv
     Id: 54317
     Wall time: 118:13:38
     State: R

     df-AC6hex-N2-h2b-HSE1PBE-opt-gdv
     Id: 54321
     Wall time: 118:13:39
     State: R

     ...

The output of the command *qstat -f* is captured by python via the *subprocess.check_output()* function and organized into a dictionary, which allows for easy customization of what's printed out. After that, it's just some basic string processing and printing. Note also that it only prints information about the jobs of the user who is running the script.
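If you want to play with the parsing step without a PBS cluster at hand, here's a minimal sketch of the same idea as a function that takes text instead of calling *qstat*. The name `parse_qstat` and the `sample` string are made up for illustration (the sample is just a few lines from the snippet above), and it assumes each attribute sits on a single `key = value` line.

```python
def parse_qstat(text):
    """Parse qstat -f style text into a list of dictionaries, one per job."""
    jobs = []
    job = None
    for line in text.split('\n'):
        if line.startswith('Job Id:'):
            # new job: keep only the numeric part of the id
            job = {'Job Id': line.split(':', 1)[1].split('.')[0].strip()}
        elif '=' in line and job is not None:
            # attribute line: split on the first '=' only
            key, value = line.split('=', 1)
            job[key.strip()] = value.strip()
        elif line.strip() == '' and job is not None:
            # a blank line ends the current job
            jobs.append(job)
            job = None
    if job is not None:
        # the last job may not be followed by a blank line
        jobs.append(job)
    return jobs


# hypothetical sample input, copied from the qstat -f snippet above
sample = """Job Id: 54314.master.localdomain
    Job_Name = df-AC6hex-N2-h2-HSE1PBE-opt-gdv
    Job_Owner = bw@master.localdomain
    resources_used.walltime = 118:20:42
    job_state = R
    queue = verylong
"""

jobs = parse_qstat(sample)
print(jobs[0]['Job_Name'])
print(jobs[0]['Job Id'])
```

Since every job ends up as a plain dictionary, printing a different field is just a matter of looking up a different key.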

I started learning a bit of R recently at a data processing workshop. I'm enjoying it so far. Sometimes it's nice to see what else is "out there" (as in, not python).

One interesting thing is the fact that many arithmetic operations on vectors are treated element-wise. So for example, adding two vectors with "+" just adds the corresponding elements. This can be useful for generating some quick function plots in very few lines and with no need for explicit loops or anything like that.

Here's a simple plot of the Lennard-Jones potential as you would create it in an interactive R session:

    x <- seq(0,2.5,0.01)      # generate grid from 0 to 2.5 in steps of 0.01
    y <- x^(-12) - x^(-6)     # evaluate Lennard-Jones potential on all grid points
    plot(x,y,type="l")        # plot with a line

Of course, you can get fancier and create beautiful plots with R, but it's also nice for quick-and-dirty visualizations.
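As a quick numerical check of what the plot should show, here's a rough python sketch (plain lists instead of R vectors) that evaluates the same expression on the same grid and finds its lowest point. It skips x = 0, where the expression isn't defined, and the names `xs`, `ys`, `x_min` and `y_min` are just for illustration. Setting the derivative of x^(-12) - x^(-6) to zero puts the minimum at x = 2^(1/6) with a depth of -1/4, so the grid result should land close to that.

```python
# grid from 0.01 to 2.5 in steps of 0.01 (x = 0 is skipped to avoid dividing by zero)
xs = [i / 100 for i in range(1, 251)]

# evaluate x^(-12) - x^(-6) point by point; R does this loop implicitly
ys = [x**-12 - x**-6 for x in xs]

# locate the lowest grid point (tuples compare on the y value first)
y_min, x_min = min(zip(ys, xs))

print(x_min, y_min)
print(2 ** (1 / 6))  # analytic position of the minimum, for comparison
```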

*Sorry, this is not a post about British or Canadian sketch comedy shows. It's about a brute force simulation of the classic Monty Hall problem using python. There are already a bunch of these online, but I was so disappointed in the lack of title puns that I decided to make my own. Many of the existing simulations are quite good and fancy, so I like to classify mine as being more "back of the envelope" python.*

Of all the problems that demonstrate the breakdown of statistical intuition, the Monty Hall problem is certainly one of the most famous. There is already *tons* of information about it available online, so I won't go into much detail here. For completeness, I'll shamelessly copy a simple variation of the problem statement from the wikipedia page (which I recommend you take a look at):

*Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host *(Monty Hall)*, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?*

If this is your first time hearing the problem, think about it before you move on to the solution! The answer follows this large picture of a goat which serves as a spoiler blocker:

**Solution:**

You should switch! Switching gives you a 2/3 chance of getting the car. This may not have been (i.e. most definitely probably wasn't) your first intuitive conclusion, but it's the correct one. Again, the wikipedia page provides many different ways to understand the solution, but the easiest way for me to think about it is this: *You should switch because you were more likely to have picked a goat (2/3) in your initial guess.*
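That argument can also be checked by simply listing every case. Here's a small sketch that walks through all nine equally likely combinations of car position and first guess. It assumes, as in the problem statement, that Monty always opens a goat door you didn't pick, and the variable names are just for illustration.

```python
from fractions import Fraction
from itertools import product

doors = [0, 1, 2]
stay_wins = 0
switch_wins = 0

# every combination of car position and first guess is equally likely
for car, guess in product(doors, doors):
    if guess == car:
        # first guess was the car, so staying wins
        stay_wins += 1
    else:
        # first guess was a goat; Monty has to open the other goat door,
        # so the one remaining door hides the car and switching wins
        switch_wins += 1

total = len(doors) ** 2
p_stay = Fraction(stay_wins, total)
p_switch = Fraction(switch_wins, total)
print(p_stay, p_switch)  # prints 1/3 2/3
```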

So, let's play this game *n* times in python and see if this conclusion holds.

    import random

    # number of games to play
    n = 100000
    # number of correct guesses on first guess
    correct = 0

    # run simulations
    for i in range(n):
        # place car behind random door
        doors = ['car','goat','goat']
        random.shuffle(doors)

        # contestant guesses
        guess_door = random.randint(0,2)
        guess = doors.pop(guess_door)

        # Monty Hall reveals one of the doors that he knows contains a goat.
        doors.remove('goat')

        # last unopened door
        remaining_door = doors[0]

        # check if you are correct on your first guess
        if guess == 'car':
            correct += 1

    # correct percentage from staying on original door
    per = str(100.0*correct/n)
    print('You are correct %s%% of the time when sticking with your first guess\n' % per)

After 100,000 games:

You are correct 33.309% of the time when sticking with your first guess

A 33.33...% chance is 1/3, which means you have a 1 - 1/3 = 2/3 chance of getting the car *when switching*.
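The simulation above only counts wins from staying. Here's a sketch of the same loop that also counts what happens if you take the remaining door, so the 2/3 can be checked directly instead of by subtraction. The function name `play_games` and the fixed seed are assumptions added for illustration, not part of the original script.

```python
import random


def play_games(n, seed=0):
    """Play n games and return (stay_wins, switch_wins)."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    stay_wins = 0
    switch_wins = 0
    for _ in range(n):
        # place car behind random door
        doors = ['car', 'goat', 'goat']
        rng.shuffle(doors)

        # contestant guesses
        guess = doors.pop(rng.randint(0, 2))

        # Monty Hall reveals a goat, leaving one unopened door
        doors.remove('goat')
        remaining_door = doors[0]

        if guess == 'car':
            stay_wins += 1
        if remaining_door == 'car':
            switch_wins += 1
    return stay_wins, switch_wins


n = 100000
stay_wins, switch_wins = play_games(n)
print('staying wins   %.3f%% of the time' % (100.0 * stay_wins / n))
print('switching wins %.3f%% of the time' % (100.0 * switch_wins / n))
```

Every game is won by exactly one of the two strategies, so the two counts always add up to *n*.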

A sprinkle of *print* statements in the code can help convince you that it's actually doing what the problem describes. Here's a run of 5 games, printing out things at each step:

    cycle 0
    doors: ['car', 'goat', 'goat']
    first guess: 2 (goat)
    remaining doors after guess: ['car', 'goat']
    remaining door after revealing goat: ['car']
    Wrong!

    cycle 1
    doors: ['goat', 'car', 'goat']
    first guess: 2 (goat)
    remaining doors after guess: ['goat', 'car']
    remaining door after revealing goat: ['car']
    Wrong!

    cycle 2
    doors: ['car', 'goat', 'goat']
    first guess: 0 (car)
    remaining doors after guess: ['goat', 'goat']
    remaining door after revealing goat: ['goat']
    Correct!

    cycle 3
    doors: ['car', 'goat', 'goat']
    first guess: 1 (goat)
    remaining doors after guess: ['car', 'goat']
    remaining door after revealing goat: ['car']
    Wrong!

    cycle 4
    doors: ['goat', 'goat', 'car']
    first guess: 2 (car)
    remaining doors after guess: ['goat', 'goat']
    remaining door after revealing goat: ['goat']
    Correct!

    You are correct 40% of the time when sticking with your first guess

Aside: 40% isn't very close to 1/3, but that's because I only ran the simulation for 5 cycles in this case. When the code is run with a larger *n* (like above with 100,000), the result is much closer to 1/3.
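To put a rough number on "much closer": the sketch below uses the usual standard error for an estimated fraction, sqrt(p(1-p)/n). That formula isn't from the simulation itself; it's the textbook binomial result, brought in here as an assumption to estimate the typical spread, and `standard_error` is just an illustrative name.

```python
import math

p = 1.0 / 3.0  # true chance of winning by staying


def standard_error(n):
    """Typical spread (one standard deviation) of the estimated win fraction after n games."""
    return math.sqrt(p * (1.0 - p) / n)


for n in (5, 100000):
    print('n = %6d: about +/- %.2f percentage points' % (n, 100.0 * standard_error(n)))
```

That works out to roughly 21 percentage points for 5 games, so landing on 40% is nothing unusual, and to about 0.15 percentage points for 100,000 games.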

Now on to the good stuff - with the problem explicitly laid out via python code, it's very easy to see that lines 18-22 (where Monty reveals a goat and the last unopened door is recorded) are *completely* irrelevant to the final calculation. This is surprising, but true: **Given the problem details, the fact that Monty Hall eliminates a goat door does nothing to change the fact that you had only a 33.33...% chance of getting the car on your first guess.** Since you were more likely to have guessed a goat on your first guess, the other remaining door more likely contains the car, and so you should switch.
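One way to convince yourself of this is to run the simulation twice with the same random seed, once with the reveal steps and once without, and compare the counts. This is only a sketch: the function `stay_wins`, its `monty_reveals` switch and the seed value are made up for illustration.

```python
import random


def stay_wins(n, seed, monty_reveals):
    """Count wins from sticking with the first guess over n games."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        doors = ['car', 'goat', 'goat']
        rng.shuffle(doors)
        guess = doors.pop(rng.randint(0, 2))

        if monty_reveals:
            # the steps in question: reveal a goat, note the last unopened door
            doors.remove('goat')
            remaining_door = doors[0]

        if guess == 'car':
            correct += 1
    return correct


n = 100000
with_reveal = stay_wins(n, seed=42, monty_reveals=True)
without_reveal = stay_wins(n, seed=42, monty_reveals=False)
print(with_reveal == without_reveal)  # prints True
```

The reveal uses no random numbers and never touches `guess`, so both runs see exactly the same sequence of games and count exactly the same wins.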