Thanks Thrasher!

The fine folks at Thrasher liked my post about their magazine covers, so they sent me some sweet gear:

I'm definitely pumped right now. Thanks guys!

Custom PBS qstat output

I recently became slightly annoyed with the information being displayed by PBS's qstat command. My main issue was that a simple qstat tends to cut off job names, which are very important if you're running multiple jobs with long, similar names that can't be distinguished when trimmed. The other extreme, qstat -f, prints way too much information that's difficult to efficiently navigate through.

There's probably an option flag that's midway between the two, but it seemed like a fun idea to write a simple intercepting script that only printed a couple things I found useful.

First, here's the first few lines of one job from the output of qstat -f to give you an idea of what the script is working with:

Job Id: 54314.master.localdomain
    Job_Name = df-AC6hex-N2-h2-HSE1PBE-opt-gdv
    Job_Owner = bw@master.localdomain
    resources_used.cput = 113:03:48
    resources_used.mem = 3177372kb
    resources_used.vmem = 4856612kb
    resources_used.walltime = 118:20:42
    job_state = R
    queue = verylong
    ...

In the output, each job is separated by a blank line. So, here's a python script that strips away some of the unneeded info, while printing the full job name:

#! /usr/bin/python
 
import subprocess
 
# get user name
user = subprocess.check_output(['whoami']).strip()
# get all jobs data
out = subprocess.check_output(['qstat','-f'])
lines = out.split('\n')
 
# build list of jobs, each job is a dictionary
jobs = []
for line in lines:
    if "Job Id:" in line:  # new job
        job = {}
        s = line.split(":")
        job_id = s[1].split('.')[0].strip()
        job[s[0].strip()] = job_id
    if '=' in line:
        s = line.split("=")
        job[s[0].strip()] = s[1].strip()
    elif line == '':
        jobs.append(job)
 
# print out useful information about user's jobs
print "\n   " + user + "'s jobs:\n"
for job in jobs:
    if job['Job_Owner'].split('@')[0] == user:        
        print "   " +  job['Job_Name']
        print "   Id: " + job['Job Id']
        print "   Wall time: " + job['resources_used.walltime']
        print "   State: " + job['job_state']
        print

Snippet of example output:

   bw's jobs:

   df-AC6hex-N2-h2-HSE1PBE-opt-gdv
   Id: 54314
   Wall time: 118:20:42
   State: R

   df-AC6hex-N2-h1b-HSE1PBE-opt-gdv
   Id: 54317
   Wall time: 118:13:38
   State: R

   df-AC6hex-N2-h2b-HSE1PBE-opt-gdv
   Id: 54321
   Wall time: 118:13:39
   State: R

   ...

The output of the command qstat -f is captured by python via the subprocess.check_output() function and organized into a dictionary, which allows for easy customization of what's printed out. After that, it's just some basic string processing and printing. Note also that it only prints information about the jobs of the user who is running the script.

Lennard R. Jones

I started learning a bit of R recently at a data processing workshop. I'm enjoying it so far. Sometimes it's nice to see what else is "out there" (as in, not python).

One interesting thing is the fact that many arithmetic operations on vectors are treated element-wise. So for example, adding two vectors with "+" just adds the corresponding elements. This can be useful for generating some quick function plots in very few lines and with no need for explicit loops or anything like that.

Here's a simple plot of the Lennard-Jones potential as you would create it in an interactive R session:

1
2
3

x <- seq(0,2.5,0.01)    # generate grid from 0 to 2.5 in steps of 0.01
y <- x^(-12) - x^(-6)   # evaluate Lennard-Jones potential on all grid points
plot(x,y,type="l")      # plot with a line

Of course, you can get fancier and create beautiful plots with R, but it's also nice for quick-and-dirty visualizations.

Monty Python: Kids in the Hall

Sorry, this is not a post about British or Canadian sketch comedy shows. It's about a brute force simulation of the classic Monty Hall problem using python. There are already a bunch of these online, but I was so disappointed in the lack of title puns that I decided to make my own. Many of the existing simulations are quite good and fancy, so I like to classify mine as being more "back of the envelope" python.

Of all the problems that demonstrate the breakdown of statistical intution, the Monty Hall problem is certainly one of the most famous examples. There is already tons of information about it available online, so I won't go into much detail here. For completeness, I'll shamelessly copy a simple variation of the problem statement from the wikipedia page (which I recommend you take a look at):

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host (Monty Hall), who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

If this is your first time hearing the problem, think about it before you move on to the solution! The answer follows this large picture of a goat which serves as a spoiler blocker:

Solution:

You should switch! Switching gives you a 2/3 chance of getting the car. This may (i.e. was most definitely probably) not your first intuitive conclusion, but it's the correct one. Again, the wikipedia page provides many different ways to understand the solution, but the easiest way for me to think about it is this: You should switch because you were more likely to have picked a goat (2/3) in your initial guess.

So, let's play this game n times in python and see if this conclusion holds.

import random
 
#number of games to play
n = 100000
#number of correct guesses on first guess
correct = 0
 
# run simulations
for i in range(n):
    # place car behind random door
    doors = ['car','goat','goat']
    random.shuffle(doors)
 
    # contestant guesses
    guess_door = random.randint(0,2)
    guess = doors.pop(guess_door)
 
    # Monty Hall reveals one of the doors that he knows contains a goat.
    doors.remove('goat')
 
    # last unopened door
    remaining_door = doors[0]
 
    # check if you are correct on your first guess
    if guess == 'car':
        correct += 1
 
# correct percentage from staying on original door
per = str(100.0*correct/n)
print 'You are correct %s%% of the time when sticking with your first guess\n' % per

After 100,000 games:

You are correct 33.309% of the time when sticking with your first guess

A 33.33..% chance is 1/3, and so that means you have 1-1/3 = 2/3 odds of getting the car when switching.

A sprinkle of print statements in the code (here) can help convince you that it's actually doing what the problem describes. Here's a run of 5 games, printing out things at each step:

cycle 0
doors: ['car', 'goat', 'goat']
first guess: 2 (goat)
remaining doors after guess: ['car', 'goat']
remaining door after revealing goat: ['car']
Wrong!

cycle 1
doors: ['goat', 'car', 'goat']
first guess: 2 (goat)
remaining doors after guess: ['goat', 'car']
remaining door after revealing goat: ['car']
Wrong!

cycle 2
doors: ['car', 'goat', 'goat']
first guess: 0 (car)
remaining doors after guess: ['goat', 'goat']
remaining door after revealing goat: ['goat']
Correct!

cycle 3
doors: ['car', 'goat', 'goat']
first guess: 1 (goat)
remaining doors after guess: ['car', 'goat']
remaining door after revealing goat: ['car']
Wrong!

cycle 4
doors: ['goat', 'goat', 'car']
first guess: 2 (car)
remaining doors after guess: ['goat', 'goat']
remaining door after revealing goat: ['goat']
Correct!

You are correct 40% of the time when sticking with your first guess

Aside: 40% isn't very close to 1/3, but that's because I only ran the simulation for 5 cycles in this case. When the code is run with a larger n (like above with 100,000), the result is much closer 1/3.

Now on to the good stuff - with the problem explicitly laid out via python code, it's now very easy to see that lines 18-22 are completely irrelevant to the final calculation. This is surprising, but true: Given the problem details, the fact that Monty Hall eliminates a goat door does nothing to change the fact that you had only a 33.33...% chance of getting the car on your first guess. Since you were more likely to have guessed a goat on your first guess, the other remaing door more likely contains the car, and so you should switch.

electron exchange

holding the universe together

July 2013

Thanks Thrasher!

Custom PBS qstat output

Lennard R. Jones

Monty Python: Kids in the Hall