What I Use

Things change quickly, particularly in bioinformatics, so I expect to write more than one of these posts in the future. As a snapshot of what I’m currently using in my day-to-day work:

Hardware:
PC: generic Dell desktop with Windows 7. Getting rid of Windows XP was a massive boost to my productivity
Primary server: Richter – a Dell PowerEdge R610 with 2x Intel Xeon X5650 processors and 48GB of RAM, running Ubuntu Linux 12.04
HPC: Dalek, Judoon and Ood. Respectively, two near-identical SGI Altix 8400 clusters (32 nodes each, every node the same spec as Richter but with twice the RAM) and an SGI Altix UV1000 with 1TB of shared RAM

I have to admit, I’m not short on compute, but space for data and analysis is much harder to come by.

Software:
Desktop – Microsoft Office for general computing.
Servers – In terms of real work, I do my best to seek out the best open-source solutions for my problems, or write my own (see below). Pretty much all of bioinformatics consists of “edge case” situations, where the edge in question is either a very niche application or a problem that has never been solved before. The pieces of software that spring to mind in terms of sequencing would be the aligners I depend on (Stampy + BWA in hybrid mode), middleware like Picard and Samtools, and the Broad Institute’s phenomenal Genome Analysis Toolkit (GATK). More on this another day.

Code:
Writing code is essential to every bioinformatician, even if you’re just writing wrappers and glue to stick pre-existing programs and code together. I’m fickle, but currently keen on:

Perl – good for scripting and anything moderately complex; a reliable workhorse, even if it’s unforgivably ugly. I’d rather write in Python, but I’m not as strong in it and don’t yet have a compelling reason to switch.
Bash – we all love to shell script. The ease and elegance of piping the input and output of programs into one another on a Unix command line is phenomenally useful and powerful.
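To give a flavour of what I mean, here’s the sort of one-liner this buys you. The file and its layout are invented for illustration – a toy tab-separated table of genomic features, counted per chromosome using nothing but standard Unix tools chained with pipes:

```shell
# Make a toy three-column, tab-separated file: chromosome, start, end
printf 'chr1\t100\t200\nchr2\t50\t80\nchr1\t300\t400\nchr1\t500\t600\n' > features.bed

# Chain four small programs: extract the chromosome column, sort it,
# collapse duplicates with counts, then rank by count (highest first)
cut -f1 features.bed | sort | uniq -c | sort -rn
# chr1 (3 features) is printed first, then chr2 (1 feature)
```

Each program does one small job and passes text to the next; that composability is the whole appeal.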
SQL – if you’ve got a lot of data, sooner or later you’ll need to whack it in a database, and realistically that database will be MySQL. A friend of mine pointed out that a single well-written MySQL query can be more satisfying than hundreds of lines of far more complex code. On the other hand, all the kludge that comes with MySQL is as unwelcome as it is unwieldy and ugly (particularly the Perl DBI. Yuck).
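My friend’s point is easy to illustrate. The table and data below are entirely made up, and while I’d be running this against MySQL, sqlite3 keeps the sketch self-contained – the SQL itself is the same shape:

```shell
rm -f demo.db  # start from a clean slate

# Hypothetical variants table: doing the query below by hand in Perl
# means hashes, loops and sort callbacks; in SQL it is one statement.
sqlite3 demo.db <<'SQL'
CREATE TABLE variants (chrom TEXT, pos INTEGER, quality REAL);
INSERT INTO variants VALUES ('chr1', 100, 50.0);
INSERT INTO variants VALUES ('chr1', 200, 99.0);
INSERT INTO variants VALUES ('chr2', 150, 10.0);
-- Per-chromosome counts and mean quality, best chromosome first
SELECT chrom, COUNT(*) AS n, AVG(quality) AS mean_q
FROM variants
GROUP BY chrom
ORDER BY mean_q DESC;
SQL
```

One GROUP BY and one ORDER BY replace what would otherwise be a page of bookkeeping code.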
R – I’m pushing more and more of my work through R. I think this is because I love figures and working visually (i.e. interpreting data sets by creating plots and other images), and R has such powerful visualisation tools built right in (base graphics) or just a package download away (ggplot2). R is simultaneously beautifully simple and heinously broken, but you must love things because of their flaws, not in spite of them. The three primary pieces of advice I can offer new starters are (in order of importance):

  1. Don’t give up. It’s hard, but nothing worth doing is easy
  2. Use the excellent RStudio IDE
  3. Read the R Inferno

If you’re not someone who spent years using the 32-bit Windows version on a puny machine (no context-sensitive highlighting, no tab completion, no easy list of variable types and dimensions), constantly running into “cannot allocate vector of size…” errors whenever it ran out of memory, you probably can’t appreciate the joy of working with the 64-bit version on a server with dozens of GB of RAM.