From Life in the Server to Life in the Cluster

Life in early 2011:

  • Working on a server, one process, gigabyte datasets.

Life in early 2012:

  • Working on a cluster, many processes, terabyte datasets.

I remember the old days, when I had to pipette to run an experiment. Today I no longer pipette: I run a command or a pipeline from a terminal connected remotely to a cluster of a few thousand nodes. Sometimes running a PCR might be quicker than running my workflow script.

I consider it a privilege to be “drowned” in data. Why? Because this is the future. More data brings more hypotheses, and more hypotheses bring more knowledge. Either one learns to surf the waves, or sooner or later a tsunami catches up.

How does it feel from the inside? Exciting, overwhelmingly exhilarating! It feels like wanting to surf a sea of data while being happy just to stay afloat: this is the inevitable fate of genome bioinformaticians dealing with Next Generation Sequencing data.

What is next on my to-do list?

Cloud computing. I am counting the days until my experiments run in the cloud rather than on the cluster.

Image by Sam Johnston (CC BY-SA 3.0 license)

I look forward to welcoming you to the data feast. Will you join?
