From Life in the Server to Life in the Cluster

March 12th, 2012

Life in early 2011:

  • Work around a server, one process, Gigabyte datasets.

Life in early 2012:

  • Work around a cluster, many processes, Terabyte datasets.

I remember the old days, when I had to pipette to run an experiment. Today I do not have to pipette; I run a command or pipeline in a computer terminal, connecting remotely to a cluster of a few thousand nodes. Sometimes it might be quicker to run a PCR than to run my workflow script.

I consider it a privilege to be “drowning” in data. Why? Because this is the future. More data brings more hypotheses, and more hypotheses bring more knowledge. One either learns to surf the waves or gets caught by the tsunami soon enough.

How does it feel from the inside? It feels exciting, overwhelmingly exhilarating! It feels like wanting to surf in a sea of data yet happy to be able to barely keep afloat: this is the inevitable fate of those genome bioinformaticians dealing with Next Generation Sequencing data.

What is next on my to-do list?

Cloud computing. I am counting the days until my experiments run in the cloud, not the cluster.

(Image by Sam Johnston, CC BY-SA 3.0 license)

I look forward to welcoming you to the data feast. Will you join?

Bye Sanger, Hello TGAC

February 14th, 2012

After 3.5 exciting years at the Wellcome Trust Sanger Institute, working as senior developer for the DECIPHER database, it was time to start a new venture. As of February 13th, I am the Project Leader of Plant and Animal Genomes at The Genome Analysis Centre (TGAC). TGAC specializes in the study of plant, microbial and animal genomes, with a view to facilitating the development of new genomics-based biology in the academic and commercial sectors.

In my new role at TGAC I will be leading the group of computational biologists working on plant and animal genome analyses within the Genome Analysis Team. Our aims will include the organization of the analysis of sequence generated at TGAC, engaging with internal and external collaborators, nationally and internationally. A lot of the work will focus on (but will not solely be restricted to) the analysis of RNA-seq, ChIP-seq and Bisulphite sequencing data for the purposes of understanding how genes are regulated.

Coming Opportunities

I expect to have openings in my group in the near future for student projects (Masters and PhD) as well as for research associates and technicians. Meanwhile, if you are interested in joining or simply in discussing ideas or potential projects in the broad areas of transcriptomics, epigenomics and gene regulation, you are always welcome to drop me a line.

A Personal View of Personal Genomics

January 30th, 2012

In this interview I talk about my experiences in analyzing my genome and the genomes of my family. Here I also introduce the motivation and future plans for the analysis of my personal exome (i.e. all coding regions in my genome).

Some further information regarding my family experience of personal genomics can be found in a recent publication.

Genomic Technologies in the Clinic: Challenges and Opportunities

October 9th, 2011

Next Generation Sequencing (NGS) offers the promise of revolutionizing our ability to diagnose genetic disorders. Fuelled by the exponential decrease in the cost of sequencing, NGS can now be outsourced, making it accessible to labs with modest budgets. A personal exome (the sum of all coding regions in a genome) is currently priced at $999 by some providers. Although not as comprehensive as whole genome sequencing, exomes can shed light on the origin of causative mutations lying within genes.

Exome Sequencing (image by SarahKusala, CC-BY 3.0)

Getting the raw sequence data is the easy part. The challenging part is to extract the genomic variation found in the raw data and interpret it clinically. The extraction of variants from raw NGS data can be influenced by many factors, such as the sequence read depth, the alignment of reads and the variant calling algorithm. To find the variants that may be of clinical relevance, filtering is required. This filtering may be performed by comparing genome data against the “normal” variation catalogued by the 1000 Genomes Project and dbSNP. Depending on the length of the mutation, there are three main kinds of variants: SNPs, indels and CNVs. SNPs are single point mutations (one DNA base), indels are insertions or deletions of up to about 1 kb, and CNVs are deletions or duplications from 1 kb to many megabases long.
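As a rough illustration of the size classes and the filtering step described above, here is a minimal Python sketch. The `Variant` tuple, the function names and the toy "known" set are all hypothetical, chosen only for this example; real pipelines work on VCF files and query dbSNP or 1000 Genomes data rather than an in-memory set, and large CNVs are normally described by coordinates rather than by spelled-out alleles.

```python
from typing import Iterable, List, NamedTuple, Set

KB = 1000  # approximate 1 kb boundary between indels and CNVs (from the text)


class Variant(NamedTuple):
    """Hypothetical minimal variant record, for illustration only."""
    chrom: str
    pos: int   # 1-based start position
    ref: str   # reference allele
    alt: str   # observed allele


def classify_variant(v: Variant) -> str:
    """Label a variant by the size of the change it introduces."""
    size = abs(len(v.ref) - len(v.alt))
    if size == 0:
        # Same length: a single-base substitution is a SNP; longer
        # substitutions fall outside the three classes in the text.
        return "SNP" if len(v.ref) == 1 else "other"
    return "indel" if size < KB else "CNV"


def filter_novel(variants: Iterable[Variant], known: Set[Variant]) -> List[Variant]:
    """Keep only variants absent from a set of known 'normal' variants."""
    return [v for v in variants if v not in known]


calls = [
    Variant("chr1", 100, "A", "G"),   # single-base substitution
    Variant("chr1", 200, "AT", "A"),  # one-base deletion
    Variant("chr2", 300, "C", "T"),   # single-base substitution
]
known = {Variant("chr1", 100, "A", "G")}  # stand-in for a dbSNP lookup
novel = filter_novel(calls, known)
```

Exact matching on position and alleles is the simplest possible filter; in practice one would also consider allele frequency, since a variant being listed in a database does not by itself make it benign.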

It is well known, however, that many SNPs fall in locations far from genes and yet can cause phenotypic effects. Assuming that one is looking at coding regions, though, many pieces of software have been developed to predict the effects of mutations, such as stop codons, missense mutations and frameshifts.
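To make the effect categories above concrete, the following is a small sketch of how a single-base change inside a codon can be labelled using the standard genetic code. It is not how any particular prediction tool works; the function names are hypothetical, and the sketch ignores strand, splicing and transcript choice. Frameshifts are handled separately because they arise from insertions or deletions whose length is not a multiple of three, not from single-base substitutions.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def snp_effect(codon: str, offset: int, alt_base: str) -> str:
    """Classify a substitution at position `offset` (0-2) of a coding codon."""
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "stop gained"
    if before == "*":
        return "stop lost"
    return "missense"


def is_frameshift(ref: str, alt: str) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return abs(len(ref) - len(alt)) % 3 != 0


# Glutamate codon GAA mutated at each of its three positions.
effects = [snp_effect("GAA", 0, "T"),  # TAA
           snp_effect("GAA", 1, "T"),  # GTA
           snp_effect("GAA", 2, "G")]  # GAG
```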

Indels and CNVs are slightly harder to interpret clinically. CNVs can encompass many genes and their phenotypic effect cannot be clearly established unless several patients have been observed with a similar CNV. It is not uncommon for a normal individual to carry hundreds of indels and CNVs.

Challenges

One of the most important challenges in implementing genomics in the clinic is going to be how to deal with the huge amounts of data produced. A great number of patients are going to be sequenced, each of them yielding a huge number of genomic features of unknown significance. Because confidently interpreting a rare variant requires evidence from several patients, it is not surprising that another big challenge is how this information is going to be shared. A lot more data about a patient means that the chances of personal identification increase, even if the information is anonymized. Judging by a few tests routinely carried out today, it is possible to uniquely identify a person with only a handful of SNPs. Imagine what becomes possible when one possesses thousands of genomic variants from one patient.
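A back-of-the-envelope calculation gives a feel for why so few markers are enough. The sketch below makes strong simplifying assumptions that are mine, not taken from any specific test: the SNPs are independent, each has two alleles in Hardy-Weinberg proportions, all share the same minor allele frequency, and the population size of 7 billion is a round illustrative figure. The function names are hypothetical.

```python
def genotype_match_probability(maf: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg genotype frequencies p^2, 2pq and q^2.
    """
    q = maf
    p = 1.0 - q
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def snps_needed(population: float, maf: float) -> int:
    """Smallest number of independent SNPs for which the expected number
    of people matching a given profile by chance drops below one."""
    per_snp = genotype_match_probability(maf)
    n = 0
    while population * per_snp ** n >= 1:
        n += 1
    return n


# Assumed inputs: 7 billion people, SNPs with a minor allele frequency of 0.5.
n_markers = snps_needed(7e9, 0.5)
```

Under these idealized assumptions a few dozen well-chosen SNPs already single out one individual, which is tiny compared with the thousands of variants found in a single exome.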

Moreover, if this data is to be shared, a big challenge is going to be how to compare it. Different labs have different Quality Control (QC) standards and different platforms. Each sequencing run may have a different read depth and a different level of confidence that a called variant is true. Another issue will be how phenotypes are annotated. There are phenotypic ontologies, such as the Human Phenotype Ontology, that allow a reasonably complete set of clinical descriptions. Nevertheless, there is no guarantee that phenotypic descriptions will have the same level of annotation, even when they use the same ontology. All these factors are going to need consideration when interpreting NGS in the clinic.

One of the main hurdles impeding the entry of NGS into the clinic can also be the health system of the country. The UK seems, for now, to have been able to get many state-funded clinical labs to work together. Unfortunately, this would be unthinkable in countries like Spain, where instead of a single health system there are 17, one for each autonomous region. Sequencing technologies require many different sectors to coordinate in order to set up platforms that guarantee access to the technology, its proper interpretation and the protection of the patient’s privacy.

Opportunities

The other side of the coin is that these technologies are going to become increasingly affordable, not just for rich countries but also for emerging ones. The accessibility of this technology will make it ubiquitous in labs around the world, not just in those looking to diagnose patients with genomic disorders. Expect sequencing to be routinely performed on cancer tissues and even at birth. Based on current estimates, it is likely that by 2020 there will be hundreds of millions of genomes sequenced.

Conclusion

Sequencing is going to revolutionize clinical practice. The degree to which it does so depends on how we address the challenges described above. There will be technical problems, but also institutional ones that are harder to solve. The race to harness NGS in the clinical setting is on.

Where Am I?

You are currently browsing the Opinion category at Manuel Corpas' Blog.
