Manuel Corpas' Blog

Genomes, Internet, Bioethics and More…

Why Should I Care About RNA?

I keep bumping into talks, articles and other material related to the Bioinformatics of RNAs. It certainly looks appealing. I am not sure, though, what the fuss is about or whether it is as exciting as the DNA field. The impact of Next Generation Sequencing on Science, Policy and Society is expected to be significant in the near future. Why should I care about RNAs though?

Here is a list of questions that I throw into the wild hoping to attract the attention of any experts in the field.

  1. Does any recent RNA breakthrough have the same potential to transform Society as DNA research is doing?
  2. Is RNA Bioinformatics an area that is appealing because not a lot of people are working on it? Or has it been fuelled by a new technique that is revolutionising it?
  3. Is the rate of accumulation of RNA data as steep as that of its DNA cousin?
  4. What are RNA research trends to watch out for?

The debate is open.

Post a comment if you want to say anything.

Filed under: Bioinformatics, Biology

A Script to Calculate GC content

Intermediate Perl

GC content is a very interesting property of DNA sequences because it is correlated with repeats and gene deserts. A simple way to calculate GC content is to divide the number of G and C letters by the total number of nucleotides in the sequence. Let’s assume that you start with a string $sequence.

The WRONG way in which I initially did this was to convert the string to an array of letters, as shown here:

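sub calcgc {
 my $seq = $_[0];
 # Splitting builds one array element per base, which is what exhausts memory
 my @seqarray = split('',$seq);
 my $count = 0;
 foreach my $base (@seqarray) {
   # [GC] matches G or C; a | inside a character class would match a literal pipe
   $count++ if $base =~ /[GC]/i;
 }
 my $len = scalar @seqarray;
 return 0 unless $len;   # avoid division by zero on an empty sequence
 # Report the GC fraction to four decimal places
 return sprintf('%.4f', $count / $len);
}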

This is a very inefficient way of calculating the GC content, because arrays in Perl are quite expensive in terms of memory. The result of this was that I ran out of memory quite quickly.

I found a more efficient approach by using the substr function, looping through the whole sequence, taking one base at a time:

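sub calcgc {
 my $seq = $_[0];
 my $count = 0;
 my $len   = length($seq);
 return 0 unless $len;   # avoid division by zero on an empty sequence
 # substr offsets are zero-based, so run from 0 to $len - 1
 for (my $i = 0; $i < $len; $i++) {
   my $base = substr $seq, $i, 1;
   # [GC] matches G or C; a | inside a character class would match a literal pipe
   $count++ if $base =~ /[GC]/i;
 }
 # Report the GC fraction to four decimal places
 return sprintf('%.4f', $count / $len);
}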

Now I do not run out of memory and the script promptly calculates the GC content for every sequence I have tried it with. I have not tried it with a whole genome though!
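The same single-pass idea can be sketched in JavaScript rather than Perl. This is only a minimal illustration: the function name gcContent is made up for this example, and it returns the raw fraction instead of a formatted string.

```javascript
// Count G and C by walking the string once; no per-base array is built.
// gcContent is a hypothetical name used only for this sketch.
function gcContent(sequence) {
  if (sequence.length === 0) {
    return 0; // avoid dividing by zero on an empty sequence
  }
  let count = 0;
  for (let i = 0; i < sequence.length; i++) {
    const base = sequence[i].toUpperCase();
    if (base === 'G' || base === 'C') {
      count++;
    }
  }
  return count / sequence.length;
}
```

Calling gcContent('ATGC') gives 0.5, since two of the four bases are G or C.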

Happy coding!

Filed under: Bioinformatics, Tutorials

Array CGH for Dummies

Array-CGH (Comparative Genomic Hybridisation) is becoming a common method for the analysis of patients’ genomes. Array-CGH works by taking a reference genome covering the whole human genome sequence, cutting it into thousands of pieces and attaching them in order to a chip. These pieces are called probes and are usually in the range of 500-2000 DNA bases long. A saliva or blood sample is then taken from the patient and the DNA in it is also chopped into thousands of pieces, suspended in a solvent. The array is then washed with the suspension containing the patient’s DNA.

DNA is a double chain of nucleotide bases where one chain complements the other. Knowing one chain of the DNA, it is possible to know the other chain. In its natural state, a single DNA chain will tend to bind to its complementary chain. Thus, washing the array probes with the patient’s suspension makes the patient’s DNA pieces bind to their complementary DNA in the array.
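To make the idea of complementary chains concrete, here is a small JavaScript sketch that derives one chain from the other using the standard base pairing (A with T, C with G). The names COMPLEMENT and reverseComplement are invented for this example, and turning unrecognised letters into 'N' is an assumption of the sketch, not something array-CGH itself does.

```javascript
// Watson-Crick pairing: A pairs with T, C pairs with G.
const COMPLEMENT = { A: 'T', T: 'A', C: 'G', G: 'C' };

// Returns the complementary chain, read in the opposite direction.
// Any letter other than A, C, G or T becomes 'N' (an assumption of this sketch).
function reverseComplement(strand) {
  return strand
    .toUpperCase()
    .split('')
    .reverse()
    .map((base) => COMPLEMENT[base] || 'N')
    .join('');
}
```

For example, reverseComplement('ATGC') returns 'GCAT', the sequence a probe would need in order to bind a patient fragment reading ATGC.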

Array-CGH can be used to detect whether a patient has a region of the genome missing or duplicated. Probes attached to the chip emit a different color depending on their state of binding. Once the array is washed, most of the probe spots will appear yellow, that is, the probes of the reference genome are bound to the patient’s DNA. If a DNA region is missing in the patient, the corresponding spots in the array appear in red. These red spots appear in sequential order, mapping to the region of the reference genome that is missing in the patient. Depending on the genes that overlap the deletion, different symptoms may appear in the patient.

The same happens if the array shows a series of green spots, indicating that a duplicated region of the genome has been found in the DNA of the patient. Because the gene content will be altered in the duplicated region, this may cause disease as a consequence of the over-expression of genes included in the duplicated regions.

Thus, using array techniques, we are now able to find deletions or duplications in the genome of a patient below the microscopic level, i.e. changes not directly observable under the microscope. We are all familiar with the features of a patient with Down’s Syndrome. This syndrome is caused by an extra copy of chromosome 21 in the affected patient, that is, three copies instead of the usual two (Trisomy 21).

Most chromosomal deletions and duplications occur at the molecular level [1] and, unlike the extra chromosome in Down’s Syndrome, cannot be identified with microscopic techniques. Until recently most patients suspected of suffering from genomic diseases, i.e. diseases caused by pathogenic deletions or duplications, went undiagnosed because the available techniques could only detect big chromosomal changes (like whole chromosomes). Techniques such as array-CGH now allow detection of chromosomal changes a thousand times smaller in length.

For a price of about £100 per array one can have one’s genome screened for chromosomal changes. In fact, it seems that most of the genetic difference between any two people (in terms of number of DNA bases) is due to micro-deletions and duplications (called Copy Number Variations) [2], just the level we are now starting to handle with current analysis techniques. Next generation sequencing technologies are fast arriving that will allow the complete base-by-base sequencing of a person’s DNA at a price of $1000 in a short period of time [3].

[1] H.V. Firth, S.M. Richards, A.P. Bevan, S. Clayton, M. Corpas, D. Rajan, S. Van Vooren, Y. Moreau, R.M. Pettett, N.P. Carter (2009). DECIPHER: DatabasE of Chromosomal Imbalance and Phenotype using Ensembl Resources. The American Journal of Human Genetics.

[2] J. R. Lupski (2009). Genomic disorders ten years on.  Genome Medicine

[3] Mardis E.R. (2006). Anticipating the 1,000 dollar genome. Genome Biology


Filed under: Bioinformatics, Tutorials

Validating Chromosome Entered is Correct

When you have a web form and one of the fields to be entered is the Chromosome Number, you’d be wise to check that the user does not enter the wrong thing (e.g. ‘0’, ‘X1’, ‘21.13’). Thus a validation check may save you some headaches.

I set out to find the appropriate regular expression in JavaScript that returns an error when the chromosome entered is not 1-22 or X or Y. It turned out that this little problem wasn’t that easy for me to solve. I searched Google to see if anyone had posted a solution and found nothing meaningful.

Therefore I paste below the solution I wrote. I must admit that having to use two if statements is rather ugly. I would appreciate it if any reader posted a more elegant solution. But for now, this code seems to work fine.

function notValidChr(field)
{
  // Field must consist of one or two digits
  if ( field.value.match(/^\d\d?$/)
      // Is it an integer greater than 0?
      && (0 < parseInt(field.value, 10)
            // Is it an integer smaller than 23?
            && parseInt(field.value, 10) < 23
          )
      )
  {
    // If all conditions above are met, notValidChr is not true
    return false;
  }
  // Check if it is a valid X or Y chromosome
  if (field.value.match(/^[XY]$/)) {
    // If so notValidChr is not true
    return false;
  }
  // Yes! the text entered is not a valid chromosome
  return true;
}
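One possible way to fold the two checks into a single regular expression is sketched below. It is only a suggestion, not a tested replacement for the function above: the name notValidChrValue is made up, it takes the entered text directly instead of the form field, and unlike the original it rejects a leading zero such as ‘01’.

```javascript
// Returns true when the text is NOT one of 1-22, X or Y.
// [1-9] covers 1-9, 1[0-9] covers 10-19, 2[0-2] covers 20-22.
// notValidChrValue is a hypothetical name; it expects a string, not a form field.
function notValidChrValue(value) {
  return !/^(?:[1-9]|1[0-9]|2[0-2]|X|Y)$/.test(value);
}
```

With a form field this could be called as notValidChrValue(field.value).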

Filed under: Bioinformatics, Tutorials

Tips for Remapping from NCBI36 to NCBI37 Genome Assembly

It might seem straightforward to some people, but I had to spend quite some time trying to understand how to remap my array probes from ncbi36 to ncbi37. If you use the Ensembl genome browser, you might have noticed that the ncbi37 assembly has been in use since July 2009. For DECIPHER (the database I help develop), this is a bit of a headache, because it means that all of the array CGH probes that we use have to be remapped to the new assembly. If this does not interest you I recommend that you stop reading here.

First I learned that there is a program called liftOver by UCSC that is able to do this remapping. Since the number of probes I have to map (around 6 million) is not something I would wish to throw at anyone’s server, I decided to do this in-house. You can download this program from here. I did not know which was the right binary for me to download, as they had linux32 and linux64 versions. I decided to go for the former, since I am using Debian and it sounds like a conservative option.

Once I downloaded the program, I needed to make it executable:

chmod u+x liftOver

OK, so I was in a position to run it:

./liftOver

In the usage information it appears that I need several arguments and files to be able to run this program correctly:

liftOver oldFile map.chain newFile unMapped

Now I learned that I also need to get a file called map.chain. I was not sure what it meant. I learned that this map.chain file has parameters that are used by liftOver and that there are different map.chain files depending on the remapping one wants to do. In my case, I want to remap from ncbi36 to ncbi37 in human. However, when I look at the different remappings, I do not see ncbi names anywhere. I learned here that what I am looking for is a map chain file that is called this:

hg18ToHg19.over.chain

Apparently hg18 refers to ncbi36 and hg19 to ncbi37. With a Google search I could find that file here.

Now I get quite a few options and learn that I need to have my probes in BED format to run liftOver. Apparently there are quite a few formats I can use according to the UCSC FAQ on formats. Here is an example of what my BED file looks like (chromosome-tab-start_position-tab-end_position):

chrY       12308579        12468100
chrY       12468101        12581699
chrY       12581700        12759636
chrY       12759637        12838587

Now I am in a position to run liftOver. I notice now that in the usage one has the following description:

liftOver oldFile map.chain newFile unMapped

‘newFile’ and ‘unMapped’ are the names of the files that the output is written to, so they do not need to exist beforehand. This can be confusing, as the user might think that these are some other kind of files one has to get hold of.

OK, so now I am ready to transform our old array probe mapping ncbi36 to the new ncbi37 one:

./liftOver probes.ncbi36 hg18ToHg19.over.chain probes.ncbi37 unmapped-to-ncbi37

I got the following output to console:

Reading liftover chains
Mapping coordinates
ERROR: start coordinate is after end coordinate (chromStart > chromEnd) on line 5171240 of bed file probes.decipher.ncbi36
ERROR: 4 2515512 2515453

…which is a bit worrying.

I went through my probes and found that some of them (just 44757!) had start coordinates greater than their end coordinates. I guess that if you encounter those you’ll have to decide what to do. For the time being I just took them out and reran liftOver.
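Taking out the offending probes can be sketched along these lines in JavaScript. This is an illustration under a few assumptions rather than the script I actually used: the function name splitBedByCoordinateOrder is made up, the columns are assumed to be separated by whitespace as in the BED example above, and the whole file is assumed to fit in memory as one string (for millions of probes you may prefer to read it line by line).

```javascript
// Separates BED lines into those liftOver accepts (start <= end)
// and those whose start coordinate is after the end coordinate.
// splitBedByCoordinateOrder is a hypothetical name for this sketch.
function splitBedByCoordinateOrder(bedText) {
  const valid = [];
  const inverted = [];
  for (const line of bedText.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    // Columns: chromosome, start position, end position
    const fields = line.trim().split(/\s+/);
    const start = Number(fields[1]);
    const end = Number(fields[2]);
    if (start > end) {
      inverted.push(line);
    } else {
      valid.push(line); // lines with non-numeric coordinates also end up here
    }
  }
  return { valid, inverted };
}
```

The valid lines can then be written back to a file and passed to liftOver, while the inverted ones are kept aside until you decide what to do with them.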

This time it worked.

Filed under: Bioinformatics, Tutorials

How to be a Biohacker

Biohackers fully embrace the philosophy of hackers: love for freedom, veneration of competence and utter curiosity for how things work. How does one become a biohacker? Usually biohackers cannot tell if they are really one of them until someone else says so. However, it is not enough to master programming or to be a computer whiz. You need IT skills that suit computational biology research and familiarity with the biology itself, which in the end is the problem one has to solve.

A big part of the biohacker philosophy is that it is not enough to love solving technical problems for their own sake; you need to think of living organisms as an extension of the information systems you work with. Biological concepts may then be abstracted into objects whose hierarchical organization reflects the different levels of order in living things. Computer languages thus become the perfect analogy for understanding the complex information flows in living systems.

True to hackerdom culture, Unix, Perl and MySQL are skills that you need to master (I can think of people who would also say Java, JavaScript, CSS, etc.). The best way to master the art of programming is to spend as much time as possible reading and writing source code. Some people think Perl is doomed. This is not true in the biohacker’s world. In part due to legacy and in part due to the flexibility it provides, Perl is still the language of choice for many biohackers. Perl is used to construct 1) the back end of web applications, 2) pipelines and workflows and 3) quick and dirty scripts for parsing and calling other programs.

You will also need to be familiar with projects like R and Bioconductor, since a lot of the work will involve providing the computational infrastructure for analyzing data. In addition, you’ll need to know about data formats (fasta, sbml, mmcif…), software toolkits and libraries (Paup, Phylip, EMBOSS, BioPerl…), databases (Ensembl, InterPro, PDB, KEGG…), webservers and portals (Pubmed, ISCB).

Finally keep in mind best practices. Some of them I wrote about in a previous post (like refraining from reinventing the wheel), but above all, give yourself the time to enjoy the learning process. Getting to the top usually takes longer than staying at the top; so what’s the point if you haven’t enjoyed the trip?

Filed under: Bioinformatics

EMBnet comes to Major Bioinformatics African Conference

Arun Gupta, Nicola Mulder, Manuel Corpas

Bioinformatics is a relatively affordable scientific discipline to establish as it requires intellectual capacity but not expensive laboratory facilities or equipment. This makes it a very accessible discipline to scientists in poorly resourced countries in Africa. The International Society for Computational Biology (ISCB) and the African Society for Bioinformatics and Computational Biology (ASBCB) have teamed up to organize a major meeting in Africa in 2009 focused on the theme “Bioinformatics of Infectious Diseases: Pathogens, Hosts and Vectors”. This meeting, a new venture between ISCB and ASBCB and a follow on from a previous successful meeting held in Nairobi by the ASBCB, will be held this year in Bamako (Mali), hosted by the prestigious Malaria Research and Training Center, an important facility for malaria research in Africa. Although it will have a particular African focus, the meeting is intended to be a fully-fledged international event, encompassing scientists and students from leading institutions in the US, Latin America, Europe and Africa. By holding this event in Africa, we intend to stimulate local efforts for cooperation and dissemination of leading research techniques to combat major African diseases.

Program

The meeting will consist of a 4-day conference followed by 2 days of practical workshops. The first 3 days of the meeting will include keynote presentations by 6 invited speakers from around the world. The last day of the conference will be a dedicated KAUST (King Abdullah University of Science and Technology) day focused on the topic “Systems view of biological organisms”. KAUST has secured 20 full fellowships (valued at up to $1,700 each) to cover travel expenses, registration and accommodation for Africans attending the ISCB Africa ASBCB conference. Highly accomplished researchers will present the 2 days of post-conference tutorial workshops. Erik Bongcam-Rudloff, chairman of EMBnet, will give a keynote presentation and a tutorial during a workshop day.

Participation

We expect that most participants will come from Africa, and that the majority will be at the level of junior faculty, PhD students and post-doctoral researchers. Travel fellowships will be awarded to African researchers and students to cover travel and local expenses, with priority given to those selected for oral presentations through peer review of submitted research papers. Several other travel fellowships will be secured for non-African participants. Fellowships will be entirely dependent on the funds available.  Hence, prospective participants are encouraged to seek their own sources of funding.

Submissions

The conference will consist of a single track with oral and poster presentations. There will be a call for submission of abstracts (papers of up to 1000 words will be required for oral presentations); authors will indicate whether they aspire to give an oral or poster presentation. Graduate students, young investigators and all African researchers involved in the field will be strongly encouraged to submit an abstract describing their work. Submissions from all other parts of the globe are invited as well. The Scientific Committee will evaluate submissions, and the (few) selected abstracts from the oral presentation track will be invited for oral presentation, while others will be invited for poster presentation. Using this model we have previously established a powerful cadre of networking scientists – who have demonstrated that by meeting, they can establish networks of collaboration between Africa and the US and EU, as well as between African countries.

Proceedings

The proceedings of the first ASBCB conference were published in a special issue of Infection, Genetics and Evolution (see: http://dx.doi.org/10.1016/j.meegid.2008.09.002 and http://dx.doi.org/10.1016/j.meegid.2008.09.003). The organizers are in the process of negotiating and securing a publication for the proceedings of this 2009 conference.

Website for further details:

https://www.iscb.org/iscb-africa

About EMBnet

EMBnet is a bioinformatics-based group of collaborating nodes throughout Europe and around the world. Its combined expertise provides services to the molecular biology community.

http://www.embnet.org/

Filed under: Bioinformatics, Conferences

DECIPHER: Shedding Light on Chromosomal Imbalance and Phenotype Interpretation

Presented at the 7th European Cytogenetics Conference, July 4-7 2009, Stockholm, Sweden

Corpas M1, Richards S1, Bevan AP1, Van Vooren S3, Pettett RM1, Firth HV2, Carter NP1

1. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton Cambridge, CB10 1SA, UK

2. Cambridge University Dept of Medical Genetics, Addenbrooke’s Hospital, Cambridge CB2 2QQ, UK

3. K.U.Leuven, ESAT/SCD, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium

Availability: https://decipher.sanger.ac.uk

DECIPHER is a web-based database that stores genomic array data and phenotypes from patients world-wide [1]. DECIPHER is a powerful resource for clinical diagnosis and management of patients with congenital abnormalities, providing tools to help in the interpretation of copy number changes in patients with developmental disorders. Enabled by an international collaborative research consortium, DECIPHER hosts data from more than 2200 patients from over 100 centres in more than 20 countries.

In DECIPHER, molecular rearrangements (as defined by genomic array analysis) are mapped on to the reference sequence in Ensembl. Genes within the affected region are identified and prioritised according to phenotype. Clusters of rearrangements within the same region in patients with comparable phenotypes allow new syndromes and genes involved in human development and disease to be defined [2-4]. Of particular importance, DECIPHER utilises a suite of integrated bioinformatics tools to help to distinguish apparently benign copy number variants from those potentially causing disease.

Key features in DECIPHER for aiding in the interpretation of patient data include:

  • Trio analysis tool – A trio of an affected individual and parents are analysed to determine de novo or familial/inherited conditions.

  • Gene prioritisation tool – advanced text mining searches PubMed for associations between highlighted genes and phenotypes.

  • Search tool – a search engine for data (with previous consent from the patient) within DECIPHER to facilitate the identification of rearrangement clusters and links between phenotype and genomic location.

References

  1. Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, Van Vooren S, Moreau Y, Pettett RM, Carter NP. Am J Hum Genet. 2009 Apr;84(4):524-33.

  2. Malan V, Raoul O, Firth HV, Royer G, Turleau C, Bernheim A, Willatt L, Munnich A, Vekemans M, Lyonnet S, Cormier-Daire V, Colleaux L, J Med Genet. 2009;. PMID: 19126570 DOI: 10.1136/jmg.2008.062034

  3. Zahir F, Firth HV, Baross A, Delaney AD, Eydoux P, Gibson WT, Langlois S, Martin H, Willatt L, Marra MA, Friedman JM. J Med Genet. 2007;44;556-61. PMID: 17545556 DOI: 10.1136/jmg.2007.050823

  4. Shaw-Smith C, Pittman AM, Willatt L, Martin H, Rickman L, Gribble S, Curley R, Cumming S, Dunn C, Kalaitzopoulos D, Porter K, Prigmore E, Krepischi-Santos AC, Varela MC, Koiffmann CP, Lees AJ, Rosenberg C, Firth HV, de Silva R, Carter NP. Nat Genet. 2006;38;1032-7. PMID: 16906163 DOI: 10.1038/ng1858

Filed under: Bioinformatics

DECIPHERing human disease

Taken from http://www.sanger.ac.uk/Info/Press/2009/090416.shtml

Database provides a key to unlock the causes of illnesses

Five years after the inception of the DECIPHER database – researchers have published a report that reveals the developing role of the database in revolutionising both clinical practice and genetic research.

The report explores the growing benefits of DECIPHER for researchers, clinicians and patients – highlighting how the data, provided by around 100 centres and shared openly worldwide, can benefit all three groups.

DECIPHER – the Database of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources – is hosted at the Wellcome Trust Sanger Institute. It was established in 2004 to catalogue submicroscopic structural duplications, deletions and rearrangements in the genome – called copy number variants (CNVs) – and to uncover their possible role in disease.

“The first comprehensive map of human copy number variation was produced just three years ago, changing our understanding of human genetics” explains Nigel Carter, a lead member of the DECIPHER team from the Wellcome Trust Sanger Institute. “Since then, over 10,000 CNVs have been found, covering about five per cent of the human genome. This rate of advance has been remarkable: using new technologies, we are able to uncover the smaller, elusive variants at a 50 fold-higher resolution. But the pivotal role that DECIPHER plays is in looking at how these variants affect human health.”

The problem researchers face is that while many CNVs initially appear to have no visible effect on individual health, others appear to have minor effects, and some are harmful. What DECIPHER helps clinicians to do is to evaluate CNVs and determine whether or not they are linked to the patient’s problems. In some cases, the findings are novel or have been observed only a handful of times before. With consent from the patient, data can be shared worldwide and clusters of people with overlapping genetic rearrangements can be identified.

By looking at genetic information first in an unbiased and less subjective manner, recurrent genetic changes can be found; researchers can then seek matching symptoms. This reverses the traditional practice of identification where researchers would move from individuals with shared symptoms back to a chromosomal cause and is particularly helpful for conditions such as learning disability and congenital disorders, which have a large number of different genetic causes.

“We need new ways to uncover those rearrangements that cause human disease. But we must also be wary of dismissing CNVs if they appear to have no physical effect,” says Charles Lee, an Associate Professor at Harvard Medical School and a Clinical Cytogeneticist at Brigham and Women’s Hospital in Boston, USA. “For example, there may be variants that only affect people with a specific genetic makeup; or sometimes specific combinations of variants may result in pathology.”

The report provides case studies in which DECIPHER played a pivotal role. In one example a four-year-old girl with symptoms of developmental delay and poor eye contact had a novel genetic finding and remained without a clear diagnosis. However, two new cases with similar structural variants were submitted to the database one year later, to provide the elusive diagnosis. The case studies exemplify increasing value of the database as clinicians add case information.

“DECIPHER is particularly useful when we look at patients with developmental delay, learning disability, dysmorphic features or congenital abnormalities, where, using genomic array technology, we can assign a diagnosis in 15 per cent of previously undiagnosed cases,” explains Helen Firth, Consultant Clinical Geneticist at Addenbrookes Hospital and lead author on the study. “This improvement is dependent on a fantastic level of collaboration. More than 2000 patient cases have been contributed to the DECIPHER database since its inception: its diagnostic power strengthens as new cases are added.”

DECIPHER is built upon the Ensembl genome browser. It is the only open-access, web-based interactive database of its type, although data from other databases are available. The report’s authors suggest that while combination of all data in one resource would be ideal, providing access to the data in one genome browser is a realistic and practical method of harnessing the combined power of the datasets.

Sharing data between researchers is increasingly important. As the role of CNVs in human disease is better understood, so resources such as DECIPHER will gain momentum that will drive significant health benefits and improvements to genetic counselling.

Filed under: Bioinformatics

Cloud computing: a new standard platform?

Cloud computing is becoming a technology mature enough for use in genome research experiments. Their large datasets, highly demanding algorithms and sudden need for computational resources make large-scale sequencing experiments an attractive test case for cloud computing. So far I have seen cloud computing demonstrated using R (1). However, a rigorous comparison of its performance using a BLAST (2) search, and of its ability to cope with ever-increasing databases and open source frameworks such as bioperl (3) or bioconductor (4), remains to be seen.

Cloud computing claims to be a resource where IT power is delivered over the Internet as you need it, rather than drawn from a desktop computer (5), in a fashion seemingly similar to having your own virtual servers available over the Internet (6). Some of the most important aspects of cloud computing are:

* Software as a Service (SaaS): where you buy a software license for a determined period of time.
* Utility Computing: storage and virtual servers that IT can access on demand.
* Web Services.

My first exposure to cloud computing came from an email from Matt Wood (7), a newly established group leader at the Sanger Institute (8), announcing the Cloud Computing Group (9) in Cambridge, UK. At that point I had no idea what it meant. When I attended the meeting at Cambridge University’s Centre for Mathematical Sciences (10), to my surprise I found a very select audience there, including the director of IT at Sanger, Phil Butcher (11), one of the Ensembl (12) software coordinators, Glenn Proctor (13), and quite a few local start-up companies.

Among the presenters, we had Simone Brunozzi, from Amazon’s Cloud Computing (14). I think he had an interesting story to tell: how Amazon, a well known company, is now involved in the business of cloud computing and selling it. Apparently, the technology they sell was developed for Amazon’s own business. Among their main challenges was to be able to address the capricious shopping habits of customers, with orders peaking around Christmas and staying quite flat the rest of the year. These trends required rapid adaptability of computational resources. The idea of cloud computing fitted well with their business model of e-commerce: you don’t need to care about where your computation is done, the only thing you care about is that you have the needed resources and do not have to pay for them when you don’t need them. One of the things that struck me about Amazon’s presentation was that they would not tell us the number of processors they had at their disposal.

When it comes to using cloud computing for genomics research, costs may be quite high when they add up. The bioinformatics field, greatly influenced by the open-source movement, is not likely to rush to join Amazon’s cloud. Private efforts trying to make money out of human genome technology have remained rather unsuccessful to date: think of Celera Genomics or Lion Bioscience. I am skeptical of the bioinformatics community adopting cloud computing unless open source ideals are embraced: i) allowing people to develop and contribute to the technology if and when they want to, ii) allowing total openness in terms of its achievements and pitfalls and iii) making it free to use for everyone. I do not think that making it free means there is no margin for profit. Think of the profitability of free-to-use technologies such as Java (15) or MySQL (16), both components of Sun Microsystems’ (17) business.

Despite the promise of potential benefits for the bioinformatics community, the way the cloud is being portrayed does not conform to the ideals of free access and openness. Unless these ideals are implemented to some extent, I find it hard to see the cloud taking root in the bioinformatics field and becoming a new standard platform for genome research.

References

1. http://www.r-project.org/
2. http://blast.ncbi.nlm.nih.gov/Blast.cgi
3. http://www.bioperl.org/wiki/Main_Page
4. http://www.bioconductor.org/
5. http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman
6. http://www.infoworld.com/article/08/04/07/15FE-cloud-computing-reality_1.html
7. http://www.sanger.ac.uk/Users/mw4/
8. http://www.sanger.ac.uk/
9. http://cloudcamb.org/
10. http://www.cms.cam.ac.uk/site/
11. http://www.yourgenome.org/people/phil_butcher.shtml
12. http://www.ensembl.org/index.html
13. http://www.ebi.ac.uk/Information/Staff/person_maintx.php?s_person_id=299
14. http://aws.amazon.com/ec2/
15. http://www.java.com/en/
16. http://www.mysql.com/
17. http://www.sun.com/

Filed under: Bioinformatics

About this Blog

Written and maintained by Manuel Corpas, a Computational Biologist at the Wellcome Trust Sanger Institute (Cambridge, UK), a leading international center for genome research. Any views expressed here are the author's alone and do not necessarily form part of the official positions of his employer.