How (Not) To Be a Bioinformatician: 7,749 Accesses Two Weeks On
June 11, 2012 § Leave a Comment
Well, beyond my wildest dreams, it seems that this publication continues to be highly accessed. It has been only two weeks since the article was published and there have been 7,749 accesses. No formal marketing was done; the article was only mentioned on Twitter by some colleagues I follow. Once these tweets went out, the response went viral.
Below I summarize some of the Twitter responses I came across, good and bad. Quite fun to read!
@pathogenomenick: heh, great article from @manuelcorpas warns of: “a new breed of a mutant, super-resistant bioinformatician species” http://t.co/rKkq0pBF (14 Retweets)
@assemblathon: Required reading: How not to be a bioinformatician http://www.scfbm.org/content/7/1/3/abstract (via @manuelcorpas) (11 Retweets)
@KamounLab: Liked: “How Not To Be A Bioinformatician” by @manuelcorpas scfbm.org/content/7/1/3/… (1 Retweet)
@KiranGarimella: @pathogenomenick @manuelcorpas Can I retweet this again – like everytime I run into nightmarishly awful bioinformatic software? #SpamRecipe
@heyaudy: @assemblathon @manuelcorpas never use a GUI? C’mon. I get a $23,000/yr stipend minus fees. Developing a GUI would x3 development time :)
@yokofakun: [delicious] “How Not To Be A Bioinformatician” #tweet Source Code for Biology and Medicine: Although published m… http://t.co/9uVXEN5a (4:10pm, May 28, from twitterfeed)
@rguha: @yokofakun Someone actually wrote this as a paper? And some jrnl actually published it? Wonder who is more desperate http://t.co/gCKxJ7qX
@manuelcorpas: @rguha I think I’m pretty desperate RT Someone actually wrote this as a paper? [..] Wonder who is more desperate http://t.co/U2et5dkC
Some 10 Current Interesting Challenges in (Computational) Biology
June 5, 2010 § 5 Comments
This is not an exhaustive list, but rather a compendium of current problems that I encounter on a regular basis. This post might be especially useful for students who want to find a challenging problem for their research, or simply for anyone interested in knowing some of the science that goes on at the Wellcome Trust Genome Campus and beyond.
- To understand genome variation. How to explain variation within and between species? What are the mechanisms that produced those changes? How can those changes explain different susceptibilities to diseases and traits?
- To predict a genotype given a phenotype. How to correlate phenotypic terms to specific mutations? How to encode phenotypes in a computationally friendly format?
- To understand genetic heritability of complex diseases like Alzheimer’s, Parkinson’s or stroke. GWAS have shown that the contribution of any one gene to specific complex diseases is marginal in most cases. What models are needed for modeling mutation leading to disease? What pieces of the puzzle are missing?
- To optimally manage the data resulting from large scale experiments. How to store this data and make it accessible? Where to store it? Locally? In the cloud? How to make sure that no important data is lost?
- To optimally integrate data from disparate sources for analysis. Should we use federated systems? How to combine the ever-growing number of formats? What software to use to make possible such analyses? How to visualize this data more intuitively?
- Data privacy and accessibility. As more and more sensitive data is produced for analysis of patients’ genomic disorders, how not to hamper reproducibility of experiments? At the same time, how can we protect the privacy of patients? How to secure systems where sensitive data is stored?
- Understanding the effects of epigenetics in molecular regulation and disease. What mechanisms are available for molecular regulation? How do they affect gene expression? What molecular agents are involved in epigenetic regulation?
- Understanding the role of RNAs as enzymes and regulatory entities. How many different kinds of RNA are there? What is their function? How did they evolve?
- How do transmembrane proteins fold? Given a protein sequence, can we predict its final 3D functional state? How does the cellular membrane affect the folding process? What helper molecules are involved to make sure that the protein folds correctly?
- Automatic extraction and text mining. Given the current mass of scientific literature, how can we automatically extract this knowledge from text? How close can we get to computers “understanding” human language? How to structure scientific literature to make it more machine-readable?
I am sure I am missing many other important topics, and I apologize for those I missed. Feel free to add your own if you wish.
Cloud computing: a new standard platform?
February 8, 2009 § 1 Comment
Cloud computing is becoming a technology mature enough for use in genome research experiments. Their large datasets, highly demanding algorithms and sudden need for computational resources make large-scale sequencing experiments an attractive test case for cloud computing. So far I have seen cloud computing demonstrated using R (1). However, a rigorous comparison of its performance using a BLAST (2) search, and of its ability to cope with ever-increasing databases and open source frameworks such as bioperl (3) or bioconductor (4), remains to be seen.
Cloud computing claims to be a resource where IT power is delivered over the Internet as you need it, rather than drawn from a desktop computer (5), in a fashion seemingly similar to having your own virtual servers available over the Internet (6). Some of the most important aspects of cloud computing are:
* Software as a Service (SaaS): where you buy a software license for a determined period of time.
* Utility Computing: storage and virtual servers that IT can access on demand.
* Web Services.
My first exposure to cloud computing came from an email from Matt Wood (7), a newly established group leader at the Sanger Institute (8), announcing the Cloud Computing Group (9) in Cambridge, UK. At that point I had no idea what it meant. When I attended the meeting at Cambridge University’s Centre for Mathematical Sciences (10), to my surprise I found a very select audience there, including the director of IT at Sanger, Phil Butcher (11), one of the Ensembl (12) software coordinators, Glenn Proctor (13), and quite a few local start-up companies.
Among the presenters, we had Simone Brunozzi, from Amazon’s Cloud Computing (14). I think he had an interesting story to tell: how Amazon, a well-known company, is now involved in the business of cloud computing and selling it. Apparently, the technology they sell was developed for Amazon’s own business. Among their main challenges was addressing the capricious shopping habits of customers, with orders peaking around Christmas and staying quite flat the rest of the year. These trends required rapid adaptability of computational resources. The idea of cloud computing fitted well with their e-commerce business model: you don’t need to care about where your computation is done; the only thing you care about is that you have the resources you need and do not have to pay for them when you don’t need them. One of the things that struck me about Amazon’s presentation was that they would not tell us the number of processors they had at their disposal.
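The pay-only-for-what-you-use argument can be made concrete with a little arithmetic. The sketch below is purely illustrative: the demand figures, the unit cost and the function names are all hypothetical and do not come from Amazon's presentation. It compares owning enough servers all year to cover a Christmas peak against paying each month only for what is used, assuming (unrealistically) the same unit price in both cases.

```python
# Hypothetical monthly demand in "server units": flat for eleven months,
# then a Christmas peak. These numbers are invented for illustration only.
monthly_demand = [10] * 11 + [100]

# Hypothetical price of one server for one month, in an arbitrary currency.
COST_PER_SERVER_MONTH = 1.0


def fixed_capacity_cost(demand, unit_cost):
    """Cost of owning enough servers all year to cover the busiest month."""
    return max(demand) * len(demand) * unit_cost


def on_demand_cost(demand, unit_cost):
    """Cost of paying only for the servers actually used each month."""
    return sum(demand) * unit_cost


fixed = fixed_capacity_cost(monthly_demand, COST_PER_SERVER_MONTH)
elastic = on_demand_cost(monthly_demand, COST_PER_SERVER_MONTH)
print(f"fixed capacity: {fixed}, on demand: {elastic}")
```

With these made-up numbers the fixed-capacity bill is several times the on-demand one. If the on-demand unit price were much higher than the cost of an owned server, the gap would narrow or even reverse.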
When it comes to using cloud computing for genomics research, costs may be quite high when they add up. The bioinformatics field, greatly influenced by the open-source movement, is not likely to rush to join Amazon’s cloud. Private efforts trying to make money out of human genome technology have remained rather unsuccessful to date: think of Celera Genomics or Lion Bioscience. I am skeptical of the bioinformatics community adopting cloud computing unless open source ideals are embraced: i) allowing people to develop and contribute to the technology if and when they want to, ii) allowing total openness in terms of its achievements and pitfalls and iii) making it free to use for everyone. I do not think that making it free means there is no margin for profit. Think of the profitability of free-to-use technologies such as Java (15) or MySQL (16), both components of Sun Microsystems’ (17) business.
Despite the promise of potential benefits for the bioinformatics community, the way the cloud is being portrayed does not conform to the ideals of free access and openness. Unless these ideals are implemented to some extent, I find it hard to see the cloud taking root in the bioinformatics field and becoming a new standard platform for genome research.
References
1. http://www.r-project.org/
2. http://blast.ncbi.nlm.nih.gov/Blast.cgi
3. http://www.bioperl.org/wiki/Main_Page
4. http://www.bioconductor.org/
5. http://www.guardian.co.uk/technology/2008/sep/29/cloud.computing.richard.stallman
6. http://www.infoworld.com/article/08/04/07/15FE-cloud-computing-reality_1.html
7. http://www.sanger.ac.uk/Users/mw4/
8. http://www.sanger.ac.uk/
9. http://cloudcamb.org/
10. http://www.cms.cam.ac.uk/site/
11. http://www.yourgenome.org/people/phil_butcher.shtml
12. http://www.ensembl.org/index.html
13. http://www.ebi.ac.uk/Information/Staff/person_maintx.php?s_person_id=299
14. http://aws.amazon.com/ec2/
15. http://www.java.com/en/
16. http://www.mysql.com/
17. http://www.sun.com/