A Glimpse Of What My Fecal DNA Contains

June 10, 2013 § Leave a Comment

A quick look in a random piece of Sample 1 of Corpas personal fecal sample

After my initial call for crowdsourcing the analysis of my personal fecal DNA, I had some results sent back via Twitter from Willy Valdivia at Orion Biosciences. Thank you very much for that. Willy kindly sent me a couple of figures: the first, above, shows a random piece of Sample 1 of my personal fecal sample; the second, below, is a histogram of the percentage of DNA for the top 25 organisms in my fecal sample.

Is my poo any different from anyone else’s? We’ll see.

Percentage of DNA for top 25 organisms for #Corpas fecal sample

Fecal DNA From Personal Sample Available For Download

May 17, 2013 § 2 Comments

The DNA from my personal fecal sample is finally available for public download. The data are released under a public domain license (CC0 1.0). This means that you can copy, modify, distribute and work on it, even for commercial purposes, without asking permission.

All DNA contained in my faeces can be used to create a metagenomics analysis. This includes the identification of every bacterium, virus and other living organism present in my poo. At the end of this process I hope to be able to complement my metagenomics analysis results with those from my personal genome and the genomes of my family.

Escherichia coli, one of the many species of bacteria present in the human gut.

The sequencing was performed at BGI, using high-throughput Illumina sequencing technology (HiSeq) with paired-end sequencing. The insert library size is 170 bp and the output contains 1.2 G of clean data in two fastq files.
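For anyone who wants a first sanity check on the two fastq files, here is a minimal sketch in Python that counts reads and bases. It assumes the common four-lines-per-record FASTQ layout; the function names `read_fastq` and `summarize` and the toy reads are my own illustration, not part of the released data.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if (len(record) < 4
                or not record[0].startswith("@")
                or not record[2].startswith("+")):
            raise ValueError("malformed FASTQ record")
        yield record[0][1:].strip(), record[1].strip(), record[3].strip()


def summarize(handle):
    """Return (number of reads, total number of bases) for a FASTQ stream."""
    n_reads = 0
    n_bases = 0
    for _name, seq, _qual in read_fastq(handle):
        n_reads += 1
        n_bases += len(seq)
    return n_reads, n_bases


# Toy input: two made-up reads, only to show the call pattern.
toy = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCCA\n+\nIIIII\n")
print(summarize(toy))
```

With paired-end data, running `summarize` on each of the two files should give the same number of reads; a mismatch would suggest a truncated download.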

The files have been uploaded to figshare, from which they can be freely downloaded. I would appreciate it if any uses of, and interesting findings from, these data could be emailed to me. Credit will be duly shared in any subsequent use of these findings in publications, articles or blog references.

Figshare page for Metagenomics Raw Data from Manuel Corpas’s fecal sample. Click on this image to access the download page.

How Facebook Helped Me Discover I’m A Red Hair Gene Carrier

April 12, 2013 § 4 Comments

Is it weird for a Spaniard to have red hair? The typical stereotype for a Mediterranean person is brown-skinned, not too tall and with dark hair. I do not seem to fit all those stereotypes very well, except for the dark hair. At least so I thought until I posted this picture of my beautiful family on my Facebook profile:

How is it possible that my children have the red hair gene expressed so dominantly? I have dark hair and so do my parents!

Of the five of us I am the only one without red hair. Seeing this picture really brought it home to me: it was strange that every one of my three children had inherited my wife’s ‘recessive’ red hair!

I did not give this much thought until my colleague and Facebook friend Dave Adams, who happens to lead a research group at the Wellcome Trust Sanger Institute, asked me whether I had checked the MC1R gene.

The protein encoded by the MC1R gene is found in melanocytes, the cells that give hair and skin their color. The variants associated with red hair alter the protein’s function, tipping the balance of pigment production in melanocytes from black-brown eumelanin to red-yellow pheomelanin [1].

Dave is well aware of my efforts to crowdsource the analysis of my genome data and those of my blood relatives (parents, siblings, aunts and uncles). Since I have had my exome done, and following Dave’s suggestion, I looked for the amino acid changes he suggested (R151C, R160W and D294H) in the MC1R gene. Below you can see some of the comments from our conversation on Facebook:

Facebook chat showing Dave Adams's conversation with me about finding the origin of the red hair in my offspring.

I have a VCF file with all the variations in my genome available on figshare for public download. I searched the file for the 89978527-89987385 interval on chromosome 16, where the MC1R gene is located, and found:

16 89986091 rs11547464 G A

This indicates that at position 89986091 there is a change of a single letter (SNP rs11547464) that makes my DNA differ from the human reference genome. The reference genome has a G whereas I have an A.
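The search described above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming a plain tab-separated VCF whose first five columns are chromosome, position, ID, reference allele and alternate allele; the function name `variants_in_region` is hypothetical, and apart from the rs11547464 record quoted above, the toy records are made-up placeholders.

```python
import io

# MC1R interval on chromosome 16, as quoted in the text above.
MC1R_CHROM = "16"
MC1R_START = 89978527
MC1R_END = 89987385


def variants_in_region(lines, chrom, start, end):
    """Yield (chrom, pos, id, ref, alt) for VCF records inside [start, end]."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a usable record
        rec_chrom = fields[0]
        if rec_chrom.startswith("chr"):
            rec_chrom = rec_chrom[3:]  # tolerate "chr16" style naming
        pos = int(fields[1])
        if rec_chrom == chrom and start <= pos <= end:
            yield rec_chrom, pos, fields[2], fields[3], fields[4]


# Toy input: one real record from the text, two made-up ones outside the region.
toy_vcf = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "16\t100\t.\tC\tT\n"
    "16\t89986091\trs11547464\tG\tA\n"
    "1\t89986091\t.\tG\tA\n"
)
for record in variants_in_region(toy_vcf, MC1R_CHROM, MC1R_START, MC1R_END):
    print(record)
```

For a real file, the `io.StringIO` object would be replaced by an opened file handle; a compressed VCF would need to be opened with `gzip.open(path, "rt")` instead.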

I also looked at my 23andMe genotype using myKaryoView, which also includes this rs11547464 SNP, and found that my genotype is ‘AG’. Doing some research on this, I found that AG at the rs11547464 SNP encodes a missense change in the protein sequence (R142H), which makes me a ‘carrier’ for ‘red hair’ [2].
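The step from ‘AG’ to ‘carrier’ is simply a count of how many of my two alleles match the variant letter. A minimal sketch, assuming a two-letter genotype string and the reference (G) and alternate (A) alleles from the VCF line above; the function name `zygosity` is hypothetical:

```python
def zygosity(genotype, ref, alt):
    """Classify a two-letter genotype relative to reference and alternate alleles."""
    alleles = list(genotype.upper())
    if len(alleles) != 2:
        raise ValueError("expected a two-letter genotype such as 'AG'")
    n_ref = alleles.count(ref)
    n_alt = alleles.count(alt)
    if n_ref + n_alt != 2:
        return "other"  # at least one allele is neither ref nor alt
    labels = {
        0: "homozygous reference",
        1: "heterozygous",
        2: "homozygous alternate",
    }
    return labels[n_alt]


# rs11547464: reference G, alternate A, my genotype 'AG'.
print(zygosity("AG", ref="G", alt="A"))
```

One copy of the variant allele (heterozygous) is what is loosely called being a carrier here.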

Further reading on the relation of this SNP to the phenotype showed that this mutation has been reported to be deleterious [3] and that this MC1R variant is “functional” [4].

According to Dave, I am a carrier for this red hair mutation and presumably my wife is homozygous for another variant with my kids being compound heterozygous. This means that perhaps my wife has another variant somewhere that also contributes to my children having red hair.

This explains, at least partly, how my offspring’s red hair is so strong, something that in principle should be self evident from the picture above. There is something satisfying though about being able to confirm the obvious with scientific evidence.

References

[1] http://blog.23andme.com/news/snpwatch-researchers-find-link-between-red-hair-and-avoiding-the-dentist/

[2] http://www.ianlogan.co.uk/23andme/open/nancy-grossman.htm

[3] http://www.medwelljournals.com/fulltext/?doi=javaa.2011.928.931

[4] http://blog.23andme.com/news/snpwatch-researchers-find-link-between-red-hair-and-avoiding-the-dentist/

iAnn: Scientific Events Should Be Curated Only Once!

April 8, 2013 § Leave a Comment

In this presentation I introduce iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. This presentation was given at the GMOD workshop in Cambridge on April 6th, 2013.

BioJS Presentation at GMOD Meeting April 2013 in Cambridge

April 5, 2013 § Leave a Comment

This talk was given on the morning of April 5th 2013 at the GMOD meeting, preceding the Biocuration 2013 conference. GMOD is the Generic Model Organism Database project, a collection of open source software tools for creating and managing genome-scale biological databases.

BioJS: Web 2.0 Reusable Components For Visualization of Biological Data

March 4, 2013 § 1 Comment

Despite many Bio* projects being developed (BioPerl, BioJava, etc.), to date no coordinated Bio* community effort has been established for JavaScript. JavaScript is the language of choice for implementing dynamic and interactive web applications. BioJS provides a catalogue of open source JavaScript modules for the Life Sciences. These modules include many commonly used functionalities, available for developers or scientists to download. This consistency of development promotes reuse of existing components and offers a genuine one-stop shop for the development of bioinformatics web applications. Resource discovery is enabled by BioJS’s registry, which includes all of BioJS’s source code libraries, documentation and guidelines. These are freely available for public use in what we believe is to date the most extensive catalogue of open source JavaScript biological widgets. As more bioinformaticians continue to develop modern web applications, we expect the BioJS community to continue to grow.

The BioJS publication is now out in the Bioinformatics journal.

Corpas Family Exome Data Available For Public Download

January 21, 2013 § Leave a Comment

Readers may remember the Crowdfunding Campaign that we ran to collect funds to sequence the genomes of the Corpas family. We are pleased to announce the immediate release of our personal exomes (the coding portions of our genes), currently under a CC-BY license, purely for reasons of license compatibility. At this point you have permission to use these data in any way you wish as long as you attribute them to the Corpas family.

Where is it available?

We have decided to make the data available through figshare because it makes the data immediately citable, providing a doi identifier. So here is where the trio data can be downloaded:


http://dx.doi.org/10.6084/m9.figshare.106340

Please note that the above data only include the latest sequencing data from our family: exome data from mother, father and daughter. Previously released data from our son’s exome are here:


http://dx.doi.org/10.6084/m9.figshare.92584

Why do we release our personal exomes?

When my family and I made our genotypes available on the Internet, we immediately received results from researchers around the world who took our data for analysis and came back with interesting insights. As a result, we have been able to learn much about ourselves. I have reported this in a previous entry on this blog entitled “Benefits for Publishing Family Genomes on the Internet”. We now follow the same principle: if we make our exomes available for people to analyse, we can expect that some researchers may come back with interesting results.

What new data do we actually release?

Fastq files for whole exome sequencing from the Corpas family: mother, father, daughter. The data comes from 3 saliva samples. Exome capture was performed using Agilent SureSelect Human All Exon 44.

The captured material was sequenced using Illumina’s HiSeq technology.

The data are expected to have 30X effective mean depth per sample, after removal of adaptor contamination and low quality sequence.

What do we ask in return?

We do appeal to the good will of potential users to report back to us anything interesting they might find.

How big are the files?

They are huge. On average they are about 1 Gb per file and we have 6 of these. That means that it can take several hours for each file to be downloaded. Please be patient!

Where can I get them?

Here:


http://dx.doi.org/10.6084/m9.figshare.106340


http://dx.doi.org/10.6084/m9.figshare.92584

The top link is for mother, father and daughter. The bottom link is for son.

How did we get our personal exome sequenced?

Completely independently. If you want to know the story of how I did it myself, please refer to my blog entries “Getting My Genome Sequencing Done” Part I and Part II. As implied there, we managed to get my personal genome sequenced by knocking on quite a few doors and then finding someone who would sponsor us to do so. In fact, part of the aim of this exercise was to prove that it is now possible for ordinary citizens to get their genomes sequenced if they so wish. We now go a step further by publishing our whole exomes on the Internet.
