Many months after having my personal exome sequenced, I am now making it available to the community for public use. I am releasing it under a public domain license (CC0 1.0 Universal), which gives you permission to use this data in any way.
What is an exome?
An exome is the ~1% of the genome that codes for proteins.
Why do I release my personal exome?
When my family and I made our genotypes available on the Internet, researchers around the world quickly took our data, analysed it, and came back with interesting findings. As a result, we have been able to learn much about ourselves. I reported this in a previous entry on this blog entitled “Benefits for Publishing Family Genomes on the Internet”. I am now following the same principle: if I make my exome available for people to analyse, I can expect that some researchers may come back with interesting results.
What data do I actually release?
I release my 4 FastQ files that were given to me by my sequencing provider. This is the same kind of information that 23andMe gives in their current exome analysis offer. This information basically consists of raw reads that need to be aligned to a reference assembly. Once aligned, interesting variation data can be inferred.
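To give an idea of what these raw reads look like before any alignment, here is a minimal Python sketch that walks through a FASTQ stream, where each read takes four lines: a header, the sequence, a separator, and a quality string. The names `FastqRecord`, `read_fastq` and `mean_phred` are made up for illustration, the sample reads are invented, and the quality offset of 33 is an assumption, since I have not checked which encoding my provider used.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a trailing blank line)
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean base quality of one read; offset 33 is assumed, not confirmed."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# Two invented reads, standing in for a real file opened with open(...).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
records = list(read_fastq(example))
for record in records:
    print(record.name, record.sequence, mean_phred(record.quality))
```

A sketch like this is only good for a first look at the reads. Aligning them to a reference assembly and calling variants needs dedicated tools.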
What do I ask in return?
Nothing. I do appeal though to the good will of potential users to report back to me anything interesting they might find.
How big are the files?
They are huge. On average they are about 0.6 GB per file, and there are 4 of them. That means each file can take several hours to download. Be patient!
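As a rough guide to how long that might be, the Python sketch below estimates the download time for one file of about 0.6 gigabytes. The function name `download_hours` and the connection speeds are assumptions chosen for illustration, not measurements of the server hosting the files.

```python
def download_hours(size_gb: float, speed_mbit_per_s: float) -> float:
    """Hours needed to download size_gb gigabytes at a sustained speed."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return size_megabits / speed_mbit_per_s / 3600


# Hypothetical sustained speeds in Mbit/s.
for speed in (0.5, 1.0, 8.0):
    print(f"{speed} Mbit/s: {download_hours(0.6, speed):.2f} h per file")
```

Multiply by 4 if you fetch the files one after another.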
Where can I get them?
How did I get my personal exome sequenced?
Completely independently. If you want to know the story of how I did it, please refer to my blog entries “Getting My Genome Sequencing Done” Part I and Part II. As implied there, I managed to get my personal genome sequenced by knocking on quite a few doors until I found someone who would sponsor me. In fact, part of the aim of this exercise was to prove that it is now possible for ordinary citizens to get their genomes sequenced if they so wish.
For an initial report carried out by OGT on the analysis of this exome please refer to: http://manuelcorpas.com/2012/02/06/my-personal-exome-analysis-part-i-first-findings/
This is the most interesting thing I’ve found on the internet in a long, long time. How I would love to participate in this type of study. I have a keen interest in it, but I cannot think of one thing that would make me or my family interesting, nothing that would stand out and entice a company to spend that kind of money on me or them. I’m a grandma, my greatest accomplishment. Our family has heart disease or cancer as a predictable way of death, and many mental issues. I don’t know one of us who does not occasionally drown in some type of mental pool of pain. But would that in itself make us worthwhile? How I would love to know what contributed to this mess of carbon and so on to produce this array of humanity.
¿Compartirías tu genoma? « Blog de piratas de la ciencia
[…] is willing to help us. Manuel Corpas thought the same: he is a researcher who made his exome sequencing data available to the scientific community and has already obtained the first results. It is a […]
My Personal Exome Analysis (Part I): First Findings « Manuel Corpas' Blog
[…] Exome Analysis (Part I): First Findings You may have read in a previous entry about the release of my raw personal exome data. Although users were not required to report back any finding derived from this data, my hope was […]
We understand exactly what you mean when you “kind of” have an answer. It leaves you with that feeling of “where do we go from here…”, seeing as there is no prior information. Our daughter is 3.5 years old and is also non-verbal, has developmental delays, and a seizure disorder. After our (long) wait for our genetic testing to come back, we found she has a very small part of her chromosome 21 missing. We also got the “We’re unsure of the outcome as we haven’t seen this before.” We have found a group based in the UK called Unique (www.rarechromo.org). They have not been able to give us many more answers, but have connected us with other “unique” families. All the best to you and the family!
Thanks Daniel for the info, I’ll definitely pass it along and keep that in mind as our project moves forward.
Why CC BY-SA 3.0 license?
CC BY-SA license, mainly to be consistent with my blog’s license. Is this license a problem for you?
It is not a problem… I am asking because there is a discussion about your blog post on Google Plus, and one person commented that placing the data under CC BY-SA 3.0 would make it more difficult to integrate into existing public databases, because it requires the data to be provided under the same license you selected, so it would be better to have the least restrictive license. I do not know enough on the subject to have an opinion on that :)
Out of curiosity, what was coverage like? I’m part of a disease genomics group and exome sequencing is part of our overall pipeline. One problem we sometimes run into is that the exon capture kits can result in poor coverage of some exons, or no coverage at all.
Thanks for your comment. I’ve been told by the provider that the coverage for this exome is at least 30x.
Dan, I can’t quote exact coverage figures because I don’t know what capture kit was used, but if we assume an Agilent SureSelect 50Mb kit, then mean target coverage is >37x for Manuel according to my analysis of the data. I think there are always going to be issues with biased capture. It’s easier to get this right for more focused targeted re-sequencing where the baits can be designed for more even coverage. We find most people ask for 50x mean target coverage for rare-disease studies.
Thanks for the info, good to know. It has turned out, oddly, in two or three of our projects that the most likely causal variant was found in some random exon somewhere that had just not been captured at all, so it is something we now pay attention to when we get our data back.
Dan, this is one reason we don’t include depth filters in our pipeline. No coverage is one thing, but I’m quite wary of throwing away low-coverage variants, especially if the read quality and mapping quality are good. For trio analysis where we need to be confident about genotype calls we do filter by depth by proxy. We recommend that people start with variants of 20x depth or higher, but if nothing is found then we don’t discourage exploration of the lower-coverage data for potential causal variants. It just implies that a little more confirmation is required, and following up a tranche of low-coverage variants of interest by other genotyping methods isn’t a terrible burden.