My Personal Exome Now Publicly Released

Many months after having my personal exome sequenced, I am now making it available to the community for public use. I release it under a public domain license (CC0 1.0 Universal), giving you permission to use this data in any way.

What is an exome?

My exome is the ~1% of my genome that codes for proteins.

Why do I release my personal exome?

When my family and I made our genotypes available on the Internet, researchers around the world immediately took our data for analysis and came back with interesting results. As a result, we have been able to learn much about ourselves. I reported this in a previous entry on this blog entitled “Benefits for Publishing Family Genomes on the Internet”. I now follow the same principle: if I make my exome available for people to analyse, I can expect that some researchers may come back with interesting results.

What data do I actually release?

I release the 4 FastQ files that my sequencing provider gave me. This is the same kind of information that 23andMe gives in their current exome analysis offer. The files consist of raw reads that need to be aligned to a reference assembly. Once the reads are aligned, interesting variation data can be inferred.
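For anyone who has not handled FastQ files before, here is a minimal Python sketch of how the raw reads can be read before any alignment. It assumes the common four-line record layout (header, bases, separator, qualities) and Phred+33 quality encoding. I have not confirmed either for these particular files, and the names `read_fastq`, `open_fastq` and `mean_phred` are mine, not part of any tool.

```python
import gzip
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # header line without the leading '@'
    sequence: str  # base calls
    quality: str   # per-base quality characters, same length as sequence


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file as text (path is hypothetical)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a stream that uses four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header.strip())
        yield FastqRecord(header[1:].strip(), sequence, quality)


def mean_phred(record: FastqRecord, offset: int = 33) -> float:
    """Mean base quality, assuming Phred+33 encoding (an assumption, see above)."""
    if not record.quality:
        return 0.0
    return sum(ord(ch) - offset for ch in record.quality) / len(record.quality)
```

Counting the reads in a file is then a matter of `sum(1 for _ in read_fastq(open_fastq(path)))`. Aligning them to a reference assembly is a separate step that needs a dedicated aligner, which this sketch does not attempt.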

What do I ask in return?

Nothing. I do appeal, though, to the goodwill of potential users to report back to me anything interesting they might find.

How big are the files?

They are huge. On average they are about 0.6 GB each, and there are 4 of them. That means each file can take several hours to download. Be patient!
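To get a rough idea of the waiting time for your own connection, the arithmetic can be sketched as below. This assumes the file sizes are in gigabytes (decimal units) and that the download speed is steady. The function name `download_hours` and any speed you plug in are purely illustrative, since the real transfer rate depends on both ends of the connection.

```python
def download_hours(size_gb: float, speed_mbit_per_s: float) -> float:
    """Estimated hours to download size_gb gigabytes at a steady speed in megabits/s."""
    if speed_mbit_per_s <= 0:
        raise ValueError("speed must be positive")
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    seconds = size_megabits / speed_mbit_per_s
    return seconds / 3600


# One 0.6 GB file and all 4 files, at a hypothetical effective speed of 1 megabit/s
one_file = download_hours(0.6, 1.0)
all_files = download_hours(4 * 0.6, 1.0)
print(f"one file: {one_file:.2f} h, all four: {all_files:.2f} h")
```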

Where can I get them?

Here:

  1. File 1
  2. File 2
  3. File 3
  4. File 4

How did I get my personal exome sequenced?

Completely independently. If you want to know the story of how I did it, please refer to my blog entries “Getting My Genome Sequencing Done” Part I and Part II. As implied there, I managed to get my personal genome sequenced by knocking on quite a few doors and then finding someone who would sponsor me. In fact, part of the aim of this exercise was to prove that it is now possible for ordinary citizens to get their genomes sequenced if they so wish.

15 comments

  1. Anonymous

    This is the most interesting thing I've found on the internet in a long, long time. How I would love to participate in this type of study. I have a keen interest in it but cannot figure out one thing that would make me or my family interesting, nothing that would stand out and entice a company to spend that kind of money on me or them. I'm a grandma. My greatest accomplishment, our family has heart disease or cancer as a predictable way of death and many mental issues. I don't know one of us who does not drown occasionally in some type of mental pool of pain. But would that in itself make us worthwhile? How I would love to know what contributed to this mess of carbon and so on to produce this array of humanity.
    […] [Edited by admin]

    1. Esme

      We understand exactly what you mean when you “kind of” have an answer. It leaves you with that feeling of “where do we go from here…” seeing as there is no prior information. Our daughter is 3.5 years old and is also non-verbal, has developmental delays, and a seizure disorder. After our (long) wait for our genetic testing to come back, we found she has a very small part of her chromosome 21 missing. We also got the “We’re unsure of the outcome as we haven’t seen this before.” We have found a group based in the UK called Unique (www.rarechromo.org). They have not been able to give us many more answers, but have connected us with other “unique” families. All the best to you and the family!

    1. admin

      CC-BY-SA license, mainly to be consistent with my blog’s license. Is this license a problem for you?

    2. Pepetideo

      It is not a problem… I am asking because there is a discussion about your blog post on Google Plus, and a person commented that placing it under CC3.0 SA would make the data more difficult to integrate into already existing public databases, because it requires that the data be provided under the same license you selected, so it would be better to have the least restrictive license. I do not know enough about the subject to have an opinion on that :)

  2. Dan Gaston

    Out of curiosity, what was coverage like? I’m part of a disease genomics group and exome sequencing is part of our overall pipeline. One problem we sometimes run into is that the exon capture kits can result in poor coverage of some exons, or no coverage at all.

    1. admin

      Hi Dan,
      thanks for your comment. I’ve been told by the provider that the coverage for this exome is at least 30x.

    2. Daniel Swan

      Dan, I can’t quote exact coverage figures because I don’t know what capture kit was used, but if we assume an Agilent SureSelect 50Mb kit, then mean target coverage is >37x for Manuel according to my analysis of the data. I think there are always going to be issues with biased capture. It’s easier to get this right for more focused targeted re-sequencing where the baits can be designed for more even coverage. We find most people ask for 50x mean target coverage for rare-disease studies.

    3. Dan Gaston

      Thanks for the info, good to know. It has turned out oddly with two or three of our projects that the most likely causal variant was found in some random exon somewhere that had just not been captured at all, so it is something we now pay attention to when we get our data back.

    4. Daniel Swan

      Dan, this is one reason we don’t include depth filters in our pipeline. No coverage is one thing, but I’m quite wary of throwing away low coverage variants, especially if the read quality and mapping quality are good. For trio analysis, where we need to be confident about genotype calls, we do filter by depth by proxy. We recommend that people start with variants of 20x depth or higher, but if nothing is found then we don’t discourage exploration of the lower coverage data for potential causal variants. It just implies that a little more confirmation is required, and following up a tranche of low-coverage variants of interest by other genotyping methods isn’t a terrible burden.
