My Personal Exome Analysis (Part I): First Findings

You may have read about the release of my raw personal exome data in a previous entry. Although users were not required to report back any findings derived from this data, my hope was that some of them would return with interesting results. The response to this call has been overwhelmingly positive, and in less than a week Oxford Gene Technology (OGT) kindly provided me with a report to facilitate the analysis of my personal exome. OGT’s donation has made it possible to start the “My Personal Exome Analysis” series in this blog. In Part I, I share some data and preliminary metrics gathered from OGT’s exome analysis services. I will continue to report further findings and insights as I keep exploring my personal exome at the deepest level that technology (and budget) currently allows.

In addition, I am releasing under a CC0 license the following sequence-derived data from OGT’s services: a) the aligned and processed BAM file, b) the BAM file index and c) the compressed VCF file. The BAM file (.bam) is the binary version of a SAM file, a tab-delimited text file that contains sequence alignment data. The BAM file index (.bai) provides fast random access to the BAM file. The compressed VCF file (.vcf.gz) describes variant calls in text format. These formats are industry standards and can be used in a variety of research contexts involving genome visualization and analysis.
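As a minimal sketch of how the compressed VCF can be inspected with nothing but the Python standard library, the snippet below streams a .vcf.gz file and yields one tuple per variant record. The helper names and the file name `exome.vcf.gz` are hypothetical placeholders, not names used by OGT; the only thing assumed about the file is the fixed leading columns defined by the VCF format (CHROM, POS, ID, REF, ALT).

```python
import gzip
from typing import Iterator, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    var_id: str
    ref: str
    alt: str


def read_variants(path: str) -> Iterator[Variant]:
    """Yield one Variant per data line of a gzip-compressed VCF file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Lines starting with '#' are meta-information or the column header.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, var_id, ref, alt = fields[:5]
            yield Variant(chrom, int(pos), var_id, ref, alt)


def count_variants(path: str) -> int:
    """Count the variant records (data lines) in the file."""
    return sum(1 for _ in read_variants(path))
```

Calling `count_variants("exome.vcf.gz")` would return the number of records in the file. That number need not match a report's summary exactly, since it depends on how multi-allelic sites and filtered calls are counted.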

Looking at the summary metrics in OGT’s report, my personal exome produces:

  • 30,702 variations relative to the reference genome (GRCh37)
  • 5,565 non-synonymous coding variations with consequences
  • A minimum of 61.42% of the on-target regions covered at a depth of at least 20x (remember that this data was sequenced by BGI).
  • A total of 2.54 Gigabases of sequence data read and aligned at high quality.
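The coverage metric in the list above is the share of on-target bases whose read depth reaches a given threshold. A minimal sketch of that calculation follows, assuming the per-base depths are already available as a list of integers. The function name and the toy numbers are illustrative only and are not taken from OGT’s report.

```python
from typing import Sequence


def percent_covered(depths: Sequence[int], min_depth: int = 20) -> float:
    """Return the percentage of positions with depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


# Toy example: 3 of 5 on-target positions reach 20x.
toy_depths = [35, 20, 19, 0, 42]
print(f"{percent_covered(toy_depths):.2f}% at >= 20x")  # prints "60.00% at >= 20x"
```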

Figure 1 is a screenshot of the OGT report showing the summary of all variants identified, including those in dbSNP release 132.

Figure 2 summarizes all novel variants identified by OGT, filtering out those in dbSNP release 132.
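A split between known and novel variants can be approximated from a VCF file itself. The sketch below assumes that variants present in dbSNP carry an `rs` identifier in the ID column and that novel variants have `.` there. This is a common convention, but whether OGT’s VCF was annotated this way is an assumption, and the function name is hypothetical.

```python
import gzip
from collections import Counter


def tally_known_vs_novel(path: str) -> Counter:
    """Count VCF records with an rs identifier in the ID column vs. without."""
    counts = Counter(known=0, novel=0)
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            # The ID column is the third tab-separated field; it may list
            # several identifiers separated by semicolons.
            ids = line.rstrip("\n").split("\t")[2].split(";")
            if any(i.startswith("rs") for i in ids):
                counts["known"] += 1
            else:
                counts["novel"] += 1
    return counts
```

The result is a `Counter` with the keys `known` and `novel`, which could then be compared against the totals shown in the two figures.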

Download of BAM and VCF Files

You are allowed to use my personal exome’s BAM and VCF files under the completely free CC0 license. You can add this data to any database or resource with no need for attribution. Any usage or finding derived from this data that is communicated back to me will be shared (if considered noteworthy) through this blog or in a publication, with due attribution or a request for coauthorship in papers.


  1. admin

    Reblogged this on Genomes, Web 2.0 and Bioethics and commented:

    After 4 years of work, I am pleased to say that two publications have emanated from this work:

    Low budget analysis of Direct-To-Consumer genomic testing familial data:


    Crowdsourced direct-to-consumer genomic analysis of a family quartet

    I am pleased to say that the latter article has been featured by BMC Genomics as one of the most influential articles published in the journal in 2015.

    1. admin

      Hi Francisco,

      This is really useful. Thanks a lot for passing on this information.


  2. cariaso

    processing just the exome yields 881 snpedia snps

    pooling it with your 23andMe v2
    pushes you up to 9591 snpedia snps

    There are 2 which conflict between the two platforms
    rs12344615 reported as (A;G) and (G;G)
    rs2290272 reported as (C;T) and (T;T)

    your v2 has 9241
    so there are 350 genos with annotations which are new

    These in particular occur in less than 10% of the HapMap and are likely to be the most interesting
    rs11465702(A;G) 0.0%
    rs12265684(C;G) 0.0%
    rs12934561(C;T) 0.0%
    rs2233682(A;G) 0.0%
    rs2852464(C;G) 0.0%
    rs3752472(C;T) 1.8%
    rs12344615(G;G) 2.7%
    rs7951(C;T) 4.7%
    rs1052773(A;A) 5.3%
    rs10409962(A;G) 6.2%
    rs2326369(C;T) 6.2%
    rs2522943(C;G) 7.7%
    rs2229944(C;T) 8.0%
    rs11852361(C;T) 8.8%
    rs25489(A;G) 8.8%
