I have long wanted to write about the work that Mike Cariaso, founder of SNPedia, has been doing with my family’s genotypes. Initially, he analyzed the data with Promethease, which assigns traits and annotations to observed SNPs. More recently, he has also developed a tool for visualizing and comparing genotypes between different people. He has used my family’s and Manu Sporny’s genotypes as test cases.
This is an unanticipated benefit we have experienced as a family from publishing our genomes on the Internet. Using Promethease’s report we learned that dad is lactose intolerant. The fact that he did not like milk and had not drunk it in years made sense when we discovered that his genotypes at two SNPs, rs4988235(C;C) and rs182549(C;C), mean that, with 70% probability, he cannot digest lactose. This result regarding lactose intolerance was in fact in the 23andMe report, but we missed it.
It is clear that direct-to-consumer genetics companies do try to cater to the non-expert, i.e. the majority of their customer base. The new SNPedia visualization tool will be a useful addition for those of us who strive to make our own DIY discoveries from our personal genome data.
Using his visualization tool to compare all my SNPs with my sister’s, I find that 68% of mine are identical to hers, a total of 389,250 (see below).

Note that the graph uses a logarithmic scale. Of all our analyzed SNPs, 25% are halfmatches (i.e. one of the alleles is common to both of us) and about 2% are conflicts. Examples of conflicts may include different SNPs at the same position. This, according to Mike, may not be an accident. Because I know that we were analyzed on two different array platforms, version 2 and version 3 respectively, I can now tell the number of SNPs that are different between the two of us, i.e. present in only one of the two genotypes. Of the more than 0.5 million SNPs in my genome, 29,082 do not match hers.
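To make the four categories concrete, here is a minimal Python sketch of how a pair of genotype calls could be sorted into match, halfmatch, conflict and different. This is not Mike's code: the function names `classify` and `compare`, the dictionary layout (rsid mapped to a two-letter genotype) and the exact rules are my own assumptions, chosen only to agree with the descriptions above.

```python
from collections import Counter


def classify(first, second):
    """Classify one SNP given two genotype calls such as 'CC' and 'CT'.

    Assumed rules (a sketch, not the tool's actual logic):
      different - the SNP is missing from one of the two files
      match     - both calls carry the same two alleles, in any order
      halfmatch - the calls share exactly one allele
      conflict  - the calls share no allele at all
    """
    if first is None or second is None:
        return "different"
    if sorted(first) == sorted(second):
        return "match"
    # Multiset intersection counts the alleles the two calls have in common.
    shared = Counter(first) & Counter(second)
    if sum(shared.values()) >= 1:
        return "halfmatch"
    # Under these rules a no-call such as '--' also ends up here.
    return "conflict"


def compare(genotypes_a, genotypes_b):
    """Count categories over every rsid seen in either dictionary."""
    counts = Counter()
    for rsid in genotypes_a.keys() | genotypes_b.keys():
        counts[classify(genotypes_a.get(rsid), genotypes_b.get(rsid))] += 1
    return counts


if __name__ == "__main__":
    # Tiny made-up example; the rsids and calls are placeholders.
    me = {"rs1": "CC", "rs2": "AG", "rs3": "TT"}
    sister = {"rs1": "CC", "rs2": "AA", "rs4": "GG"}
    print(compare(me, sister))
```

Dividing each count by the total number of rsids would give percentages comparable to the ones reported by the tool, although the real tool may define the categories differently.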
The other nice feature this tool provides is an actual graphical representation of chromosomal SNPs in a map of pixels, colored consistently with the above notations: light blue means match, dark blue halfmatch, red conflict and grey different SNPs:

The above figure shows two representations of the chromosome/chromosome comparison between my chromosome 1 and my sister’s. Clearly most of the area is light blue, indicating a complete match. The numbers of differences, halfmatches and conflicts are also reported. Clicking on any of these links, one can find the actual SNPs in conflict, getting an output that looks like this:
1 rs9729550 1 1125105 CC AA
2 rs12142199 1 1239050 GG AA
3 rs7531583 1 1696020 GG AA
4 rs6681938 1 1771080 CC TT
5 rs41307846 1 1949559 GG --
6 rs3128296 1 2058766 TT GG
7 rs262654 1 2079386 AA GG
8 rs262688 1 2103425 GG TT
9 rs6659405 1 2362949 TT GG
10 rs4648482 1 2739781 CC TT
11 rs2483266 1 3225901 CC TT
12 rs868688 1 3290667 TT CC
13 rs10492939 1 3292731 AA GG
14 rs2493268 1 3298358 TT CC
15 rs871822 1 3302774 GG TT
16 rs12024847 1 3310659 TT CC
17 rs2821017 1 3510731 GG AA
18 rs3765761 1 3620336 CC TT
19 rs3765766 1 3624520 TT CC
20 rs4233262 1 4136842 CC TT
21 rs966321 1 4215064 GG TT
22 rs964715 1 4216644 TT CC
23 rs1390136 1 4241703 CC TT
24 rs4654545 1 4425464 TT CC
25 rs446529 1 4695274 CC TT
This table shows that for the first SNP, rs9729550, I have CC while my sister has AA.
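For anyone who wants to work with such a listing programmatically, here is a small Python sketch that splits one row into named fields. The column meanings (row number, rsid, chromosome, position, my genotype, my sister's genotype) are inferred from the output above, and the names `ConflictRow`, `parse_conflict_row` and `is_no_call` are hypothetical helpers of mine, not part of Promethease.

```python
from typing import NamedTuple


class ConflictRow(NamedTuple):
    index: int        # running row number in the listing
    rsid: str         # SNP identifier, e.g. 'rs9729550'
    chromosome: str   # kept as text so that 'X', 'Y' or 'MT' would also fit
    position: int     # base-pair position on the chromosome
    first: str        # genotype of the first person (me, in the listing above)
    second: str       # genotype of the second person (my sister)


def parse_conflict_row(line):
    """Parse one whitespace-separated row of the conflict listing."""
    index, rsid, chromosome, position, first, second = line.split()
    return ConflictRow(int(index), rsid, chromosome, int(position), first, second)


def is_no_call(row):
    """Return True when either genotype is a no-call, written as '--'."""
    return "--" in (row.first, row.second)


if __name__ == "__main__":
    row = parse_conflict_row("1 rs9729550 1 1125105 CC AA")
    print(row.rsid, row.first, row.second)
```

Filtering rows with `is_no_call` is one way to separate conflicts that involve a missing call (such as the rs41307846 row) from those where both of us have a called genotype.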
In conclusion, Promethease and the SNPedia visualization tool are helping me learn more about my SNP genotype results, complementing the information that I initially got from my direct-to-consumer provider. Hopefully I will be able to do some additional research based on the results obtained here.
If you want to see my family’s genomes with Mike Cariaso’s tool you can find it here:
Don’t forget to send me any exciting findings that you might encounter!
Corpas Family Exome Data Available For Public download « Manuel Corpas' Blog
[…] to learn much about ourselves. I have reported this in a previous entry on this blog entitled “Benefits for Publishing Family Genomes on the Internet“. We now follow the same principle: if we make our exomes available for people to analyse them, […]
Visualizing Your DNA Genotype Profile in a Blanket « Manuel Corpas' Blog
[…] data from the Corpas family dataset I hereby report about it in Manuel Corpas’ Blog. Here is another unpredicted and surprising effect from publishing our family genomes on the […]
My Personal Exome Now Publicly Released « Manuel Corpas' Blog
[…] learn much about ourselves. I have reported this in a previous entry on this blog entitled “Benefits for Publishing Family Genomes on the Internet“. I now follow the same principle: if I make my exome available for people to analyse it, I […]
cariaso
Conflicts are possible when someone was run on multiple platforms. An example is David Ewing Duncan
http://snpedia.com/index.php/User:David_Ewing_Duncan
http://s3.amazonaws.com/promethease/reports/genome_David_Duncan_pooled.html
and
http://s3.amazonaws.com/promethease/reports/promethease_data/genome_David_Duncan_pooled_conflicts.html
Different platforms are reporting different things for his rs2660917. That is a real conflict.
In the case of the Corpas family, in order to get the graphs the way I wanted (an M x N grid), I had to pool all of the family’s files together. This makes a correct Family comparison
http://files.snpedia.org/reports/promethease_data/promethease_corpas_family_comparison_newfamily.html
but a meaningless top level report
http://files.snpedia.org/reports/promethease_corpas_family_comparison.html
In particular, note the 22k genotypes, while any individual family member’s report has only 13k genotypes
http://files.snpedia.com/reports/genome_corpas_mom.html
or 9k in the case of Manuel on the older v2.
http://files.snpedia.com/reports/genome_corpas_son.html
Future releases should either resolve this or make it clearer what’s going on. But I practice ‘release early, release often’ and with this release my goal was just to get the family comparison graphs working well.
Some of this was explained to Manuel in emails between us as I was showing him my work in progress. The fact he announced it so widely was a bit of a (pleasant) surprise, but does lead to this sort of confusion. For this reason I’ve not yet announced the new family features anywhere on the Promethease page. When it’s working to my satisfaction, I’ll hype this a bit more.
As for the images, the clearest example of crossover is in the comparison of the mother vs the aunt. However, in the Manuel vs sister image above, there do appear to be 5 bands of
(light blue, dark blue)
(light blue, dark blue, red)
(light blue, dark blue)
(light blue, dark blue, red)
(light blue, dark blue)
It’s subtle, and perhaps still not easy enough to see. Also, it’s subject to the choice of probes on the microarray. The fact that Manuel is on a v2 while his sister is on a v3 makes this quite difficult. However, chromosomes 2 & 3 show the effect much more clearly.
http://files.snpedia.org/reports/promethease_data/file-434009335-231818500-layout-1-m1-chr-2.png
http://files.snpedia.org/reports/promethease_data/file-434009335-231818500-layout-0-m0-chr-2.png
http://files.snpedia.org/reports/promethease_data/file-434009335-231818500-layout-1-m1-chr-3.png
http://files.snpedia.org/reports/promethease_data/file-434009335-231818500-layout-0-m0-chr-3.png
gasstationwithoutpumps
How many of the “conflict” SNPs are real differences and how many are SNP-call errors from the platform? What would the result be if you had your data run twice (with different platforms)? How many differences would you see from yourself?
The random pattern in the chr1 comparison does not look like crossover between parental genomes, but just noise.
cariaso
Those features are unique to the paid reports from 0.1.114 and later, which isn’t yet online. But what the heck, ok now it is.
http://www.snpedia.com/index.php/Promethease
To make a report similar to the ones above, put all of your family members’ data in during step 1. Then move through the wizard, and pay the $2. When it’s ready you’ll be able to step through a few more wizard screens. During the F1 report question, put all of your family members’ data in again. Hit Next a few more times, and then let it run.
Most of the report won’t make any sense, since it is pooling all of your family members into one virtual person. But the very bottom of the page will have a link to an ‘experimental family report’. At that link you will find the ALL vs ALL comparison of the family members.
PS. Version 0.1.115 and later may change this behavior, caveat emptor.