Visualizing Your DNA Genotype Profile in a Blanket

March 6th, 2012 § 1 Comment

I have encountered a rather ingenious idea in Ben Landau’s blog. You may have read that Mike Cariaso, founder of SNPedia, created a visualization tool in which the genotypes of my family members are compared against each other, with patterns generated by comparing each chromosome against every other. To show how this actually looks, I have taken from Mike’s tool an image that compares the 23andMe genotypes of my mom and dad (x and y axis respectively). Each pixel is a SNP, and the colors represent match (light blue), half match (dark blue) and conflict (red).

Chromosome 1. Comparison of Corpas mum and dad 23andMe genotypes using SNPedia's visualization tool.

It seems that Ben has taken this idea further and designed a blanket that incorporates chromosomal patterns for a complete 23andMe genotype. I quote here the description of this blanket from Ben’s blog:

First Gift is a precious blanket which compares the digital DNA data of a child with their parents. If the child’s genes are edited, these changes will mask the parent’s DNA with synthesized DNA. The blanket itself represents a sacred and fragile heirloom, where tampering with it could potentially lead to frayed edges and uncertain outcomes. This first genetic gift will be with the child for life, and will also be inherited by future generations.

Although technically speaking this visualization shows comparisons between any two individuals, and not between the two parents and child as it is mentioned in the blog, I am still amazed at the craftiness and ingenuity of this idea. And since it uses data from the Corpas family dataset, I hereby report it in Manuel Corpas’ Blog. Here is another unpredicted and surprising effect from publishing our family genomes on the Internet.

To finish this blog entry, I borrow from Ben a complete profile view of my sister’s genotype patterned in his ‘First Gift’ blanket. According to him, the weaving of the blanket was done at the Textile Museum in Tilburg, the Netherlands.

Blanket showing genotype pattern comparisons in my sister's 23andMe genotype. Each shape in theory corresponds to a chromosome comparison, although I still need to work out which chromosome each of the 25 shapes above represents (humans have 22 autosomes, the X and Y sex chromosomes and the mitochondrial genome).

My Personal Exome Analysis (Part I): First Findings

February 6th, 2012 § 4 Comments

You may have read about the release of my raw personal exome data in a previous entry. Although users were not required to report back any finding derived from this data, my hope was that some of them would return with interesting results. The response to this call has been overwhelmingly positive and in less than a week Oxford Gene Technology (OGT) has kindly provided me with a report to facilitate the analysis of my personal exome. OGT’s donation has allowed the start of the “My Personal Exome Analysis” series in this blog. In Part I, I will be sharing some data and preliminary metrics gathered from OGT’s exome analysis services. I will continue to report further findings and insights as I keep exploring my personal exome at the deepest level that technology (and budget) currently allows.

In addition, I release under a CC0 license the following sequence-derived data from OGT’s services: a) the aligned and processed BAM file, b) the BAM file index and c) the compressed VCF file. The BAM file (.bam) is the binary version of a tab-delimited text file that contains sequence alignment data. The BAM file index (.bai) provides fast random access to the BAM file. The compressed VCF file (.vcf.gz) describes variant calls in text format. These format types are industry standard and can be used in a variety of research contexts involving genome visualization and analysis.
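For readers who want a first look at the compressed VCF file without installing any genomics software, here is a minimal Python sketch that counts its variant records. It relies only on the facts that the file is gzip-compressed text and that VCF header lines start with "#". The function name and the file name in the usage note are my own placeholders, not part of OGT's delivery.

```python
import gzip


def count_variant_records(vcf_gz_path):
    """Count the data lines (one per variant record) in a gzip-compressed VCF."""
    count = 0
    # "rt" opens the gzip stream in text mode, so each line is a str.
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            # Metadata ("##...") and the column header ("#CHROM...") start with "#".
            if line.startswith("#") or not line.strip():
                continue
            count += 1
    return count
```

Calling `count_variant_records("my_exome.vcf.gz")` (a hypothetical file name) returns the number of records. Note that this number need not equal the variation count in a report, since one record can describe more than one alternative allele.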

Looking at the summary metrics in OGT’s report, my personal exome produces:

  • 30,702 variations to the reference genome (GRCh37)
  • 5,565 non-synonymous coding variations with consequences
  • A minimum of 61.42% of the on-target regions covered at a depth of at least 20x (remember that this data was sequenced by the BGI).
  • A total of 2.54 Gigabases of sequence data read and aligned at high quality.

Figure 1 is a screenshot of the OGT report showing the summary of all variants identified, including those in dbSNP release 132.

Figure 2 summarizes all novel variants identified by OGT, filtering those in dbSNP release 132.

Download of BAM and VCF Files

You are allowed to use my personal exome’s BAM and VCF files under the completely free CC0 license. You can add this data to any database or resource with no need for attribution. Any usage or finding derived from this data that is communicated back to me will be shared (if considered noteworthy) through this blog or in a publication, with due attribution or a request for coauthorship in papers.

My Personal Exome Now Publicly Released

January 23rd, 2012 § 14 Comments

Many months after having my personal exome sequenced, I now make it available to the community for public use. I release it under a public domain license (CC0 1.0 Universal), giving you permission to use this data in any way.

What is an exome?

An exome is the ~1% of the genome that codes for proteins.

Why do I release my personal exome?

When my family and I made our genotypes available through the Internet, we immediately heard from researchers around the world who took our data for analysis and came back with interesting results. As a result of this, we have been able to learn much about ourselves. I have reported this in a previous entry on this blog entitled “Benefits for Publishing Family Genomes on the Internet”. I now follow the same principle: if I make my exome available for people to analyse, I can expect that some researchers may come back with interesting results.

What data do I actually release?

I release the 4 FASTQ files that were given to me by my sequencing provider. This is the same kind of information that 23andMe gives in their current exome analysis offer. This information basically consists of raw reads that need to be aligned to a reference assembly. Once aligned, interesting variation data can be inferred.
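To give an idea of what "raw reads" look like in practice, here is a minimal Python sketch that iterates over the reads in a FASTQ text stream. It assumes the common layout of four lines per read (header starting with "@", sequence, a "+" separator and a quality string) with no line wrapping. The function name and the two example reads are made up for illustration.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ text stream.

    Assumes four lines per read and no line wrapping.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of stream
        if not header.strip():
            continue  # skip stray blank lines
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': %r" % header)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # separator line, starts with "+"
        quality = handle.readline().rstrip("\n")
        yield header[1:].rstrip("\n"), sequence, quality


# Two invented reads, just to show the record layout.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nHHHH\n")
reads = list(read_fastq(example))
```

The same generator works on a real file opened in text mode (or through `gzip.open(path, "rt")` if the file is compressed). Aligning the reads to a reference assembly is a separate step that needs a dedicated aligner.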

What do I ask in return?

Nothing. I do appeal though to the good will of potential users to report back to me anything interesting they might find.

How big are the files?

They are huge. On average they are about 0.6 GB per file and I have 4 of these. That means that it can take several hours for each file to be downloaded. Be patient!

Where can I get them?

Here:

  1. File 1
  2. File 2
  3. File 3
  4. File 4


What’s the Distribution of Oxytocin Alleles in the General Population?

November 18th, 2011 § Leave a Comment

In my previous post I commented on my family’s alleles for the rs53576 SNP of the oxytocin receptor (OXTR) gene. A GG genotype seems to be associated with a more pro-social character. A follow-up question would be “what would its distribution be in the general population?”. Luckily, a few colleagues of mine and I have compiled a database of n=52 23andMe genotypes from the public domain. While the number of individuals contained in the database is small and of predominantly European ethnic background, I can still get an approximate view of what the frequency would be for these genotypes. I found the following distribution:

  • AA: 6 (11.5%)
  • AG or GA: 18 (34.6%)
  • GG: 28 (53.8%)
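The percentages above, and the allele frequencies they imply, can be reproduced with a few lines of Python. The counts are the ones listed above; the variable names are mine, and the only assumption is that each person contributes two alleles.

```python
# Genotype counts from the list above ("AG" also covers "GA").
genotype_counts = {"AA": 6, "AG": 18, "GG": 28}

total_people = sum(genotype_counts.values())
genotype_percent = {
    genotype: round(100 * n / total_people, 1)
    for genotype, n in genotype_counts.items()
}

# Each person carries two alleles, so tally every letter of every genotype.
allele_counts = {"A": 0, "G": 0}
for genotype, n in genotype_counts.items():
    for allele in genotype:
        allele_counts[allele] += n

total_alleles = 2 * total_people
allele_frequency = {a: c / total_alleles for a, c in allele_counts.items()}
```

With these counts, 74 of the 104 alleles in the sample are G (about 71%) and 30 are A (about 29%).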

If I were to speculate from this finding, and assuming that the GG genotype association is true, it would seem that being pro-social is quite common among Europeans. In fact, when I look at the distribution of rs53576 alleles per population in the 1000 Genomes Project, the above distribution looks quite similar to the proportions shown in the European (CEU) pie chart:

Frequencies of rs53576 in the 1000 genomes project. CEU: European; CHB+JPT: Han Chinese and Japanese; YRI: Yoruban (from Nigeria). More yellow apparently means more social.

Obviously the samples are quite small to draw a final conclusion, but I let readers judge for themselves what these results might mean.

I would like to thank Karyn Megy as she gave me the idea of querying the public domain 23andMe database.

My rs53576 SNP Genotype Indicates I Am Pro-social — Phew.

November 18th, 2011 § 1 Comment

I just came across the paper from Kogan, A., et al. (2011) in PNAS that states that “individuals who are homozygous for the G allele of the rs53576 SNP of the oxytocin receptor (OXTR) gene tend to be more prosocial than carriers of the A allele.” Wanting to determine my genomic horoscope prediction of the month, I decided to check my allele status as well as that of other members of my family. Luckily, this SNP is among the ones that 23andMe analyzes in versions 2 and 3 (my chip version was 2 and the rest of my family’s was 3).

To my pleasant surprise, all of my family are GG, except my aunt who is AG.

Family genotype for rs53576 SNP using myKaryoView. Clicking on graph features releases popups with type_id (genotype) information. The grouped popups are mine, my sister's, dad's and mum's; the independent one is my aunt's.
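The same check can be done without a genome browser. The sketch below looks up one SNP in a 23andMe raw data download, assuming the usual layout of that file: comment lines starting with "#", followed by tab-separated rows of rsid, chromosome, position and genotype. The function name is mine, and the example rows use placeholder positions and an invented second SNP rather than real data.

```python
def find_genotype(lines, rsid):
    """Return the genotype string for rsid, or None if the SNP is not listed.

    Assumes tab-separated rows: rsid, chromosome, position, genotype.
    """
    for line in lines:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and fields[0] == rsid:
            return fields[3]
    return None


# Illustrative rows only: positions are placeholders, rs0000001 is invented.
example_rows = [
    "# rsid\tchromosome\tposition\tgenotype\n",
    "rs0000001\t1\t1000\tAA\n",
    "rs53576\t3\t2000\tGG\n",
]
my_genotype = find_genotype(example_rows, "rs53576")
```

With a real download, the open file object can be passed directly as `lines`, since iterating over a text file yields one line at a time.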

In a post about the Kogan et al paper, Suzanne Elvidge writes that oxytocin, also known as the cuddle hormone, makes us feel good. It’s released during sexual intercourse, pair bonding and breastfeeding, and our levels (and the dog’s levels) rise when we stroke our pets. The oxytocin gene may also make us more optimistic. Differences in our responses to oxytocin seem to affect how empathic we are – so if you are a nice person, it might be (at least a little bit) down to your oxytocin gene.

Maybe my aunt’s temperament is not just a consequence of her having red hair.

Tita, te quiero!

myKaryoView Paper Out

October 27th, 2011 § Leave a Comment

As of October 26th 2011, a paper about the myKaryoView tool has been published in PLoS One. myKaryoView is a genome browser specifically designed for visualization of Direct-to-Consumer (DTC) personal genetic data. We look forward to receiving feedback from users visualizing their own personal genomes and developers willing to extend further the code or simply make use of myKaryoView in a different context.

The paper is freely available and open access.

Citation: Jimenez RC, Salazar GA, Gel B, Dopazo J, Mulder N, et al. (2011) myKaryoView: A Light-Weight Client for Visualization of Genomic Data. PLoS ONE 6(10): e26345. doi:10.1371/journal.pone.0026345

A Genome Blogger Manifesto

September 28th, 2011 § 9 Comments

Have you ever wondered why some people have no qualms about sharing their genetic profiles? Why do they openly talk about something supposedly so private? I believe that no contradiction exists between wanting to protect one’s privacy and sharing one’s genomic data with the world. I am more concerned about the information that Facebook collects about my profile than about my genome data (provided that I live in a country where there is public healthcare).

Sharing and comparing one’s genome with other personal genomes is a matter of necessity if one is to shed light on the meaning of one’s personal DNA.

This is why I became a genome blogger myself. Why should one be constrained by the information that genomic test reports provide? No personal genome analysis report can ever be complete; it will always be influenced by the biases of whoever is providing it.

*   *   *

Although no formal document seems to have been produced yet on what the core values for genome blogging should be, the core beliefs driving personal genome-sharing should be made explicit. Here I present an initial and inherently imperfect attempt to put in writing what I believe genome blogger values could be. I do not expect every fellow blogger to agree with them, but I hope that at least they inspire some debate. These are not a fixed set of rules; on the contrary, I expect this thinking to evolve with genomics technology itself. I base some of the ideas below on Marcus Wohlsen’s ‘Biopunk’ book, Meredith Patterson’s ‘biopunk manifesto’, Misha Angrist’s ‘Here is a human being’ book and Pekka Himanen’s ‘Hacker’s ethics’ book.

Core Values for Genome Blogging

  1. Intelligent exploration, experimentation and trial to push the boundaries of knowledge are a right for ordinary people. The days in which genetic science was only done by university professors or people working in corporate labs are now over. Now everyone should have the power and legitimacy to be able to discover, develop and find new things about their own genome data.

Getting My Genome Sequencing Done (Part I)

July 12th, 2011 § 4 Comments

Readers of this blog may have come across the experiment my family did with Direct-to-consumer (DTC) genetic testing. We analyzed all our samples using 23andMe kits and started sharing and writing about our personal genome data. This experience has changed me dramatically as a person and researcher. I started off as a bioinformatician with an interest in the risks of genetic variants, but these experiences have helped me develop a real insight into the psychology of how people react to such variants. As a family, we are experiencing a really positive and unexpected response from people contacting us via the Internet who are willing to tell us their findings about our family data.

After doing our whole genome genotypes, the next obvious step is to have our whole genomes sequenced. There is quite a lot of debate at the moment as to whether genome sequencing should be accessible to the general public and if so, to what extent. But I figured that if “the rich and famous” can have their genomes sequenced, perhaps with a bit of luck the “ordinary and poor” (among which I include myself) could have a chance, even with zero budget. Zero budget for this exercise was an essential point of principle, given that we really would not be able to afford even a tenth of the price a genome currently costs (around $9,500; probably cheaper than this by the time you are reading this).

I wasn’t sure how to do this, but I knew that it might be possible and that we would get it done if we could. So I went onto the Internet and searched for whole genome sequencing. I found three good potential candidates that could do it on demand: Complete Genomics, the Illumina personal sequencing services and the Beijing Genomics Institute (BGI). So the first thing I did was send them an email. Given that we had no money to spend and that there is no such thing as a free lunch, we thought that we needed to offer something substantial in return, since we were asking them to waive a fee of ~$50,000. The only substantial thing we could really offer was publicity, so the following proposal was sent to those three companies via their websites:

Dear Sir/Madam:

I would like to offer you a deal/proposal. My family would like to have their whole genome sequenced with your company. In exchange for releasing to the public openly and freely on the Internet our genomes we thought you could sponsor us. This action could attract *a lot of attention* to [company name], as this is a pioneering move.

Currently a very limited set of people are actually interested in sequencing their genomes. The only way you can reach the ordinary citizen (sooner rather than later) is if ordinary people, like my family publish their experiences and pave the way. My family, an ordinary family, constitutes an example of what this technology could do for any ordinary person, not just a scientist, etc.

In addition to this, I want to fully research all of the social/ethical implications that publishing this information can bring. We also hope to share this information with the world.

Currently all my family has genotyped their genomes with 23andMe and put all this data in the Internet for free download:

http://manuelcorpas.com/five-family-relatives-genome-download/

To our knowledge, this is the first time that anything like that has been done. In barely a month since this information has been published, four different analyses from specialists/hobbyists have reached us, making us learn that dad, for example, is lactose intolerant [1]. Our point is that now, with DNA sequence providers, the door opens for DIY genome mining. The power of the Internet and computers may bring this technology to computer savvy people.

For example, our 23andMe genomes are now been taken by SNPedia and several other ancestry projects such as Eurogenes and Artemis:

http://www.snpedia.com/index.php/User:Manuelcorpas
http://bga101.blogspot.com/2011/03/mds-analysis-of-southern-europe.html
http://dioegenesartemis.blogspot.com/2011/04/first-results.html

Although up to date the information provided by 23andMe has not revealed any nasty surprises about our genomes, we are aware that now anyone can report new findings that were not initially discovered in our genomes. We believe, however, that as a family we can gain a lot more than lose by sharing our genome data with the world.

I believe my proposal could bring a lot of exposure to [company name] and therefore would request whether you could consider this offer.

Best wishes,
Manuel

Illumina never got back to us. Looking around, we learned that their policy is that sequencing should be done with a medical prescription. Fair enough.

I couldn’t wait long, so I continued researching the matter and found Complete’s contact phone number on their website, so I rang them. To my surprise I was put through, and the person was very polite with me and keen to listen to what I had to say. Since then I have learned that Complete had already sequenced and published 69 genomes, available via this website:

http://www.completegenomics.com/sequence-data/download-data/

Among these genomes there is a multigenerational family with a bigger pedigree than the one I was proposing. This obviously meant that our offer wasn’t as innovative as we had initially thought. It seems that Complete Genomics will not do (at least for the time being) “Direct-to-consumer” business, and that their goal is instead to become the “Intel Inside” of human genome sequencing efforts, the technology underlying most human genome analyses. I thought that was a cool objective, if attainable.

I still didn’t give up. I tried to see whether there was a chance that Complete might change their mind, so I wrote to them about our incredibly interesting experience of family dynamics and family communication issues while discussing our personal genomes. So far we have not been lucky enough to get our genomes sequenced for free. Despite not achieving our outcome, there is a lot we have learned on the way though. What an interesting experience.

This is the end of part one on Getting My Genome Sequencing Done.

[1] This information was actually available in our 23andMe reports, but we missed it initially. We learned about this condition with the SNPedia tool Promethease.

Personal Genetics: A Family Journey (Interview)

June 9th, 2011 § Leave a Comment

 

Benefits for Publishing Family Genomes on the Internet

June 6th, 2011 § 6 Comments

I have been wanting for a long while to write about the work that Mike Cariaso, founder of SNPedia, has been doing with my family genotypes. Initially, he analyzed the data with Promethease to assign traits and annotations to the observed SNPs. More recently, he has also developed a tool for visualization and comparison of genotypes between different people. He has used my family’s and Manu Sporny’s genotypes as test cases.

This is an unanticipated benefit we have experienced as a family from publishing our genomes on the Internet. Using Promethease’s report we were able to learn that dad is lactose intolerant. The fact that he did not like milk and had not taken milk in years kind of made sense when we discovered that his two SNPs rs4988235(C;C) and rs182549(C;C) make him likely (with 70% probability) to be lactose intolerant. This result regarding lactose intolerance was in fact in the 23andMe report but we missed it.

It is clear that Direct-to-consumer genetic companies do try to cater to the non-expert, i.e. the majority of their customer base. The novel SNPedia visualization tool will be a useful addition for those of us who strive to DIY our own discoveries about our personal genome data.

Using his visualization tool, when I compare all my SNPs with my sister’s, I find that 68% of mine are identical to hers, a total of 389,250 (see below).

SNP comparison between my sister and myself

Note that the graph uses a logarithmic scale. Of all our analyzed SNPs, 25% are halfmatches (i.e. one of the alleles is common to both of us) and about 2% are conflicts. Examples of conflicts may include different SNPs at the same position. This, according to Mike, may not be an accident. Because I know that we were analyzed on two different array platforms, version 2 and version 3 respectively, I can now tell the number of SNPs that differ between us, i.e. those present in only one of the two genotypes. Of the more than half a million SNPs in my genotype, 29,082 fall into this category.

The other nice feature this tool provides is an actual graphical representation of chromosomal SNPs in a map of pixels, colored consistently with the above notations: light blue means match, dark blue halfmatch, red conflict and grey different SNPs:

Pixelated map of chromosome 1/chromosome 1 comparison between me and sister
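To make the four categories concrete, here is a minimal Python sketch of how two genotype profiles could be compared SNP by SNP. This is my own reading of the categories described above (match, halfmatch, conflict, different), not Mike's actual implementation. The function names are hypothetical, and each profile is assumed to be a dictionary from rsid to a genotype string such as "AG".

```python
from collections import Counter


def classify_pair(genotype_a, genotype_b):
    """Classify one SNP as 'match', 'halfmatch', 'conflict' or 'different'.

    'different' means the SNP is present in only one profile (pass None
    for the missing side). Allele order is ignored, so AG equals GA.
    """
    if genotype_a is None or genotype_b is None:
        return "different"
    # Multiset intersection counts how many alleles the two genotypes share.
    shared = sum((Counter(genotype_a) & Counter(genotype_b)).values())
    if shared == max(len(genotype_a), len(genotype_b)):
        return "match"
    if shared >= 1:
        return "halfmatch"
    # No shared alleles; a no-call such as "--" against "GG" also lands here.
    return "conflict"


def compare_profiles(profile_a, profile_b):
    """Tally the categories over every rsid seen in either profile."""
    tally = Counter()
    for rsid in profile_a.keys() | profile_b.keys():
        tally[classify_pair(profile_a.get(rsid), profile_b.get(rsid))] += 1
    return tally
```

For example, `classify_pair("CC", "AA")` gives "conflict" and `classify_pair("AA", "AG")` gives "halfmatch". Mapping each category to a color (light blue, dark blue, red, grey) would then give one pixel per SNP.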

The above figure shows two representations of the chromosome/chromosome comparison between my chromosome 1 and my sister’s. Clearly most of the area is light blue, indicating complete match. The numbers of differences, halfmatches and conflicts are also reported. Clicking on any of these links, one can find the actual SNPs in conflict, getting an output that looks like this:

1	rs9729550	1	1125105	CC	AA
2	rs12142199	1	1239050	GG	AA
3	rs7531583	1	1696020	GG	AA
4	rs6681938	1	1771080	CC	TT
5	rs41307846	1	1949559	GG	--
6	rs3128296	1	2058766	TT	GG
7	rs262654	1	2079386	AA	GG
8	rs262688	1	2103425	GG	TT
9	rs6659405	1	2362949	TT	GG
10	rs4648482	1	2739781	CC	TT
11	rs2483266	1	3225901	CC	TT
12	rs868688	1	3290667	TT	CC
13	rs10492939	1	3292731	AA	GG
14	rs2493268	1	3298358	TT	CC
15	rs871822	1	3302774	GG	TT
16	rs12024847	1	3310659	TT	CC
17	rs2821017	1	3510731	GG	AA
18	rs3765761	1	3620336	CC	TT
19	rs3765766	1	3624520	TT	CC
20	rs4233262	1	4136842	CC	TT
21	rs966321	1	4215064	GG	TT
22	rs964715	1	4216644	TT	CC
23	rs1390136	1	4241703	CC	TT
24	rs4654545	1	4425464	TT	CC
25	rs446529	1	4695274	CC	TT

This table shows that for the first SNP, rs9729550, I have CC while my sister has AA.
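If you want to work with this output programmatically, the rows can be parsed with a short Python sketch like the one below. The column names are my own labels: the last two follow the reading given above (my genotype, then my sister's), while "chromosome" and "position" are my interpretation of the third and fourth columns.

```python
from typing import NamedTuple


class Conflict(NamedTuple):
    row: int
    rsid: str
    chromosome: str
    position: int
    mine: str
    sister: str


def parse_conflicts(text):
    """Parse whitespace-separated conflict rows into Conflict records."""
    conflicts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        row, rsid, chromosome, position, mine, sister = fields
        conflicts.append(
            Conflict(int(row), rsid, chromosome, int(position), mine, sister)
        )
    return conflicts


# Two rows copied from the output above.
sample = "1\trs9729550\t1\t1125105\tCC\tAA\n5\trs41307846\t1\t1949559\tGG\t--\n"
conflicts = parse_conflicts(sample)
# Rows where one side is a no-call ("--") rather than a true disagreement.
no_calls = [c for c in conflicts if "-" in c.mine or "-" in c.sister]
```

Separating out the no-call rows, such as row 5 in the table, is useful because they are listed as conflicts without being real disagreements between two called genotypes.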

In conclusion, Promethease and the SNPedia visualization tool are helping me learn more about my SNP genotype results, complementing the information that I initially got from my Direct-to-consumer provider. Hopefully I will be able to do some additional research based on the results obtained here.

If you want to see my family’s genomes with Mike Cariaso’s tool you can find it here:

http://files.snpedia.org/reports/promethease_data/promethease_corpas_family_comparison_newfamily.html

Don’t forget to send me any exciting findings that you might encounter!

Where Am I?

You are currently browsing the Personal Genomes category at Manuel Corpas' Blog.