December 16, 2010
Following the release of a Nature article on the rise of genome bloggers, in which Manuel Corpas’ Blog is linked, I would like to take this opportunity to announce the release of myKaryoView v2, an open source visualization tool for personal genomics. Combining Rafael Jimenez’s efforts and my own, we have significantly augmented myKaryoView’s capabilities to allow users to visualize their personal genomes.
Visualization of one’s own personal genome is done via Bernat Gel’s easyDAS tool. This tool converts files with biological annotations into a DAS source. DAS sources can be thought of as tracks in a genome browser. The beauty of DAS is that it does not require any data to be stored locally and, as long as the reference coordinates are the same, any kind of biological features can be easily integrated.
Exploring My Own Genome With myKaryoView
23andMe’s analysis results report that I have a 28.1% risk of developing prostate cancer, as opposed to a 17.8% average risk in males. This risk is calculated by analysing the genotypes of 12 SNPs. The SNP marker rs10993994 shows the greatest risk among the 12 reported markers, with 1.3-fold increased odds. This SNP is located in 10q11, near the MSMB gene, and the allele found (T) has been shown to affect the gene’s expression levels, decreasing its cancer suppressor function.
Having no history of prostate cancer in close relatives, I wanted to find more information about this SNP in order to confirm results. My whole genome profile, containing > 570,000 SNPs, was downloaded from 23andMe and a DAS source was created using easyDAS. The resulting data source was held privately in my newly created easyDAS account. Once easyDAS creates a new DAS source, the data is available through a URL. I pasted the URL for my genome data into the myKaryoView interface, selecting all accompanying tracks to be shown in its zoom view.
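The first step above, reading the raw download, can be sketched in a few lines of Python. This assumes the raw file is tab-separated with the columns rsid, chromosome, position and genotype, preceded by `#` comment lines. The output layout produced by `to_feature_row` is a hypothetical one chosen for illustration, not the exact format easyDAS expects, and the sample rows are made up.

```python
from typing import Iterable, Iterator, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_genotypes(lines: Iterable[str]) -> Iterator[Snp]:
    """Yield one Snp per data row, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        yield Snp(rsid, chromosome, int(position), genotype)


def to_feature_row(snp: Snp) -> str:
    """Hypothetical feature layout: id, chromosome, start, end, note.

    A SNP covers a single base, so start and end are the same position.
    """
    return "\t".join(
        [snp.rsid, snp.chromosome, str(snp.position), str(snp.position), snp.genotype]
    )


# Made-up sample rows in the assumed raw layout (not real genotype data).
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t2000100\tAG\n"
    "rs0000002\tX\t150000\tTT\n"
)

snps = list(parse_raw_genotypes(SAMPLE.splitlines()))
rows = [to_feature_row(snp) for snp in snps]
```

Each row carries the coordinates a genome browser track needs, with the genotype kept as a free-text note.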
I typed the ‘MSMB’ gene name into myKaryoView’s query search box and, once results were returned, I zoomed out to have a better overview of my 10q11 chromosome region, shown below.
My genomic profile is the bottom track, with SNPs in green. The top track in purple corresponds to genes involved in Mendelian inheritance diseases (taken from OMIM); the other tracks show all existing genes in red, normal CNV regions in blue and green, and somatic mutations found in cancer in yellow (from the COSMIC database). I clicked on the gene and SNP feature bars to find further information. Clicking on the MSMB gene feature, I found that this gene’s start position is 51219559, only 57 bp after the rs10993994 SNP position. The track with yellow features (COSMIC) also contained four reported mutations for MSMB (MSMB:ENST00000358559), indicative of the involvement of this gene in cancer, but all of them within the gene’s exons, i.e. not outside the gene. Following MSMB’s link to OMIM also revealed its implication in prostate cancer.
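The distance quoted above is simple coordinate arithmetic. The sketch below assumes that "57 bp after" means the gene start lies 57 bases further along the reference coordinates than the SNP. The function and variable names are mine, for illustration only.

```python
def snp_position_from_offset(gene_start: int, bp_before_gene: int) -> int:
    """Return the coordinate lying bp_before_gene bases before gene_start."""
    return gene_start - bp_before_gene


MSMB_START = 51219559   # gene start reported by the MSMB feature
SNP_OFFSET_BP = 57      # distance between the SNP and the gene start

rs10993994_position = snp_position_from_offset(MSMB_START, SNP_OFFSET_BP)
print(rs10993994_position)  # 51219502
```

Under that assumption the SNP sits just outside the gene, while the four COSMIC mutations fall inside its exons.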
By seeing all these data sources in myKaryoView, I feel more confident in the validity of 23andMe’s reported risk. It is true that all the sources visualized in myKaryoView can be found by searching the Internet. The merit of this tool is, I think, that it provides a one-stop shop for a first step in analyzing original data sources for one’s personal genomic results.
myKaryoView is a web tool for visualization of genomic data, specifically designed for direct-to-consumer genomic tests, that uses publicly available data distributed throughout the Internet. It does not require data to be held locally and it is capable of rendering any feature as long as it conforms to a standard protocol named DAS. Configuration and addition of sources in myKaryoView can be done through the interface. myKaryoView should be considered a prototype and not a finalized tool. Here we offer a proof of principle of myKaryoView’s ability to display personal genomics data with 23andMe genome data sources. Prior to publication, please acknowledge Rafael Jimenez and Manuel Corpas if using myKaryoView.
 Proc Natl Acad Sci U S A. 2009 May 12;106(19):7933-8. Epub 2009 Apr 21.
September 1, 2010
Following my previous post on the First Publicly Available Genome Via DAS, I would like to present an open source tool that Rafael Jimenez and I have developed for visualization of genomic data. Here we have it configured to display 23andMe data as a test case. We call it myKaryoView and it is available for free use and download. Its website is located at the following address:
myKaryoView works in most contemporary browsers without lengthy installations and uses publicly available data distributed throughout the Internet via DAS. This means that there is no need to hold the data locally and that it is capable of visualizing any data as long as it is available via DAS. In order to visualize 23andMe data, myKaryoView requires a DAS source to be set up, which currently limits myKaryoView’s usage to those familiar with this technology. However, configuring and adding sources is extremely simple, and the amount of data that can be displayed is limited only by the time needed to complete the requests and render the data.
Here we show myKaryoView displaying personal genomics data with a dummy 23andMe genome data source. This source is based on real 23andMe results data from my own genome, randomly modified so that it is unrecognizably different.
The myKaryoView website shows an implementation that allows search of genome data via gene name or genome coordinates. For example, type in the search box 1:2000000,6000000 and hit “Submit Query”.
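A coordinate query like the one above follows the pattern chromosome, colon, start, comma, end. The sketch below shows one way such a string could be validated and split. It illustrates the format only and is not myKaryoView's own parsing code; `Region` and `parse_region` are hypothetical names.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chromosome: str
    start: int
    end: int


# Chromosomes 1-22, X or Y, then ':' start ',' end.
_REGION_RE = re.compile(
    r"^(?P<chrom>[1-9]|1[0-9]|2[0-2]|X|Y):(?P<start>\d+),(?P<end>\d+)$"
)


def parse_region(query: str) -> Region:
    """Split a 'chromosome:start,end' query into its three parts."""
    match = _REGION_RE.match(query.strip())
    if match is None:
        raise ValueError(f"not a chromosome:start,end query: {query!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    return Region(match["chrom"], start, end)


region = parse_region("1:2000000,6000000")
```

Anything that does not match the pattern, such as a gene name, raises a `ValueError` here. A real search box would presumably fall back to a gene name lookup instead.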
The figure above shows the results of that query, with two tracks containing the dummy 23andMe data source plus genes for a subchromosomal region in chromosome 1, Start: 2000000, End: 60000000. Gene names and SNP data are shown in red and blue, respectively. Different color shades indicate the density of annotation at any given point. If the “Gene Names” data track name is clicked, a popup window appears with a link “Display Original Data Source” that allows the download of the raw data from its DAS source. Any feature can be clicked to retrieve the specific information contained in the DAS source. Here a blue SNP mark is clicked and a popup window appears describing the selected SNP, with a link to its corresponding dbSNP entry.
A simple manual explaining how to install and configure myKaryoView to show different data sources is provided on the website. myKaryoView is still in beta testing and any feedback is welcome. We have some plans for myKaryoView in the near future, which we will reveal in due time. Meanwhile, I hope you find it interesting and useful.
By the way, the claim that this is the First Open Source Visualization for 23andMe data is, of course, arguable.
August 26, 2010
You may have heard stories about some well known people who have released their genomes for public use. I would like to convince you that now you don’t have to have a lot of money or be a public figure in order to do that. Companies like 23andMe and Navigenics provide the ability to get one’s genome tested for not a lot of money and get the results via a password protected website. The problem is that our current understanding of what these results mean on their own is rather limited. Thus having open collaboration platforms for citizen science using genomic data may be a step forward in helping people understand their genetic testing results. Initiatives like DIYgenomics are already working on this concept.
You may wonder why releasing one’s genome is useful. The answer is that, in practical terms, it is not. However, I consider the concept of being able to do that a very interesting one. After all, one’s genome data on its own is hardly informative, but when compared with information like known genes, pathways or even other people’s genomes, it becomes much more interesting and opens up the possibility for real discoveries.
With this post I hope to prove that genomes can now be put on the web in a standard format like the Distributed Annotation System (DAS), where people can share and integrate them with other public data sources mappable to genome coordinates. DAS is an environment that is open source, decentralized and unregulated. So what is different here from what is being done already? Why is this significant? I can think of at least three reasons. 1) Flexibility: pretty much any genome annotation can be put up; 2) Integration capabilities: anything can be combined with anything else as long as they share the same coordinate system; and 3) Data outsourcing: data is stored and maintained by DAS source owners elsewhere. Here is my story:
Last year I decided to get a 23andMe kit to have my genome analyzed. After results were delivered, I decided to download the data in raw format, consisting of >0.5M SNPs (single nucleotide polymorphisms) mapped to the NCBI36 genome assembly.
I wanted to experiment with this data from a bioinformatics point of view, so I decided to put my “genome” on the web for public access. Well, almost. I did not put up my real genome; I created a randomly shuffled version of it (i.e. it bears no recognizable trace of the real data). I put up this unreal data to make a point of principle.
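The post does not say how the shuffling was done, so the following is only one possible way to scramble such data, not the method actually used. It keeps every SNP id and position in place and randomly permutes the genotype column among them. The record layout and the sample values are made up for illustration.

```python
import random
from typing import List, Optional, Tuple

# (rsid, chromosome, position, genotype)
Record = Tuple[str, str, int, str]


def shuffle_genotypes(records: List[Record], seed: Optional[int] = None) -> List[Record]:
    """Return new records with the genotype column randomly permuted.

    SNP ids and coordinates are left untouched; only the genotype attached
    to each SNP changes.
    """
    rng = random.Random(seed)
    genotypes = [genotype for (_, _, _, genotype) in records]
    rng.shuffle(genotypes)
    return [
        (rsid, chromosome, position, new_genotype)
        for (rsid, chromosome, position, _), new_genotype in zip(records, genotypes)
    ]


# Made-up records, not real genotype data.
original = [
    ("rs0000001", "1", 2000100, "AG"),
    ("rs0000002", "1", 2000450, "CC"),
    ("rs0000003", "1", 2001900, "TT"),
    ("rs0000004", "1", 2003300, "AA"),
]
shuffled = shuffle_genotypes(original, seed=42)
```

Note that a plain permutation like this preserves the overall genotype counts and can leave some SNPs with their original genotype by chance, so it is a demonstration of the idea and not a privacy guarantee.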
Anyone in the world can thus access my randomly shuffled genome using a URL like this:
where, after the token “segment=” in the above URL, a chromosome is specified [1-22, X, Y], followed by a colon, followed by the start and end positions, separated by a comma. Try the above URL with different chromosome numbers and coordinates and see what results you get!
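The URL pattern just described can be assembled programmatically. The sketch below assumes the usual DAS layout, in which a `features` command with a `segment` parameter is appended to the address of the data source. The base address used here is a placeholder, not the real address of my source, and `das_features_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

VALID_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y"}


def das_features_url(source_url: str, chromosome: str, start: int, end: int) -> str:
    """Build a features request for one chromosome segment of a DAS source."""
    if chromosome not in VALID_CHROMOSOMES:
        raise ValueError(f"unknown chromosome: {chromosome!r}")
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")
    # Keep ':' and ',' readable instead of percent-encoding them.
    query = urlencode({"segment": f"{chromosome}:{start},{end}"}, safe=":,")
    return f"{source_url.rstrip('/')}/features?{query}"


# Placeholder source address, used for illustration only.
url = das_features_url("http://example.org/das/my_genome", "1", 2000000, 6000000)
print(url)
# http://example.org/das/my_genome/features?segment=1:2000000,6000000
```

Changing the chromosome or the coordinates changes only the part after “segment=”, which is all that is needed to browse a different region.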
In the above figure you see different columns, denoting the SNP id, start and end positions, the genotype under the “Notes” heading and a link to the SNP’s corresponding entry in dbSNP.
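Behind that table, a DAS features response is an XML document. The sketch below pulls out the same columns (SNP id, start, end, genotype note and link) using Python's standard library. The sample document is hand-written and made up, following my understanding of the DASGFF layout, and a real response may carry more elements than shown here.

```python
import xml.etree.ElementTree as ET

# Hand-written, made-up response in the assumed DASGFF layout.
SAMPLE_RESPONSE = """
<DASGFF>
  <GFF version="1.0" href="http://example.org/das/my_genome/features">
    <SEGMENT id="1" start="2000000" stop="6000000">
      <FEATURE id="rs0000001" label="rs0000001">
        <TYPE id="SNP">SNP</TYPE>
        <START>2000100</START>
        <END>2000100</END>
        <NOTE>AG</NOTE>
        <LINK href="http://example.org/dbsnp/rs0000001">dbSNP</LINK>
      </FEATURE>
      <FEATURE id="rs0000002" label="rs0000002">
        <TYPE id="SNP">SNP</TYPE>
        <START>2000450</START>
        <END>2000450</END>
        <NOTE>CC</NOTE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>
""".strip()


def parse_das_features(xml_text: str) -> list:
    """Return one dict per FEATURE with id, coordinates, note and link."""
    root = ET.fromstring(xml_text)
    rows = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            link = feature.find("LINK")
            rows.append({
                "id": feature.get("id"),
                "chromosome": segment.get("id"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "genotype": feature.findtext("NOTE"),
                "link": link.get("href") if link is not None else None,
            })
    return rows


features = parse_das_features(SAMPLE_RESPONSE)
```

Because every DAS source answers in the same shape, the same few lines would read a gene track or a CNV track just as well as a SNP track.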
Now that this genome is in a standard format, it can easily be integrated with any other publicly available data in DAS. As of this writing (26th August 2010) there are 139 data sources available in the DAS registry mapped to Human Genome coordinates. I may not be interested in them all, but certainly this is one of the greatest repositories of genomic data in just one shop. Leading providers of publicly available genomic DAS sources include Ensembl, the Database of Genomic Variants and ENCODE. The potential permutations of this data provide a range of possibilities for interrogating biological hypotheses that is probably unparalleled.
Now this shuffled genome is available for public use via a DAS web service. It will probably not be the last one to be put up and soon real 23andMe genomes will follow.