Data

One of the objectives of Manuel Corpas’ blog is to experiment with personal genome data, driven by a spirit of curiosity.

Members of the Corpas family have decided to undergo personal genetic testing, analyze the results and release the data into the public domain. We have made our genomes available to the community in a variety of ways under a public domain license.

1. Why is this data useful?

1.1. Genetic Genealogy

Other sites that offer 23andMe data for free download host data from unrelated individuals. This effort is unique in that it provides the complete raw data of blood-related members of the same family for free download and reuse. Special thanks to the Corpas family for their enthusiastic support.

Squares and circles denote males and females, respectively. Filled shapes represent those for whom 23andMe genome data are available. Manuel Corpas is denoted by a red diamond.

Data from a subject together with their parents and siblings form a more useful set than data from unrelated individuals, as they allow one to ascertain how traits were inherited.
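As a minimal illustration of why parent and child data help, the sketch below checks whether a child's genotype at one SNP could have been inherited from the recorded parental genotypes, one allele from each parent. It assumes unphased, two-letter autosomal calls such as `AG`, with `--` marking a no-call; the function name `mendelian_consistent` is hypothetical and the genotypes in the example are made up, not taken from the Corpas data.

```python
from typing import Optional


def mendelian_consistent(child: str, mother: str, father: str) -> Optional[bool]:
    """Return True if the child's genotype can be formed from one maternal and
    one paternal allele, False if it cannot, or None if any call is missing.

    Assumes unphased, diploid (two-letter) autosomal calls such as "AG";
    "--" or a call of any other length is treated as missing.
    """
    calls = (child.upper(), mother.upper(), father.upper())
    if any(len(g) != 2 or "-" in g for g in calls):
        return None
    c, m, f = calls
    a, b = c
    # Either allele order is allowed because the calls are unphased.
    return (a in m and b in f) or (b in m and a in f)


# Hypothetical genotypes for illustration only.
print(mendelian_consistent("AG", "AA", "GG"))  # True
print(mendelian_consistent("AA", "AG", "GG"))  # False
```

A False result flags a SNP where the family data cannot be explained by simple inheritance, for example because of a genotyping error.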

1.2. Homogeneous Ethnic Background

Our ancestors have lived for many generations in Andalusia, southern Spain. Although this part of the world has experienced numerous migrations and movements of peoples, all of the genomes provided show 100% European ancestry (according to our 23andMe Ancestry Painting). Family-based genetic association designs are ideal because they avoid the potential confounding effects of population stratification by using the parents and siblings as controls.

Place in the world where the Corpas Family has lived for generations.

1.3. Ease of access

All of this data is available in raw format (BED file, assembly NCBI36), as a DAS source, and has been incorporated into myKaryoView for easy navigation. In myKaryoView, any region, SNP, gene or chromosomal band can be searched, and the genotype of each family member at each SNP can be accessed simply by clicking on the corresponding bar in the graph. Any portion of this data may also be downloaded through the myKaryoView interface.

2. What kind of data is currently available?

Currently, 23andMe data files in BED format are available for download under a CC BY-SA 3.0 license. The genetic tests were performed at two different points in time; as a consequence, the data for Manuel Corpas differ slightly from those of the rest of his family. His genotype covers 0.5+ million SNPs, while those of the rest of his family contain ~1 million.
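Because Manuel's file comes from an earlier chip, only some of the SNPs in his relatives' files will also appear in his. A minimal sketch for measuring that overlap is below. It assumes each downloaded file has been unzipped to tab-separated text with the columns rsid, chromosome, position and genotype, plus `#` comment lines; both the layout and the function names `parse_raw_genotypes` and `shared_rsids` are assumptions for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# rsid -> (chromosome, position, genotype)
Genotypes = Dict[str, Tuple[str, int, str]]


def parse_raw_genotypes(lines: Iterable[str]) -> Genotypes:
    """Map each rsid to (chromosome, position, genotype).

    Assumes tab-separated lines 'rsid<TAB>chromosome<TAB>position<TAB>genotype'
    and skips blank lines and '#' comment/header lines. A line with fewer than
    four fields raises ValueError.
    """
    snps: Genotypes = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        snps[rsid] = (chrom, int(pos), genotype)
    return snps


def shared_rsids(a: Genotypes, b: Genotypes) -> Set[str]:
    """SNP ids genotyped in both files."""
    return set(a) & set(b)
```

For example, passing an open file handle for each member's file to `parse_raw_genotypes` and then taking `len(shared_rsids(son, mom))` would give the number of SNPs that can be compared directly between the two.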

3. How to download this data?

3.1. Raw data

These links provide the data in compressed form, as originally downloaded from the 23andMe site:

3.2. Retrieval as a web service

The following URL retrieves all SNP data for Manuel (‘son’) on chromosome 1 between coordinates 5900000 and 6000000.

http://www.ebi.ac.uk/das-srv/easydas/bernat/das/son/features?segment=1:5900000,6000000

To retrieve data from another chromosome, change the chromosome name after ‘segment=’ to any of 1 to 22, X or Y.

To choose another genome, replace ‘son’ in the URL above with ‘mom’, ‘dad’, ‘sister’ or ‘aunt’. For example, to retrieve dad’s SNPs in the first hundred thousand base pairs of chromosome 2, use:

http://www.ebi.ac.uk/das-srv/easydas/bernat/das/dad/features?segment=2:1,100000

This was done with the help of Bernat Gel, using his easyDAS tool.
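The URL pattern described above can be generated programmatically, which avoids typos when querying many regions. A minimal sketch follows. The function name `das_features_url` is hypothetical, and the code only builds the URL; it does not contact the server.

```python
BASE_URL = "http://www.ebi.ac.uk/das-srv/easydas/bernat/das"
MEMBERS = {"son", "mom", "dad", "sister", "aunt"}
CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "Y"}


def das_features_url(member: str, chromosome: str, start: int, end: int) -> str:
    """Build an easyDAS 'features' request for one family member and region."""
    if member not in MEMBERS:
        raise ValueError(f"unknown family member: {member!r}")
    if chromosome not in CHROMOSOMES:
        raise ValueError(f"chromosome must be 1-22, X or Y, got {chromosome!r}")
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    return f"{BASE_URL}/{member}/features?segment={chromosome}:{start},{end}"


print(das_features_url("dad", "2", 1, 100000))
# http://www.ebi.ac.uk/das-srv/easydas/bernat/das/dad/features?segment=2:1,100000
```

Validating the member and chromosome up front catches mistakes locally instead of relying on the server's error response.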

3.3. Navigate and visualize genomes in context

All this data has also been incorporated into the default interface of myKaryoView. This tool provides an environment for easy navigation anywhere in the genome.

3.3.1. Go to the myKaryoView homepage, type any SNP ID, gene name, chromosomal band or location into the search box and press Enter.

Data sources available by default in myKaryoView. All data sources whose description is “23andMe SNPs from member of family” correspond to the Corpas family genome data.

In this example, the SNP rs10993994 is entered.

Once the results are returned, the available SNPs for each family member are shown as small colored boxes. The queried SNP is the one exactly in the middle of the interval in the Zoom view.

myKaryoView results returned when searching for the rs10993994 SNP. Each track corresponds to a different data source. Each family member is represented by a different track.

Note that the Son track has fewer boxes (SNPs) than those of the other family members, because an earlier chip with fewer SNPs was used for Son. myKaryoView is also well suited to assessing the density of SNPs in a given chromosomal region.
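The two ideas in this step, a Zoom view centered on the queried SNP and the SNP density of a region, can be sketched as follows. This is not how myKaryoView computes them. It assumes each member's SNP positions for one chromosome are available as a sorted list of integers (for instance extracted from the downloaded raw files); the window width and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def centered_window(position: int, width: int) -> Tuple[int, int]:
    """Interval of about `width` bp with `position` in the middle (clipped at 1)."""
    half = width // 2
    return max(1, position - half), position + half


def snps_in_region(sorted_positions: List[int], start: int, end: int) -> int:
    """Number of SNP positions p with start <= p <= end (binary search)."""
    return bisect_right(sorted_positions, end) - bisect_left(sorted_positions, start)


def density_per_kb(sorted_positions: List[int], start: int, end: int) -> float:
    """SNPs per kilobase over the inclusive interval [start, end]."""
    length_kb = (end - start + 1) / 1000
    return snps_in_region(sorted_positions, start, end) / length_kb


# Hypothetical positions, not real data.
start, end = centered_window(5000, 1000)
print(start, end)  # 4500 5500
print(snps_in_region([4400, 4600, 5000, 5200, 5600], start, end))  # 3
```

Running the same count over each member's positions for one window would show Son's earlier chip as a lower count, matching the sparser Son track.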

3.3.2. You can click on any of the bars in the graph. Clicking on a feature opens a popup window showing the SNP’s ID, position and genotype (labeled ‘Type Id’).

Popup that appears when clicking on rs10993994 in the Son track. The genotype is denoted as “Type Id”, which in this case is “tt”.

3.3.3. To retrieve all of the SNP data in the Zoom view for Son, click on the Son caption at the top of the Zoom view:

Son Caption

The caption shows, in parentheses, the number of SNPs available from this data source in the Zoom view.

3.3.4. A popup window then appears with two links: ‘Display Annotations in Ensembl’ and ‘View Original Source’. Clicking on ‘View Original Source’ retrieves all SNPs in that region for the selected data source, in this case Son.

Retrieved data linked from the “View Original Source” link in popup window that appears when clicking the Son caption link at the top of the Zoom view.
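If the retrieved data is a DAS features document, the genotypes can be extracted with Python's standard XML parser. A minimal sketch follows. It assumes the DAS layout in which each SNP is a `FEATURE` element with `START` and `END` children, and, following the ‘Type Id’ label in the popup, that the genotype is stored as the `id` attribute of the feature's `TYPE` element. The function name `parse_das_features` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union


def _int_or_none(text: Optional[str]) -> Optional[int]:
    """Convert element text to int, or None if it is missing or blank."""
    return int(text) if text and text.strip() else None


def parse_das_features(xml_data: Union[str, bytes]) -> List[Dict[str, object]]:
    """Return one dict per FEATURE with its id, start, end and genotype.

    Assumes the genotype is the 'id' attribute of the feature's TYPE element;
    features without a TYPE element get genotype None.
    """
    root = ET.fromstring(xml_data)
    rows = []
    for feature in root.iter("FEATURE"):
        type_el = feature.find("TYPE")
        rows.append({
            "id": feature.get("id"),
            "start": _int_or_none(feature.findtext("START")),
            "end": _int_or_none(feature.findtext("END")),
            "genotype": type_el.get("id") if type_el is not None else None,
        })
    return rows
```

For example, `parse_das_features(urllib.request.urlopen(url).read())` could be applied to one of the URLs from section 3.2, assuming the service still responds.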

4. Public Domain Dedication

You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. We would be very grateful if you could report back to us anything interesting you find, or any application or use you make of this data.

5. Thanks

The Corpas family (forever supportive of my endeavors)

Rafael Jimenez (Senior Technical Officer)

Dr Darren Logan (Faculty, Wellcome Trust Sanger Institute)

Dr Anna Middleton (Ethics Researcher, Wellcome Trust Sanger Institute)

Dr Bernat Gel (Intern Student)
