Manuel Corpas' Blog

Genomes, Web 2.0 and Bioethics

My Personal Exome Now Publicly Released

Five months after my personal exome was sequenced, I am now making it available to the community for public use. I release it under a CC BY-SA 3.0 license, which gives you permission to use this data in any way, as long as you attribute the source and share any derived work under a similar license.

What is an exome?

The exome is the ~1% of the genome that codes for proteins.

Why do I release my personal exome?

When my family and I made our genotypes available through the Internet, researchers around the world immediately took our data for analysis and came back with interesting results. Thanks to this, we have been able to learn much about ourselves. I have reported this in a previous entry on this blog entitled “Benefits for Publishing Family Genomes on the Internet”. I now follow the same principle: if I make my exome available for people to analyse, I can expect that some researchers may come back with interesting results.

What data do I actually release?

I am releasing the 4 FASTQ files that my sequencing provider gave me. This is the same kind of information that 23andMe gives in their current exome analysis offer. It basically consists of raw reads that need to be aligned to a reference assembly. Once the reads are aligned, interesting variation data can be inferred.
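
For readers who have not handled raw reads before, here is a minimal sketch of what reading such a file could look like. It assumes the common four-lines-per-read FASTQ layout and Phred+33 quality encoding, neither of which I have confirmed for these particular files. The names `read_fastq` and `mean_phred` and the two example reads are made up for illustration; real analyses would normally hand the files straight to an aligner.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield reads from a FASTQ stream that uses four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average base quality, assuming Phred+33 encoding (an assumption)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# Two invented reads, only to show the record layout.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n")
for record in read_fastq(example):
    print(record.read_id, record.sequence, mean_phred(record.quality))
```

For the real files you would open each one with `open(path)` (or `gzip.open(path, "rt")` if it is compressed) and pass the handle to `read_fastq`.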

What do I ask in return?

Nothing. I do appeal though to the good will of potential users to report back to me anything interesting they might find.

How big are the files?

They are huge. On average they are about 0.6 GB per file and there are 4 of them. That means that each file can take several hours to download. Be patient!

Where can I get them?

Here:

  1. File 1
  2. File 2
  3. File 3
  4. File 4

Filed under: Genomics, Personal, Personal Genomes

Converting Genes and Genomic Features From NCBI36 to GRCh37

The Human Genome is like a map onto which features and genes are placed. As techniques improve, the resolution of that map increases and new versions are released every few years. When a new coordinate reference map (or assembly) for the Human Genome is released, it produces lots of headaches for those who work in the field, as it means that the locations of genes, chromosomal bands and other features like Single Nucleotide Polymorphisms (SNPs) or Copy Number Variants (CNVs) change.

To work with the most up-to-date set of genes and features for the Human Genome, it is sometimes necessary to convert coordinates from one assembly to another. In the past I have written a tutorial on how to remap from NCBI36 to GRCh37 human assemblies using liftOver. In this tutorial I present a simple step-by-step guide for feature remapping using NCBI’s remapping tool.

Important:

Please make sure you know in advance the assembly to which your aberration data is currently mapped. If by mistake you take an aberration that is already in GRCh37 and remap it to GRCh37 as if it were in NCBI36, you will get wrong coordinates for the region.

The NCBI provides a web facility to convert coordinates from one assembly into another. To convert coordinates using their genome remapping service do the following:

  1. Make sure that your data is in BED format, e.g. “chr3            100000 999990 myId0000123” -> CNV aberration in NCBI36/hg18
  • Please note that fields are separated by a tab and lines by a line break. Please follow this strictly or the remapping tool may throw an error.
  • Add as many lines as aberrations you would like to remap
  2. Go to the NCBI Remap page
  3. Select “Organism for source data” Homo sapiens, “Source Assembly” NCBI36 (hg18) and “Target Assembly” GRCh37 (hg19)
  4. Please leave all “Remapping Options” (Minimum ratio of bases that must remap, etc.) with default values
  5. Select for “Input format” BED, “Output format” Same as input
  6. Paste your aberration in the input box where it says “Paste data here” and hit submit at the bottom of the page
  7. Wait until results are returned
  8. To retrieve the results, download the “Mapping Report”, which is in Excel format, or alternatively use the Mapping report Sample in the results page
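
Since stray spaces in place of tabs are the usual reason the input is rejected, it can help to generate the BED lines programmatically instead of typing them. The following is a small sketch of that idea, not part of the NCBI tool: the helpers `bed_line` and `bed_text` are hypothetical names, and the only region used is the example from step 1.

```python
from typing import Iterable, Tuple

Region = Tuple[str, int, int, str]  # chromosome, start, end, identifier


def bed_line(chrom: str, start: int, end: int, name: str) -> str:
    """Format one aberration as a tab-separated BED line (no line break)."""
    if start < 0 or end <= start:
        raise ValueError(f"Invalid interval for {name}: {start}-{end}")
    return "\t".join([chrom, str(start), str(end), name])


def bed_text(regions: Iterable[Region]) -> str:
    """Join several aberrations, one per line, ready to paste or save."""
    return "".join(bed_line(*region) + "\n" for region in regions)


# The example aberration from step 1 (NCBI36/hg18 coordinates).
regions = [("chr3", 100000, 999990, "myId0000123")]
print(bed_text(regions), end="")
```

The resulting text can be pasted into the “Paste data here” box or written to a file with `open("aberrations.bed", "w").write(bed_text(regions))`, where the file name is just an example.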

Please note that your aberration may remap to more than one location. In that case I recommend that you manually check the coordinates and select the most appropriate of the remapped locations in the new assembly. Please also note that your aberration may not remap at all because the region is partially or entirely deleted, or split, in GRCh37. In this case I recommend that you try another start or end position, for instance the start/end of alternative probes, until you find a region that maps.

Another possibility could be to look at the genes for the region in the old assembly and select a region in GRCh37 that includes the same genes as in NCBI36. Each of these solutions requires careful deliberation and may not be applicable to your particular case (e.g. genes in different chromosomes would not allow remapping based on genes).

Filed under: Bioinformatics, Genomics, Tutorials

A Family Experience of Personal Genomics Paper Out

I am pleased to announce that the case study “A Family Experience of Personal Genomics” has been published today by the Journal of Genetic Counseling. An accompanying Commentary Note written by ethicist Anna Middleton is also published. Both papers are open access. [Correction: these papers will be open access shortly.]

The case study paper is an invited contribution for the Journal of Genetic Counseling in a special issue on Direct-To-Consumer (DTC) genetic testing. This paper describes the journey I went through with my family when we all embarked on analysing our personal genomes via a DTC genetic testing company. I believe my experiences could be relevant to many other people in the world as they gain access to this technology. In the commentary Dr Middleton discusses the implications of the difficulties I went through when communicating their genomic information to my relatives.

Filed under: Computational Bioethics, Genomics

New Year Resolution: Never Post The Same Announcement Again!

The iAnn project with its official website and services is now formally launched as of January 2012. iAnn is a collaborative environment for curation of scientific announcements. iAnn is a standard platform providing software, services and an in-house editor to annotate and disseminate announcement data into a network of external websites. iAnn’s modular viewer interface allows easy customization and integration of announcement data in external web applications.

Figure 1. Example of iAnn Announcement Service integrated in the British Society for Proteome Research Website.

iAnn increases access to announcements through its dissemination tools, which have been designed specifically to integrate posts across many different websites with minimal effort. iAnn allows reporting of events, courses or news to a central repository, which are then disseminated seamlessly to member scientific organizations or websites according to keywords, dates or geographical location. Forget about having to post your event or piece of news more than once for wide dissemination!

iAnn Community

Here is a list of organizations already benefiting from iAnn services. You can also be one of them:

How can one join iAnn?

Currently there are several ways in which an organization may join iAnn. They can join as 1) Sponsor, 2) Member or 3) Collaborator. These categories are based on their needs and chosen commitment to the platform.

  1. Sponsors usually have priority for dissemination of their announcements and news. They are usually not involved in the curation process of announcements but contribute funds to support iAnn’s centralized curation efforts.
  2. Members are interested in both curation and wide dissemination of their own announcements. Typical members include iAnn widgets in their websites to offer the communities they serve permanently up-to-date relevant announcement information. For example, the British Society for Proteome Research (BSPR), depicted in Figure 1, shows a list of all the announcements BSPR is interested in displaying on its website.
  3. Collaborators are interested in posting announcements to the iAnn repository and helping us spread the word. In exchange we offer collaborators the ability to display iAnn announcements on their website for free.

Filed under: iAnn, Marketing

Scientific Announcements Don’t Get Noticed Where They Should

Wouldn’t it be nice if the event you are trying to promote needed to be posted only once? What if there were a central repository for dissemination of announcements that was accessible and permanently up-to-date? Wouldn’t it be great if your blog or website could show relevant professional announcements without you having to enter them?

Unfortunately, people around the world are still trapped in the paper-based office paradigm when wanting to disseminate announcement information. Again and again they post their announcement to different places knowing that it will only reach a partial share of all potentially interested readers. They add data and clog online databases as no centralized repository is available for posting or getting information. Despite the great number of hours of work lost by millions of people trying to post, scientific organizations have been extremely slow to embrace community-shared announcement curation.

We (Rafael Jimenez and I) are promoting the creation of a community of organizations and people to lead iAnn, a centralized collaboration platform that coordinates curation efforts among scientific organizations. iAnn increases access to announcements through its dissemination tools, which have been designed specifically to integrate posts across many different websites with minimal effort. iAnn allows you to post your event, course or piece of news only once to a central repository, from which it is then disseminated seamlessly to relevant scientific organizations or websites according to keywords, dates or geographical location.

If you think iAnn is of interest to you please contact me (see contact information on the right) or wait for future developments that are about to come in Manuel Corpas’ Blog. Currently we are in a development phase for the project and would like to hear from potential users or scientific organizations if they have any thoughts or suggestions on the matter. Our aim is to change the way anyone posts and finds relevant information about any given professional field. iAnn promises to help many users keep up-to-date with relevant announcements more effortlessly. Perhaps from now on websites will be better able to have most of the events, courses, seminars, news, etc. that users would expect to find in them.

Filed under: Databases, iAnn, News, Technology

What’s the Distribution of Oxytocin Alleles in the General Population?

In my previous post I commented on my family’s alleles for the rs53576 SNP of the oxytocin receptor (OXTR) gene. The GG genotype seems to be associated with a more pro-social character. A follow-up question would be “what is its distribution in the general population?”. Luckily, a few colleagues and I have compiled a database of n=52 23andMe genotypes from the public domain. While the number of individuals contained in the database is small and of predominantly European ethnic background, I can still get an approximate view of what the frequency would be for these genotypes. I found the following distribution:

  • AA: 6 (11.5%)
  • AG or GA: 18 (34.6%)
  • GG: 28 (53.8%)
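
The percentages above are simply each count divided by the 52 genotypes. As a sketch of the arithmetic, the snippet below recomputes them and also derives the frequency of each individual allele (every person carries two), which the post does not report. The function names are mine, and AG and GA are pooled under "AG".

```python
from collections import Counter
from typing import Dict

# Genotype counts from the list above (n = 52); AG and GA are pooled.
genotype_counts = {"AA": 6, "AG": 18, "GG": 28}


def genotype_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of individuals carrying each genotype."""
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def allele_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of chromosomes carrying each allele (two per individual)."""
    alleles: Counter = Counter()
    for genotype, n in counts.items():
        for allele in genotype:  # "AG" contributes one A and one G
            alleles[allele] += n
    total = sum(alleles.values())
    return {allele: n / total for allele, n in alleles.items()}


for genotype, fraction in genotype_frequencies(genotype_counts).items():
    print(f"{genotype}: {100 * fraction:.1f}%")
for allele, fraction in allele_frequencies(genotype_counts).items():
    print(f"{allele} allele: {100 * fraction:.1f}%")
```

With these counts the G allele sits on 74 of the 104 chromosomes, so roughly 71% of alleles are G even though only about 54% of individuals are GG.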

If I were to speculate from this finding, and assuming that the GG association is true, it would seem that being pro-social is quite common among Europeans. In fact when I look at the distribution of rs53576 alleles per population in the 1000 genomes project, the above distribution looks quite similar to the proportions shown for the European (CEU) pie chart:

Frequencies of rs53576 in the 1000 genomes project. CEU: European; CHB+JPT: Han Chinese and Japanese; YRI: Yoruban (from Nigeria). More yellow apparently means more social.

Obviously the samples are too small to draw a final conclusion, but I will let readers judge for themselves what these results might mean.

I would like to thank Karyn Megy as she gave me the idea of querying the public domain 23andMe database.

Filed under: Personal Genomes

My rs53576 SNP Genotype Indicates I Am Pro-social — Phew.

I just came across the paper from Kogan, A., et al. (2011) in PNAS that states that “individuals who are homozygous for the G allele of the rs53576 SNP of the oxytocin receptor (OXTR) gene tend to be more prosocial than carriers of the A allele.” Wanting to determine my genomic horoscope prediction of the month, I decided to check my allele status as well as that of other members of my family. Luckily, this SNP is among the ones that 23andMe analyzes in versions 2 and 3 of its chip (my chip was version 2 and the rest of my family’s was version 3).

To my pleasant surprise, all of my family are GG, except my aunt who is AG.

Family genotype for rs53576 SNP using myKaryoView. Clicking on graph features opens popups with type_id (genotype) information. The grouped popups are mine, my sister’s, my dad’s and my mum’s; the independent one is my aunt’s.

In a post about the Kogan et al paper, Suzanne Elvidge writes that oxytocin, also known as the cuddle hormone, makes us feel good. It’s released during sexual intercourse, pair bonding and breastfeeding, and our levels (and the dog’s levels) rise when we stroke our pets. The oxytocin gene may also make us more optimistic. Differences in our responses to oxytocin seem to affect how empathic we are – so if you are a nice person, it might be (at least a little bit) down to your oxytocin gene.

Maybe my aunt’s temperament is not just a consequence of her having red hair.

Tita, te quiero!

Filed under: Personal Genomes

Beware of Gene Names in Excel

For the past few days I have been trying to compile the most complete list of gene names possible. To start with, I was given an initial list of genes in an Excel file that was taken from the HUGO Gene Nomenclature Committee (HGNC). Unfortunately, the gene names were pasted from the original source (HGNC) into an Excel spreadsheet without first setting the format of the column cells to text. This led to Excel trying to “help” with the formatting of the inserted values, changing those gene names that look like dates into actual dates. In the bioinformatics field, misnaming a gene can lead to disastrous consequences such as misdiagnosis of a causal gene in a clinical setting. Thus:

Beware of pasting gene names in an Excel spreadsheet with a default format, as these may be changed into dates.

From my current list of 19,026 genes that I have compiled as of now, here are the names of the genes that have been automatically changed by Excel into dates. In the table below, the first column denotes the date the gene name is changed to, the middle column the Ensembl ID of the gene and the right column the actual name that was changed by Excel into a date.

Sep-01    ENSG00000180096        SEPT1    
Sep-02    ENSG00000168385        SEPT2
Sep-03    ENSG00000100167        SEPT3
Sep-04    ENSG00000108387        SEPT4
Sep-05    ENSG00000184702        SEPT5
Sep-06    ENSG00000125354        SEPT6
Sep-07    ENSG00000122545        SEPT7
Sep-08    ENSG00000164402        SEPT8
Sep-09    ENSG00000184640        SEPT9
Sep-10    ENSG00000186522        SEPT10
Sep-11    ENSG00000138758        SEPT11
Sep-12    ENSG00000140623        SEPT12
Sep-14    ENSG00000154997        SEPT14

Mar-01    ENSG00000145416        MARCH1
Mar-02    ENSG00000099785        MARCH2
Mar-03    ENSG00000173926        MARCH3
Mar-04    ENSG00000144583        MARCH4
Mar-05    ENSG00000198060        MARCH5
Mar-06    ENSG00000145495        MARCH6
Mar-07    ENSG00000136536        MARCH7
Mar-08    ENSG00000165406        MARCH8
Mar-09    ENSG00000139266        MARCH9
Mar-10    ENSG00000173838        MARCH10
Mar-11    ENSG00000183654        MARCH11

Dec-01    ENSG00000173077        DEC1
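
One way to catch this before it happens is to scan a gene list for symbols that look like a month followed by a day. The sketch below is a heuristic modelled only on the table above; I have not checked it against Excel’s actual conversion rules, and the function name `excel_date_form` is made up for illustration.

```python
import re
from typing import Optional

# Month names and abbreviations that a spreadsheet might read as a month.
# This list is a guess based on the table above, not Excel's real rules.
_MONTHS = (
    "JAN|FEB|MARCH|MAR|APRIL|APR|MAY|JUNE|JUN|JULY|JUL|"
    "AUG|SEPT|SEP|OCT|NOV|DEC"
)
_DATE_LIKE = re.compile(rf"({_MONTHS})-?(\d{{1,2}})", re.IGNORECASE)


def excel_date_form(symbol: str) -> Optional[str]:
    """Return the date a symbol might turn into (e.g. 'Sep-01'), else None."""
    match = _DATE_LIKE.fullmatch(symbol.strip())
    if match is None:
        return None
    month, day = match.group(1), int(match.group(2))
    if not 1 <= day <= 31:
        return None
    return f"{month[:3].title()}-{day:02d}"


for symbol in ["SEPT1", "MARCH11", "DEC1", "TP53", "BRCA1"]:
    print(symbol, "->", excel_date_form(symbol))
```

Any symbol for which the function returns a value deserves a manual check; formatting the column as text before pasting avoids the problem altogether.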

 

Filed under: Bioinformatics, Tutorials, Genomics

myKaryoView Paper Out

As of October 26th 2011, a paper about the myKaryoView tool has been published in PLoS One. myKaryoView is a genome browser specifically designed for visualization of Direct-to-Consumer (DTC) personal genetic data. We look forward to receiving feedback from users visualizing their own personal genomes and from developers willing to further extend the code or simply make use of myKaryoView in a different context.

The paper is freely available and open access.

Citation: Jimenez RC, Salazar GA, Gel B, Dopazo J, Mulder N, et al. (2011) myKaryoView: A Light-Weight Client for Visualization of Genomic Data. PLoS ONE 6(10): e26345. doi:10.1371/journal.pone.0026345

Filed under: Bioinformatics, Genomics, Personal Genomes

[Guest Post] Making Genetic Testing Results Available for the Public

Bastian Greshake is cofounder of openSNP, a recently launched open source project whose aim is to collect the genotypes from Direct-to-Consumer (DTC) genetic testing. Customers who wish to donate their genetic data for further analysis now have a place where they can do so. openSNP also allows users to add phenotypes and offers some user-friendly search interfaces. Users who donate their data to openSNP automatically make their personal data available in the public domain. For a project that was started in June 2011 by two Masters students, the work carried out so far shows great promise. Here readers have a chance to learn how they have done it. [Manuel]

Manuel was kind enough to allow me to write a small blog post about the openSNP project, which I launched together with a friend a couple of weeks ago. Instead of boring you with too many technical details, I want to use this space to give you a short history of how we got the idea for the project and why we pursued it. Personally I’ve been fascinated with Direct-to-Consumer (DTC) genetic testing for quite a while, namely since 23andMe started, and always wanted to get my hands on my personal data. During one of their sales in April of this year they dropped the prices to a level that made it affordable even to students like myself. So I took this opportunity and about a month later I got back my results. I immediately started to play around with the raw data that was delivered.

openSNP Welcome Page

Much like Manuel, who tried to learn about his risks and how they affect his family, I wondered what kind of information I could find about my parents by analyzing my results. So I started to look for my homozygous SNPs and for the heterozygous ones that showed higher or lower risk of a disease. Next I looked around for some more DTC data on the web, to have something more to play with, and found a list of 23andMe genotypes on SNPedia that helped me a lot. I found it somewhat disappointing, however, that this list included some broken links. The links had to be checked by hand at regular intervals to find the latest data, and the list had no further information about the people who uploaded the data. And who knows how many people had already published their results without being listed there?

Because of this I started to work on a small repository that could host DTC raw data in June of this year and found some friends that were willing to help me with the project. While we worked on the project we had some more ideas: we added the basic functionality to add phenotypic information, deciding that it would be great if people had access to the latest literature on genetic risk factors that they have been tested for. So we started to add information from SNPedia, the Public Library of Science as well as the crowd-sourced database of Mendeley to enhance the user experience.

About a month later, at the end of September, we published the first version of our application. Our vision for openSNP – “Crowdsourcing Genome Wide Association Studies” – may be a bit over the top right now, as the number of users is still quite small. But that is basically what we want to achieve: to create a public resource of genetic information that ultimately should be used to create new knowledge. We are really happy with how things have turned out so far. In less than four weeks over 30 people have been willing to upload their testing results, and we have fixed some bugs and started to work on new features that hopefully will be implemented in the near future.

Direct-To-Consumer genetic testing is here to stay. And with the prospect of exome and whole genome sequencing already becoming available as services, there is a great need for platforms that make results and knowledge available – for the interested public as well as for scientists. Hopefully we can help to make this happen.

Filed under: Guest Post

Disclaimer

Any views expressed here are the author's alone and do not necessarily form part of the official positions of his employer.

Creative Commons License
Unless otherwise stated, this work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
