A Warning Sign for Biomedical Databases

Users of the highly popular OMIM database (Online Mendelian Inheritance in Man) [1] may have noticed that NCBI [2] is not providing further funds to sustain OMIM’s development. One reason for halting the funding may be that curation work is not deemed worthy of it. Funding agencies may thus have started a trend of being unwilling to dedicate funds to the curation of database entries.

The flip side of this is the nascent trend to outsource database annotation to the general public. Databases like Rfam [3] and Pfam [4], two popular RNA and protein family databases, have adopted the strategy of outsourcing their annotation to Wikipedia [5]. Realizing that it is impossible to keep up with the literature, Rfam made an attempt to seed Wikipedia with database-specific information. They then developed a system that periodically collects the Wikipedia text of the entries they created and uses it to repopulate the corresponding RNA family entries in the database. The price they had to pay was losing control over what gets entered into the Wikipedia entry. However, the benefits seem to outweigh this loss of control: ready access to an army of casual annotators and dramatically increased exposure of the database itself (Wikipedia consistently ranks at the top of the list for most RNA family searches in Google). This means that their chances of having up-to-date content are increased, as is awareness of the resource, which helps justify future cycles of funding.

Something that started as an experiment in Rfam seems to be spreading to other databases as they begin to assess how to address their annotation bottleneck. Outsourcing the annotation of biomedical databases to Wikipedia seems a solution worth considering as curation practices continue to evolve to cope with current funding shortages. A generalized lack of funding for research, together with the establishment of community wiki-style annotation practices, may make funding agencies ever more reluctant to pay for database curation. Perhaps this is the time for those of us who care about biological databases and their contents to start rethinking our future plans. Is the time now ripe for embracing Wikipedia to the full?

[1] http://www.ncbi.nlm.nih.gov/omim

[2] http://www.ncbi.nlm.nih.gov/

[3] http://rfam.sanger.ac.uk/

[4] http://pfam.sanger.ac.uk/

[5] http://www.wikipedia.org/

9 responses

rethinking biological complexity, google-style. « cistronic

June 23, 2011 at 7:46 pm

[…] Something that started with Rfam has now spread to other databases (well described in “A Warning Sign for Biomedical Databases” by Manuel Corpas). Similarly, crowd-sourced “secondary” databases started to add […]

Jason Stajich

June 16, 2011 at 6:33 am

See a prescient quote: http://qotd.me/q2010-05-20.html
but also see:
http://www.sciencemag.org/content/319/5870/1598.full

admin

May 29, 2011 at 6:14 pm

Another emblematic biomedical database whose funding is being affected.

I just learnt that KEGG will be requesting payment from academics to download data from their FTP site. KEGG’s PI, Minoru Kanehisa, says that his current grant is not “sufficient to continue to hire [his] talented crew of KEGG curators and software developers.”

Here is his plea document for support:
http://www.kegg.jp/kegg/docs/plea.html

Will Spooner

May 27, 2011 at 12:00 pm

Every time I look at this problem I wish it could be solved technologically, e.g. by linked data or the semantic web. I blame the publishing industry for not doing better at making scientific publications machine readable, rather than the funding agencies for stepping away from funding long-term manual curation exercises.

Andres

May 26, 2011 at 8:02 am

I’m new to this subject and might be stepping out of line.
But @Kevin, what stops the curation experts from continuing to peer review the database? They will need to add less data and do a lot more checking of new data coming in.
@John I would expect the same pharmaceutical companies that use the data to be interested in keeping it exact. The market need would therefore mean that those companies (or whatever company, research institute or university needs it) would hire curators to peer review on their behalf and check for fraud from other companies. That said, I do not think that falsifying data benefits anybody, and I think companies know this. Plus, since it’s Wikipedia, you can subscribe to the areas in which you are an expert.

If it works for other lines of research I don’t see why not for this. I find that this kind of information is better out in the open.

1. John Hancock
  
  May 26, 2011 at 2:58 pm
  
  @Andres
  Although things are changing, pharma don’t necessarily have an interest in providing useful information to the wider community; generally they will want to keep useful annotations in-house. In the UK, which I know best, universities will never fund curators out of their core funding, and it is difficult if not impossible to fund them from grant funding. Institutes may fund curators for data in their immediate area of interest but this is unlikely to cover the entire area.
  
Kevin Karplus

May 25, 2011 at 2:23 pm

One big risk for biomedical databases is deliberate but subtle fraud. There are huge amounts of money flowing around drug development, and scientific fraud is a much bigger problem in biomedical research than in other fields. Opening up annotation completely, without peer review or curation by experts, opens up room for lots of fraud.

Databases a little further from the money pots are probably a bit safer from fraud. Still, there are many databases out there, and little incentive for people to add information to them. Without funding for curation, many databases will fade in value rapidly.

Funding agencies have always preferred creating new knowledge (and throwing it away) to maintaining databases. Manual curation is the most expensive part of most databases, and the price tag is too high for review panels that value hypothesis-driven wet-lab work above all.

Of course, there are a lot of databases out there that are not worth the expense of maintaining, since they are dominated by a closely related database. I don’t know enough about OMIM to know whether it falls into this class.

John Hancock

May 25, 2011 at 9:59 am

Although OMIM has its weaknesses, its major strength has always been that the information in it is curated by experts. There are core databases in any field that researchers rely on to provide definitive information, and when this is hand curated they have more confidence in the content. I agree that the wiki-based models have the potential to provide similar levels of confidence (although many people are suspicious of wikimedia generally, often unfairly) but the trick is in motivating the community to carry out the curation. I’m thinking, for example, of an organism database like the Mouse Genome Database, which employs dozens of curators and, in principle, curates every gene in the mouse genome. How does one motivate the entire population of researchers who use the mouse, many of whom have only a passing interest in the organism as a whole and may only be using the mouse as a way of studying a human gene, to enter useful information into such a database?

1. admin
  
  May 29, 2011 at 6:42 pm
  
  The argument that I put forward to encourage people to edit Wikipedia entries is the following. We, scientists and researchers alike, have the responsibility to make sure that whatever information is in Wikipedia is accurate.
  
  This is not just a question of being ‘nice’. If we are to spend taxpayers’ money on our undertakings, then unless we feed back an accurate account of our findings in an accessible manner, we are arguably wasting their money. Wikipedia, as the ‘de facto’ top result of Google searches, is inevitably the first place people will look. Ultimately researchers will have to realise this or their funding will not be renewed.
  