The pace of data generation is far outstripping our ability to convert this data into usable knowledge. Even well-funded biomedical databases find it increasingly difficult to keep up. To tackle this problem, some databases have opted to increase automation in the way data is deposited, reducing the time needed to interpret results. The problem with this approach is that the resulting knowledge is less accurate, and of lower quality, than manually annotated entries. Another potential solution has been to engage leading experts, creating a sort of consortium in which they give some of their time to curate data entries that match their specialties. Unfortunately, engaging world experts in curating biomedical resources has not had much success: a few contribute a lot, and many hardly ever dedicate any time to curation no matter how often they are chased.
A revolutionary new idea has come from Alex Bateman's group: engage not just the community of experts but the whole of the Internet, using Wikipedia. One of his group's databases, Rfam, which characterises RNA families, now provides all of its annotation via Wikipedia. Wikipedia is already the leading reference resource for all kinds of information. It has the know-how and capability to mediate the curation of database entries, and it has been resoundingly successful at gathering knowledge of reasonably high quality.
After a persuasive discussion with Alex, I decided to try it myself and add my very first entry to Wikipedia, which I thought could potentially help the database I develop outsource the annotation of its public, non-sensitive data.
I copied, edited and formatted parts of a non-sensitive entry (a syndrome description) for Wikipedia. I learnt, contrary to what I expected, that as long as one has an account and no entry exists on the topic, a page can be added on the fly. So I added a page and started editing, copying and pasting.
It took me a bit of time to get used to some of the conventions and formatting tags used by Wikipedia, but very early on I had help from Wikipedia 'agents'. It really surprised me how quickly these agents picked up my entry and let me know the criteria a Wikipedia entry must meet to achieve a high standard.
I learnt about important concepts in the Wikipedia context, such as Notability and Conflict of Interest. Apparently one cannot write about oneself, for example, and personal opinions or articles are not accepted. So far this was fine for me, but problems came when one of these agents pointed out some copyright issues: I was trying to copy an entry from a website/database.
Blatant copying of public content from another website is considered a copyright violation unless a suitable license is in place and one 'owns' the data. In our case, the Creative Commons License we hold was not acceptable because, although it permits public use of the information, it does not allow alteration. This means that people would not be able to edit my Wikipedia entry.
I must admit I felt intimidated at this point. Despite that, I was extremely impressed with how efficiently the agents acted and how quickly they responded to my queries. I can understand why they have to be so tough: it is how they prevent abuse.
Overall I feel quite satisfied with what I have learnt in the process, and I am eager to keep exploring the use of Wikipedia for database curation. Of course this is just an experiment, and the solution we eventually adopt for keeping our annotation current may turn out to be something different. Still, it is worth a try.