Our second paper on the Gene Wiki was published online today in the journal Nucleic Acids Research. It will also be included in the 2010 Database issue in January. From the concluding paragraph of the introduction:

Here, we present an update describing the recent systematic improvements to the Gene Wiki. Moreover, we report on a retrospective analysis of Gene Wiki usage and editing. Finally, we offer some concluding remarks on general progress and challenges facing efforts to collaboratively engage the entire community of scientists.

I think there’s a lot of good stuff in this paper. Read to the end and you’ll find an interesting analysis relating the Gene Wiki to Manny Ramirez and the genetics of gray hair.

But I have to admit my favorite figure is Figure 1, because it shows in very plain terms why community intelligence initiatives like the Gene Wiki (and BioGPS) are needed and valuable:

The top panel analyzes links from Entrez Gene to PubMed. It shows that while a few genes are very well annotated in the biomedical literature, the vast majority of genes are virtually unstudied (and unannotated). The bottom panel analyzes links in the opposite direction, from PubMed to Entrez Gene. Fewer than 4% of PubMed articles in recent years (and fewer than 1.5% overall) are linked to Entrez Gene. If you believe that more than 4% of articles are relevant to human gene function, this suggests that the community’s curation efforts are not keeping pace with the rate of biomedical publishing.

The vast majority of our field’s gene annotation work (and gene annotation database development) is currently done by a relatively small population of scientists. I have great respect for their work, and I believe strongly that there will always be a role for centralized curation authorities staffed by professional curators. But I think it’s also unrealistic to expect even a thousand of these full-time professionals to keep abreast of all the knowledge of gene function that is currently being generated. This view is echoed by many in the curation community itself, as in this recent review article:

“Sooner or later, the research community will need to be involved in the annotation effort to scale up to the rate of data generation.”

The Gene Wiki and BioGPS, of course, target the Long Tail, enabling a huge population of scientists to each contribute a little bit. While these contributions are individually small, they are collectively large. These “community intelligence” initiatives are highly complementary to existing resources, and they produce tools that are useful for understanding gene function. (I’d also like to explore ways for the Gene Wiki to interact directly with the professional curators.)

I’d be remiss if I didn’t also note the critical role online collaboration played in this effort. Of the seven coauthors on this paper, two I’ve met only once in real life, and two I’ve never met in person. We are spread over four cities, five organizations, and nine time zones. Initiating and executing this collaboration happened almost entirely online, aided by the FriendFeed Life Scientists room and the Molecular and Cellular Biology WikiProject at Wikipedia. It was an eye-opener as to how effective online collaboration can be.