Gene Wiki Google update

Inspired by a neat analysis of online fungi resources by Rod Page, I decided to do a similar comparison of Gene Wiki pages.

Here’s the take-home message: in terms of online gene annotation resources, Gene Cards is the most common top-ranked resource, followed closely by the Gene Wiki / Wikipedia, with NCBI in a very distant third (note the log scale).

A more detailed analysis of the top three sites shows that the Gene Wiki / Wikipedia does slightly better for the most viewed genes, while Gene Cards does slightly better for the less commonly viewed genes.

Here was the basic protocol. I first retrieved all Gene Wiki pages and parsed them for the official gene symbol. I then searched Google for each gene symbol (using the search API) and recorded the site of the top hit. Finally, I ranked the most common sites (top figure), and separately plotted the top three sites with the pages ordered by their number of views between January 1 and June 30, 2009.
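
For anyone who wants to try something similar, here is a rough sketch of the tallying step only. It is not the script used for this post: the function names and the example data are made up for illustration, and it assumes you already have a mapping from each gene symbol to the URL of its top search hit (the search API call itself is left out).

```python
from collections import Counter
from urllib.parse import urlparse


def site_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def rank_top_sites(top_hits):
    """Count how often each site is the top hit, most common first."""
    counts = Counter(site_of(url) for url in top_hits.values())
    return counts.most_common()


# Hypothetical example data: gene symbol -> URL of the top search hit.
top_hits = {
    "TP53": "https://en.wikipedia.org/wiki/P53",
    "BRCA1": "https://www.genecards.org/cgi-bin/carddisp.pl?gene=BRCA1",
    "EGFR": "https://www.genecards.org/cgi-bin/carddisp.pl?gene=EGFR",
}

print(rank_top_sites(top_hits))
```

Grouping by host name is one simple way to define a "site"; a real run would probably want to merge related hosts (for example, different Wikipedia language editions) before counting.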

So the Gene Wiki has grown quite a bit since its inception ~2 years ago, but there’s still room to grow…


  1. Isn't this result slightly biased towards Gene Wiki? By taking the Gene Wiki names, I'm surprised WP doesn't come out on top. Ideally this analysis would use a third-party source of gene names that includes synonyms.

    I'm surprised HGNC doesn't appear at all on your plot.

    It's still a cool and interesting analysis of course.

  2. Hi Paul,

    You're right, but can't I be a little biased toward Gene Wiki? 😉

    But for clarification, we're not as biased as you might think. We don't google the Wikipedia page name; we simply use the Wikipedia page to retrieve the HGNC symbol. So we are using a third-party search query.

    The slightly more biased part is that we're only searching for genes (and searching for all genes) in the Gene Wiki / Wikipedia.

    I do think it'd be interesting if we repeated the analysis based on gene name instead of gene symbol.

    Now that you mention it, the absence of HGNC surprises me too. I just double-checked the output file to confirm. My interpretation is that while it's the authority, people and sites don't often link directly to it. More often they must get the HGNC symbol and name through other sources (like Gene Cards, Gene Wiki, NCBI…)


  3. Rather than write a blog post about Rod's blog post about our analysis on our blog based on his original analysis on his blog, I'll simply link to it here:
