This is the fifth and final blog post in a series on our Gene Wiki renewal. More details below.

Crowdsourcing is all about motivating large groups of individuals to collaboratively achieve some shared vision. For example, the Wikipedia crowd is motivated primarily by volunteerism. Other recent initiatives motivate crowds of gamers using fun and enjoyment.

In this aim, we will target a crowd of “patient-aligned individuals”. So many people have been impacted by serious diseases, either personally or through one of their loved ones. Our goal is to create a mechanism by which these individuals can directly contribute to biomedical research.

We believe that this community would enthusiastically welcome the opportunity to contribute. This point was made especially clear by our recent Twitter interactions, which showed how active and engaged patients have become in their own health care and research. This is especially true for the so-called “rare diseases”, which collectively affect over 25 million people in the United States alone and for which drugs are often not available.

How do we want to empower non-scientists to impact biomedical research? Our application will address the challenge of annotating the massive biomedical literature. Consider that there are over one million new research articles published every year, roughly equivalent to one article every thirty seconds.

Keeping up with the literature is like drinking from the proverbial fire hose, and every researcher I know struggles with that challenge. It is difficult to identify the small percentage of articles that are relevant to my research, much less to extract the knowledge that they contain and put it in the context of past research. And time matters as well: the longer it takes for me to discover the latest relevant research, the longer it takes for me to incorporate those new findings into my own research.

We want to enable non-scientists to help us “annotate” the latest research articles immediately as they are published. The annotation process can be separated into two distinct steps. First, “entity recognition” refers to the process of highlighting all the biomedical concepts mentioned in an article, most notably the diseases, genes, and drugs. Systematically annotating the concepts in every article benefits scientists on a number of levels. Consider, for example, a chordoma researcher. That researcher could be notified when a new article on chordoma is published, when any article mentions chordoma together with a drug, or when an article mentions both chordoma and the T gene.
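
To make the notification idea concrete, here is a minimal Python sketch of how highlighted concepts could be stored and queried. It is only an illustration under our own assumptions, not the Mark2Cure data model: the `EntityAnnotation` fields, the `highlight` and `articles_mentioning` helpers, the article ids, and the article snippets are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityAnnotation:
    article_id: str   # hypothetical article identifier
    start: int        # character offset where the highlight begins
    end: int          # character offset just past the highlight
    text: str         # the highlighted phrase
    entity_type: str  # "disease", "gene", or "drug"


def highlight(article_id, text, phrase, entity_type):
    """Record the first whole-word occurrence of `phrase` as an annotation."""
    match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
    if match is None:
        raise ValueError(f"{phrase!r} not found in {article_id}")
    return EntityAnnotation(article_id, match.start(), match.end(), phrase, entity_type)


def articles_mentioning(annotations, required):
    """Return ids of articles that mention every required concept.

    `required` is a list of (entity_type, name) pairs; a name of None
    means "any concept of that type".
    """
    found_by_article = {}
    for ann in annotations:
        found_by_article.setdefault(ann.article_id, set()).add((ann.entity_type, ann.text))

    def covers(found):
        return all(
            any(t == rtype and (rname is None or n == rname) for t, n in found)
            for rtype, rname in required
        )

    return sorted(aid for aid, found in found_by_article.items() if covers(found))


# Made-up article snippets, for illustration only.
ARTICLES = {
    "article-1": "Treatment with rapamycin was tested in chordoma cell lines.",
    "article-2": "Expression of the T gene was measured in chordoma samples.",
    "article-3": "A review of rapamycin dosing.",
}

ANNOTATIONS = [
    highlight("article-1", ARTICLES["article-1"], "rapamycin", "drug"),
    highlight("article-1", ARTICLES["article-1"], "chordoma", "disease"),
    highlight("article-2", ARTICLES["article-2"], "T", "gene"),
    highlight("article-2", ARTICLES["article-2"], "chordoma", "disease"),
    highlight("article-3", ARTICLES["article-3"], "rapamycin", "drug"),
]

# The three alerts a chordoma researcher might subscribe to.
any_chordoma = articles_mentioning(ANNOTATIONS, [("disease", "chordoma")])
chordoma_and_drug = articles_mentioning(ANNOTATIONS, [("disease", "chordoma"), ("drug", None)])
chordoma_and_t = articles_mentioning(ANNOTATIONS, [("disease", "chordoma"), ("gene", "T")])
```

Storing the character offsets alongside each phrase keeps the annotation tied to the exact text a user highlighted, while the alert query only needs the concept type and name.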

The second step in the annotation process involves “relationship extraction” to define which of those concepts identified in step one are related to each other and how. For example, we want to annotate that the microRNA miR-31 is related to chordoma because of its downregulation in tumors, and rapamycin is related to chordoma as an “inhibitor of chordoma cell proliferation” and for its activity “reducing the growth of chordoma xenografts”. Aggregating all of these annotated relationships into a single database would result in a powerful resource for researchers for many downstream applications.
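
As a rough sketch of what that aggregated database might look like, the Python below stores each relationship as a pair of concepts plus the highlighted phrase that connects them, using the chordoma examples above. The `Relation` class, the `build_index` and `partners` helpers, and the article ids are our own hypothetical names, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    subject: str     # a concept found during entity recognition
    target: str      # another concept from the same article
    phrase: str      # highlighted text describing how the two are related
    article_id: str  # hypothetical identifier of the source article


def build_index(relations):
    """Aggregate relations so they can be looked up by either concept."""
    index = defaultdict(list)
    for rel in relations:
        index[rel.subject].append(rel)
        index[rel.target].append(rel)
    return index


def partners(index, concept):
    """Return the sorted names of all concepts related to `concept`."""
    related = set()
    for rel in index.get(concept, []):
        related.add(rel.target if rel.subject == concept else rel.subject)
    return sorted(related)


# Relationships taken from the examples in the text; article ids are made up.
RELATIONS = [
    Relation("miR-31", "chordoma", "downregulation in tumors", "article-a"),
    Relation("rapamycin", "chordoma", "inhibitor of chordoma cell proliferation", "article-b"),
    Relation("rapamycin", "chordoma", "reducing the growth of chordoma xenografts", "article-b"),
]

INDEX = build_index(RELATIONS)
chordoma_partners = partners(INDEX, "chordoma")
rapamycin_phrases = [rel.phrase for rel in INDEX["rapamycin"]]
```

Indexing each relation under both of its concepts means a researcher can start from a disease, a gene, or a drug and reach the same evidence.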

Importantly, we don’t think non-scientists need to understand the scientific meaning of these biomedical concepts and relationships (though we certainly don’t rule it out either). Rather, we think that we can train users to use their basic English language skills to learn to recognize diseases, genes, and drugs (and to use online resources to confirm their guesses), and to use the rules of grammar to simply highlight the phrase that describes the interaction.

We are calling our application Mark2Cure. If you are interested in getting notified when we launch, please sign up to our email list at


This blog post is part of a series of entries on our NIH proposal to continue developing the Gene Wiki. The other posts are here:

Post #0: Introduction
Post #1: Gene Wiki progress report
Post #2: Aim 1: Diseases and drugs
Post #3: Aim 2: Outreach
Post #4: Aim 3: Centralized Model Organism Database
Post #5: Aim 4: Patient-aligned crowdsourcing (this post)

