I offer an idea, and a proposal…
What: We want to structure biological knowledge by annotating BioThings (genes, proteins, mutations, diseases, and drugs) in biomedical research articles. We want to comprehensively annotate the mentions of these BioThings in research articles, and also want to describe the nature of the relationships between these entities. Finally, we want to do it in roughly real-time (within one week of publication).
Why: The biomedical literature is massive. There are currently over 23 million articles indexed in PubMed, and that number is growing at a rate of over one million new articles per year. Despite the explosive growth in the amount of biological knowledge being produced, the percentage of that knowledge that is systematically structured is very small. Similarly, the percentage of biological knowledge that any one person can retain in their brain is rapidly shrinking, so the lack of efficient methods for querying and computing on that knowledge hinders scientific research. Conversely, a knowledgebase that described the Network of BioThings would be valuable both for focused browsing and for large-scale data mining.
How: We envision employing a kitchen-sink approach. The approaches we employ might include (but not be limited to):
- Natural Language Processing
- Citizen science
- Microtask markets
- Professional biocuration
- Scientific publishing
- Open Innovation Challenges
Each of these approaches is valuable individually, but even more importantly, we want to explore how these approaches can be used synergistically. For example, we want to explore how microtask markets can be used to improve NLP training, and how citizen science can be used to improve the efficiency of professional biocuration. For a challenge as massive as the Network of BioThings, the whole is clearly greater than the sum of its parts.
Who: Our research group does not have expertise in all the areas above. I suspect that no single group can sufficiently cover all those areas, and that assembling a consortium of experts is the most promising strategy for this challenge. Therefore, we’re reaching out to anyone who thinks they have something to offer in pursuit of the Network of BioThings. Skill sets that we might need include (but aren’t limited to):
- Semantic Web
- Community resources
- Text mining
- Disease/patient advocacy groups
- Biological pathways
- Game design
- Biological domain experts
Where: We envision a virtual consortium of investigators that may span all corners of the globe. For the moment, we will use Google Groups as a virtual gathering place. The goal is to have a broad and expansive discussion of ideas about how to achieve the Network of BioThings. The expectation is that one (or more) focused proposals will emerge, but of course the scope and participants of these proposal(s) have yet to be defined. At the same time, any participant should be welcome and encouraged to use any of the ideas from this forum to craft their own proposal. To encourage open exchange between participants, the Google Group discussion will initially be restricted to participants. Anyone is welcome to request an invitation to join.
When: My goal is to start this brainstorming process now with the intention of establishing collaborations and initiating quick-win projects. I’d hope that we’d have some tangible wins completed in early-to-mid 2014, thereby positioning us for a competitive grant proposal in late 2014.
If you’re interested in helping to build a Network of BioThings and you think you’d have something to contribute, email me at asuscrippsedu to get access to the mailing list. Consider this your open invitation…

EDIT 20140315: The Google Group to discuss the Network of BioThings is now public.