Integrating Wikidata and other linked data sources – Federated SPARQL queries

Posted by on Jul 13, 2017 in semantic web, SPARQL, Wikidata | 0 comments

This blog post is about running federated SPARQL queries on Wikidata. A federated query is a special type of SPARQL query that runs on more than one SPARQL endpoint, allowing access to multiple linked data resources in a single query. Below is a template of a federated query.

Structure of a federated query. It contains query patterns for both the local endpoint (green box) and a remote endpoint (blue box). The address of the remote SPARQL endpoint is given with the SERVICE keyword.
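In outline, such a query looks like the following sketch; the endpoint URL and the triple patterns here are placeholders, not a runnable query:

```sparql
# Generic shape of a federated query (placeholders only):
SELECT ?item ?remoteValue WHERE {
  # Patterns evaluated on the local endpoint, e.g. the WDQS:
  ?item wdt:P31 wd:Q8054 .                      # instance of: protein
  # Patterns evaluated on the remote endpoint named after SERVICE:
  SERVICE <https://sparql.example.org/sparql> {
    ?item ?remoteProperty ?remoteValue .
  }
}
```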

The Wikidata Query Service (WDQS) now supports federated SPARQL queries on a limited number of endpoints. Remote SPARQL endpoints can be added to the list of supported endpoints by nomination.

Wikidata aims to provide the sum of all knowledge to the world at large. To fulfil this aim, it needs to act as a hub in the wider knowledge space, since technical, legal, and social limitations make it impossible to include everything in a single repository. Through federation, or distributed querying, local and remote data can be combined. Here we explore three ways to apply this type of querying to Wikidata content.


From Wikidata to an external SPARQL endpoint (WikiPathways)

The following query uses federation to integrate a pathway from WikiPathways with Wikidata. Wikidata contains items on human pathways from WikiPathways, but metabolic interactions are not yet captured in Wikidata; through federation, these metabolic interactions can be obtained. In the reverse direction, it is possible to obtain properties of pathway elements from Wikidata. Take for example the “Sudden Infant Death Syndrome (SIDS) Susceptibility Pathways (Homo sapiens)” pathway. It contains various biological interactions. Using federated queries, we can get properties such as the mass of a given pathway element.

The “Sudden Infant Death Syndrome (SIDS) Susceptibility Pathways (Homo sapiens)” pathway.

Using this pathway as input, a federated query makes it possible to enrich it with properties not captured in WikiPathways. One example is the following query, which takes interactions from the above pathway and combines them with the mass of the individual pathway parts.
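A sketch of such a query, run on the WDQS (where the wd:/wdt: prefixes are predeclared); the WikiPathways endpoint URL is real, but the exact vocabulary terms and the pathway identifier WP706 are assumptions based on the WikiPathways RDF model:

```sparql
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?interaction ?element ?mass WHERE {
  # Remote part: interactions of the SIDS pathway from WikiPathways
  SERVICE <http://sparql.wikipathways.org/sparql> {
    ?pathway dcterms:identifier "WP706" .      # assumed pathway ID
    ?interaction a wp:Interaction ;
                 dcterms:isPartOf ?pathway ;
                 wp:participants ?node .
    ?node wp:bdbWikidata ?element .            # cross-reference to a Wikidata item
  }
  # Local part: the mass of each pathway element from Wikidata
  ?element wdt:P2067 ?mass .                   # P2067 = mass
}
```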

You can run this query here, or watch it run on YouTube.

From a remote SPARQL endpoint to Wikidata

If a remote SPARQL endpoint is not (yet) eligible for use in the WDQS, it is possible to run the query from the external endpoint instead, provided that endpoint accepts federated SPARQL queries. The SPARQL endpoint of UniProt is a nice example. UniProt includes many more properties for proteins than are currently captured in Wikidata, some of which cannot be included in Wikidata due to the more restrictive nature of the applicable license. The following federated SPARQL query runs on the SPARQL endpoint of UniProt. It selects all human UniProt entries that have a sequence variant leading to a loss of function and that physically interact with a drug used as an enzyme inhibitor.

Integrating Wikidata content with data from UniProt using a federated query submitted at the UniProt SPARQL endpoint.

Try it…
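In outline, such a query might look like the sketch below, submitted to the UniProt SPARQL endpoint; the up: classes and the join on the UniProt accession are assumptions based on the UniProt core vocabulary, and the loss-of-function and enzyme-inhibitor filters are omitted for brevity:

```sparql
PREFIX up:    <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX wdt:   <http://www.wikidata.org/prop/direct/>

SELECT DISTINCT ?protein ?drug WHERE {
  # Local part (UniProt): human entries with a sequence variant annotation
  ?protein a up:Protein ;
           up:organism taxon:9606 ;            # Homo sapiens
           up:annotation ?variant .
  ?variant a up:Natural_Variant_Annotation .
  # Join on the shared accession, e.g. "P04637", taken from the entry IRI
  BIND(STRAFTER(STR(?protein), "uniprot/") AS ?accession)
  # Remote part (Wikidata): the matching item and its interacting drugs
  SERVICE <https://query.wikidata.org/sparql> {
    ?wdProtein wdt:P352 ?accession ;           # P352 = UniProt protein ID
               wdt:P129 ?drug .                # P129 = physically interacts with
  }
}
```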


From a local SPARQL endpoint to Wikidata (data from a database that will never go into Wikidata)

Another interesting use case where federation can be quite handy is in the context of local data. Epidemiological data on, for example, Zika outbreaks can contain large sets of measurements spread over multiple time frames. Loading those measurements into Wikidata can be difficult, especially if the outbreak is ongoing and new data arrive at rapid intervals. One solution for integrating that data with other resources like Wikidata is to run distributed queries from a local SPARQL endpoint. The local SPARQL endpoint has two roles: first, it collects the measurements from the different Zika studies; second, it executes federated queries to enrich these measurements with knowledge from Wikidata. We have created an example script that takes data on Zika outbreaks and converts it to linked data as RDF, which is then loaded into a local SPARQL endpoint. This prototype is available on GitHub.
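The enrichment step might look like the sketch below, run on the local endpoint; the ex: vocabulary is a placeholder for whatever schema the local Zika RDF actually uses:

```sparql
PREFIX ex:   <http://example.org/zika#>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?measurement ?cases ?countryLabel WHERE {
  # Local part: the outbreak measurements
  ?measurement ex:confirmedCases ?cases ;
               ex:country ?country .           # stored as a Wikidata item IRI
  # Remote part: enrich with country labels from Wikidata
  SERVICE <https://query.wikidata.org/sparql> {
    ?country rdfs:label ?countryLabel .
    FILTER(LANG(?countryLabel) = "en")
  }
}
```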

This approach also works when one would like to integrate sensitive data (e.g. clinical patient data) with external Wikidata knowledge, provided the local endpoint is maintained within a secure infrastructure that allows fetching data from outside but prevents exports.


Indeed SPARQL has a steep learning curve.

Although writing SPARQL queries can seem quite intimidating, the ability to run federated queries on Wikidata content is very valuable, and it is needed to make Wikidata the central hub of research data in the life sciences. The effort to learn SPARQL is worth it. Fortunately, Wikidata provides a large set of example queries, useful both for inspiration and as learning material.
There are also ongoing developments to make writing queries easier.

There is also an R package that integrates SPARQL with R scripts, and the example queries from Wikidata can be scraped with it. This means one can enjoy the advantages SPARQL offers without writing a SPARQL query from scratch, simply by building on what others have already made.

Finally, there is always the help of the Twitter community. Many people are quite eager to share SPARQL knowledge.

The Gene Wiki project: Looking to the future v.2017

Posted by on Jul 7, 2017 in Gene Wiki, GeneWikiRenewal, proposal, Wikidata | 0 comments

The Gene Wiki project has been generously funded by the National Institute of General Medical Sciences (NIGMS) since 2009. As the second funding period wraps up early next year, it was time to once again look forward and think about our vision for the next 4-5 years. Posted below is what we came up with, submitted earlier this week to the NIH as a competing renewal proposal with Lynn Schriml and Kristian Andersen as co-investigators. Fingers crossed!

It is also a fine time to recognize that this proposal resulted from the direct and indirect contributions of many people — postdocs, grad students, staff, past and current collaborators, the Wikidata and Wikipedia communities, and more — far too many to name individually here. For a mostly comprehensive list, please see our grant-related publications.

Science Game Lab: tool for the unification of biomedical games with a purpose

Posted by on Jun 16, 2017 in games, genegames, gwaps, Science Game Lab, SGL, sulab | 0 comments

Scripps team: Benjamin M. Good, Ginger Tsueng, Andrew I Su
Playmatics Team: Sarah Santini, Margaret Wallace, Nicholas Fortugno, John Szeder, Patrick Mooney, 
With helpful ideas from: Jerome Waldispuhl, Melanie Stegman
Games with a purpose and other kinds of citizen science initiatives demonstrate great potential for advancing biomedical science and improving STEM education.  Articles documenting the success of projects such as Foldit and Eyewire in high-impact journals have raised wide interest in new applications of the distributed human intelligence that these systems have tapped into.  However, the path from a good idea to a successful citizen science game remains highly challenging.  Apart from the scientific difficulties of identifying suitable problems and appropriate human-powered solutions, the games still need to be created, need to be fun, and need to reach a large audience that remains engaged for the long term.  Here, we describe Science Game Lab (SGL), a platform for bootstrapping the production, facilitating the publication, and boosting both the fun and the value of the user experience for scientific games with a purpose.
Ever since the Foldit project famously demonstrated that teams of human game players could often outperform supercomputers at the challenging problem of 3D protein structure prediction, so-called ‘games with a purpose’ have seen increasing attention from the biomedical research community.  A few other games in this genre include: Phylo for multiple sequence alignment, EteRNA for RNA structure design, Eyewire for mapping neural connectivity, The Cure for breast cancer prognosis prediction, Dizeez for gene annotation, and MalariaSpot for image analysis.  Apart from tapping into human intelligence at scale, these efforts have also produced valuable educational opportunities.  Many of these games are now used to introduce their underlying concepts in classroom settings, where games in all forms are increasingly working their way into curricula.  Concomitant with the rise of these ‘serious games’, citizen science efforts such as the Zooniverse and Mark2Cure have sought similar aims but have packaged their work as volunteer tasks, analogous to unpaid crowdsourcing tasks, rather than as elements of games.
Many of these initiatives have succeeded in independently addressing challenging technical problems through human computation, improving science education, and generally raising scientific awareness.  However, with so much interest from the scientific community and a booming ecosystem of game developers, there are actually relatively few of these games in operation now.  Recognizing the opportunity, various groups have attempted to push the area forward through new funding opportunities and through various ‘game jams’ such as the one that produced the game ‘genes in space’ for use in analyzing microarray data in cancer.  Here, we take a different approach towards expanding the ecosystem of games with a scientific purpose.  Rather than attempting to seed the genesis of specific new game-changing games, we hope to lower the barrier to entry for new games and related citizen science tasks to generally promote the development of the entire field.  With this high-level aim in mind, we developed Science Game Lab (SGL) to make it easier for developers to create successful scientific games or game-like learning and volunteer experiences.  Specifically, SGL is intended to address the challenges of recruiting players and volunteers, keeping them engaged for the long term, and reducing the development costs associated with creating a scientific gaming experience.
The Science Game Lab Web application
SGL is a unique, open-source portal supporting the integration of games and volunteer experiences meant to advance science and science education.  Unlike other related sites that act more like volunteer management and/or project directory services, such as SciStarter and Science Game Center, SGL is not simply a listing of related websites.  Rather, it is an attempt to create a user experience that takes place directly within the SGL context yet still incorporates content from third parties.  The system is largely inspired by game industry portals such as Kongregate that enable developers to incorporate their games directly into a unified metagame experience.
Players can use the portal to find and play games with their achievements within the games tracked on site-wide high score lists and achievement boards (Figure 1).  Players can earn the SGL points that drive these leaderboards for actions taken in different games.  In this way, SGL provides developers with access to a metagame that can be used to encourage players in addition to the incentives offered within individual games (Figure 2).  This metagame can also be used by the system administrators to help direct the player community’s attention to particular games or particular tasks within games.  For example, actions taken on new games might earn more points than actions taken on more established games as a way to ‘spread the wealth’ generated by successful games.    

Figure 1.  SGL home page demonstrating site-wide high score list, game listing, and links to achievements, help, and user profile information.
Figure 2.  Badges displayed on user’s profile page.  Available badges not yet achieved are greyed out.
Developers interact with SGL by incorporating a small JavaScript library into their application and using the SGL ‘developer dashboard’ to pair up events in their game with points, badges, and quests managed by the SGL server.  At this time, SGL only supports games that operate online as Web applications.  The games are hosted by the developers and rendered in the SGL context within an iframe.  The SGL iframe provides a ‘heads-up display’ that gives game players real-time feedback on events sent back to the SGL server, such as earning points, gathering badges, or progressing through the stages of a quest (Figure 3).  This display lets developers add game mechanics to sites that are not overtly games.  For example, WikiPathways incorporated a pathway-editing tutorial into SGL, using the heads-up display to reward users with SGL points and badges for completing various stages of the tutorial.  The tutorial also took advantage of the SGL quest-building tool (Figure 4).  Games are submitted by developers for approval by SGL administrators.  Once approved, the games appear in the public view and can be accessed by any player.

Figure 3.  The heads up display provided by the SGL iframe.  Shows events captured by the API and provides users with immediate feedback.   

Figure 4.  Tasks in SGL can be grouped into quests.  The figure shows a particular user’s progress through various quests available within the system.

If a critical initial mass of effective games can be integrated, SGL could strongly benefit new developers by providing immediate access to a large player population.  Site-level status, identity, and community features can help with the even greater challenge of long-term player engagement, a noted problem in the field.  Within the context of science-related gaming, such status icons might eventually be used as practically useful, real-world marks of achievement in line with the notion of ‘Open Badges’.  As demonstrated by the WikiPathways tutorial application, SGL can replace the need for developers to host their own login systems, user tracking databases, and reward systems, all of which can be accomplished using the SGL developer tools. Citizen scientists are not homogeneous in their motivations, and designing to be inclusive of gamers and non-gamers can be challenging. By offering an alternative means of experiencing a web-based citizen science application, SGL allows developers to cater to both their gaming and non-gaming contributor audiences. Together, these features raise the overall potential for growth within the world of citizen science and scientific gaming.
Future directions
SGL is currently functional, but so far has attracted only a small number of developers willing to integrate their content into the portal.  Future work would need to address the challenge of raising the perceived value of integration with the site while lowering the perceived difficulty.  Looking forward, key challenges for the future of SGL include better support for:
  • games meant for mobile devices
  • development of quests that span multiple games
  • teachers to build SGL-focused lesson plans and track student progress
  • creating new ‘SGL-native’ games
  • integration with external authentication systems
None of these are insurmountable challenges, but they all require significant continued investment in software development.  As an open-source project, we encourage contributions from anyone who shares our vision of spreading and doing science through the grand unifying principle of fun.

Building communities of knowledge with Wikidata

Posted by on Jun 16, 2017 in crowdsourcing, Gene Wiki, semantic wikipedia, sulab, wiki, Wikidata, wikipedia | 0 comments

As the Wikimedia Movement works to define its strategy for the next fifteen years, it is worthwhile to consider how its recent product Wikidata may fit into that strategy.  As its homepage states,
“Wikidata is a free and open knowledge base that can be read and edited by both humans and machines.”
Wikidata is a particular kind of database designed to capture statements about items in the world, together with references that support those statements.  Because Wikidata is a database, its contents are meant to be viewed in the context of software that retrieves the data through queries and then renders it to meet the needs of a user in a certain context.  The same data can thus be viewed on Wikidata-specific pages and in the infoboxes of Wikipedia articles.  Importantly, Wikidata content can also be used in applications outside of the Wikimedia family.
The molecular biology community (and in particular the Gene Wiki group) has embraced Wikidata as a global platform for knowledge integration and distribution.  To help envision how Wikidata may fit into the strategic vision of the Wikimedia movement, it is worth taking a look at how and why this particular community is using Wikidata.
History of the Gene Wiki initiative
The sequencing of the human genome at the beginning of this century, and the consequent rush of data and new technology for producing even more data, fundamentally changed how research in biology is conducted.  Before the year 2000, research typically proceeded with a single-gene focus; a typical PhD thesis would entail the analysis of the genetics or function of one gene or protein at a time.  A few years after the first genome, however, it became possible to measure the activity of tens of thousands of genes at once, resulting in an omnipresent problem: generating interpretations of experimental results containing hundreds of genes.  While a scientist may come to grasp the literature surrounding a single gene quite well, it is not possible to know everything there is to know about all 20,000+ genes in the genome, particularly when this knowledge is expanding on a minute-by-minute basis.  As a consequence, there arose a need to produce summaries of what was known about each gene so that researchers could quickly grasp its nature and easily find links to more detailed references as needed.  By 2008, many different research groups had published wikis attempting to allow the scientific community to generate the required articles, e.g. WikiProteins, WikiGenes, and the Gene Wiki.  The Gene Wiki project was unique among this group in that it anchored itself directly to Wikipedia and, likely as a result of that decision, has enjoyed long-term success.  This initiative works within the English Wikipedia community to encourage and support the collection of articles about human genes.  Its main contributions are the infobox seen on the right-hand side of these articles and software for generating new article stubs using that template.
Wikidata and the Gene Wiki project
For the past several years, the Gene Wiki core team (funded by an NIH grant) has focused primarily on seeding Wikidata with biomedical knowledge.  In comparison to managing data via direct inclusion and parsing of infobox templates as before, this makes the data much easier to maintain automatically and, importantly, opens it up for use by other applications.  As a result, Wikipedia isn’t the only application that can use this structured information.   One of the first products of that process was a new module (Infobox_gene) that draws all the needed data to render the gene infobox dynamically from Wikidata, greatly reducing the technical challenge of keeping the data presented there in sync with primary sources.  
In addition to the relatively simple collection of gene identifiers and links off to key public databases that are presented in the infoboxes, Wikidata now has an extensive and growing network of knowledge linking genes to proteins, proteins to drugs, drugs to diseases, diseases to pathogens, pathogens to places, places to events, events to people, and so on and so on.  This unique, open, referenced, knowledge graph may eventually become the closest thing to ‘the sum of all human knowledge’.  Capturing knowledge in this structured form makes it possible to use it in all kinds of applications, each with their own community-specific user experiences.  As a case in point, the Gene Wiki group created Wikigenomes based primarily on data loaded into Wikidata.  This was followed quickly by Chlambase, an application specifically focused on distributing and collecting knowledge about different Chlamydia genomes.  These applications provide domain-specific user interface components such as genome browsers that are needed to present the relevant information effectively and thereby attract the attention of specialist users.  These users, in turn, have the opportunity to contribute their knowledge back to the broader community through contributions to Wikidata that can be mediated by the same software.  
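This growing network can be explored directly on the Wikidata Query Service. The sketch below (the property and item IDs are real; the query itself is just an illustration) walks one step of the chain, from genes to the proteins they encode to interacting drugs:

```sparql
SELECT ?gene ?protein ?drug WHERE {
  ?gene wdt:P31  wd:Q7187 ;        # instance of: gene
        wdt:P703 wd:Q15978631 ;    # found in taxon: Homo sapiens
        wdt:P688 ?protein .        # encodes: protein
  ?protein wdt:P129 ?drug .        # physically interacts with: e.g. a drug
}
LIMIT 10
```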
Wikidata and the world
The molecular biology research community, as represented by the Gene Wiki project, is an early adopter of Wikidata as a community platform for the collaborative curation and distribution of structured knowledge, but it is not alone.  The same fundamental patterns are already being applied by other communities, e.g. those interested in digital preservation and open bibliography.  In each case, we see communities working to transition from the current dominant paradigm of private knowledge management towards the knowledge commons approach made possible by Wikidata.  This is not unlike the transition from the world of the Encyclopedia Britannica to the world of Wikipedia.  The only important difference is that the knowledge in question is structured in a way that makes it easier to reuse in different ways and in different applications.

Wikidata provides a mechanism for massively increasing the global good generated by the Wikimedia Foundation’s work, by capturing knowledge in a form that can be flexibly used to empower all manner of software with the sum of human knowledge.

Happy Memorial Day weekend!

Posted by on May 26, 2017 in citizen science, CitSci2017, Cochrane Crowd, conference, mark2cure, MedLitBlitz, poster, presentation | 0 comments

The last few weeks have been a bit hectic, so we've got plenty of news and info to share with you.

First of all, if you haven't seen it yet, Cochrane Crowd has posted about our joint webinar and the #MedLitBlitz. If you missed the webinar or had technical difficulties or time zone issues with it, it's available on YouTube. The prize packages for the top three participants of #MedLitBlitz are packed and will be shipped either today or early next week (depending on whether shipments have already been picked up today).

Secondly, Mark2Cure was at the Citizen Science Association conference from 2017.05.15 to 2017.05.20, and was fortunate enough to share YOUR work with an audience of scientists who LOVE citizen science! More than a few researchers stopped to introduce themselves to me and spoke highly of our community! Although it's always weird to hear a recording of your own voice, I recorded my presentation because it wouldn't be fair to talk about the amazing work you've done without sharing it with you! You can find my presentation for the biomedical session on our YouTube channel. On a side note, I know the audio quality isn't the best, which is why I've transcribed it using YouTube's captioning software. If you have trouble hearing the presentation because of the poor audio quality, please turn on the closed captions.

Max also delivered two lightning talks for the event, which I hope to upload soon.
In addition to the talks, we had a poster for Mark2Cure and a table at two public events.
Max spreading the love for Mark2Cure

We were especially pleased to be so close to our buddy at Cochrane Crowd for this event
Cochrane Crowd looking good

Lastly, it looks like one of the missions was completed just as I was settling back in after the conference. A HUGE thanks to everyone that helped complete the carpingly mission. A new mission has been launched in its place, so check it out if you have some free time.

MedLit Blitz, Mark2Curathon Results and More

Posted by on May 16, 2017 in citizen science, Cochrane Crowd, mark2cure, MedLitBlitz | 0 comments

Mark2Curathon Results


Sorry for the delay; the Mark2Curathon results are finally in! During the Mark2Cure portion of MedLit Blitz, we had 34 participants contribute over 16,000 annotations. Because both the entity recognition and the relationship extraction tasks are very different from Cochrane's screening task, we had to take some additional factors into consideration when tallying the results.

For the Relationship Extraction module, multiple annotations per abstract were possible, as each abstract could have any number of concept pairings. Hence, for the relationship extraction module, each annotation submitted counted as one task unit.

For the Entity Recognition module, only one submission was possible per abstract, but users needed to identify three different types of entities. Hence, each abstract completed counted as three task units (one for each concept type--genes, treatments, diseases). Additionally, a tiered bonus multiplier (of an additional 2% to 15%) was applied for users who submitted high quality annotations.

The RE and ER task units were then added together for each user and sorted from highest to lowest to determine user ranking for the event. Without further ado, these were the top 15 participants in the Mark2Curathon:
1. ckrypton
2. Dr-SR
3. TAdams
4. hwiseman
5. Kien Pong Yap
6. skye
7. ScreenerDB
8. priyakorni
9. Judy E
10. pennnursinglib
11. Calico
12. AJ_Eckhart
13. uellis
14. sueandarmani
15. nclairoux

A huge thanks to you all, and everyone who participated for making our first adventure with Cochrane Crowd so successful!

To qualify for the MedLit Blitz prize, Mark2Curators had to have contributed to the Cochrane Screening Challenge as well.

MedLit Blitz Results

We are in the process of contacting the winners and hope to have an update about this soon.

Mark2Cure at Citizen Science Association Conference 2017

Max and I have arrived in the Twin Cities, Minnesota for the Citizen Science conference. Mark2Cure was accepted as part of the symposium on biomedical citizen science, and also for a poster presentation and for the project slam. If that doesn't sound busy enough, Mark2Cure was accepted for a table at the 'Night in the Cloud' event (open to the public). If you are in town, please stop by our table!

About the prizes

Winners will receive a Mark2Cure mug, marker, and novelty item, in addition to any prizes that Cochrane has prepared for this event.

The Mark2Curathon starts now!

Posted by on May 11, 2017 in citizen science, Cochrane Crowd, mark2cure, MedLitBlitz | 0 comments


Our anniversary celebration with Cochrane Crowd is well under way. #MedLitBlitz started with a webinar on Monday, and was followed by the Cochrane screening challenge from Tuesday to Wednesday. During that challenge, over 100 MedLit Blitzers screened 29,494 citations--over nine THOUSAND more than the initial goal of 20,000!

But the celebrations aren't over yet. It's now time for the Mark2Curathon portion of #MedLitBlitz!

For this part, we've launched 3 new missions in the Entity Recognition module. To be clear, all annotations (regardless of whether they were submitted via the Entity Recognition or Relationship Extraction module) will count towards #MedLitBlitz as long as they fall within the time frame of the event. If you don't see the new ER missions, log out, clear your cache and log back in.

As with Cochrane Crowd, we will be active on Twitter; however, we know that many of our most ardent Mark2Curators do not use Twitter. For this reason, we will also be sharing updates via our chat channel. As with our previous Mark2Curathons, no sign-up is required to chat on this channel, and we encourage you to join us there.


If you participated in the Cochrane screening challenge as part of #MedLitBlitz, we'd love to hear about it! It's been really fun working with Anna and Emily over at Cochrane Crowd, and we'll definitely look forward to working with them in the future. If you've enjoyed our collaborative effort, feel free to ping some praise to @AnnaNoelStorr and @cochrane_crowd.

Webinar, Mark2Curathon, and more

Posted by on Apr 28, 2017 in citizen science, Cochrane Crowd, mark2cure, scistarter | 0 comments


It’s citizen science season and we’re in the thick of it!

First off, welcome new users! If you found us through the latest SciStarter campaign, feel free to ping us on Twitter to let us know, so we can pass our thanks to the @SciStarter team! We’re very excited to be featured as part of SciStarter’s recent focus on biomedical citizen science! Note: if you complete your SciStarter profile this month, the SciStarter team will send you a free digital copy of The Rightful Place of Science: Citizen Science. See their post for more details.

Citizen science has enormous potential, and we’re glad that Mark2Curators are helping us explore its application towards biomedical discovery.

As mentioned last week, we’re not the only ones who need your help dealing with the biomedical literature. Cochrane Crowd is reaching its first anniversary in this domain of citizen science, and we’re celebrating together! We will jointly host a webinar on May 8th, and there will be two 24-hour screening challenges. There will be prizes for the top three contributors who take part in both the Cochrane Crowd and Mark2Cure screening challenges. Here are the details:

Mark2Cure/Cochrane Crowd Webinar:

Date/Time: May 08, 2017, 9:00am – 10:00am PDT

Tentative agenda:

  1. Intro (5 minutes)
  2. Mark2Cure presentation (15 mins)
  3. Cochrane Crowd presentation (15 mins)
  4. MedLit Blitz (5 minutes)
  5. Audience Q&A (15-20 mins)

Interested in participating in the webinar? You’ll need to register first! Hurry, space is limited due to licensing restrictions of the webinar software. Register here.

Medlit Blitz (2 x 24 hr screening challenges):

Cochrane Challenge: Help Cochrane Crowd identify studies that provide the best possible evidence of the effectiveness of a health treatment. Once identified by the Crowd the studies go into a central register where health researchers and practitioners can access them. The more studies identified by the Crowd, the more high-quality evidence is available to help health practitioners treat their clients.

Challenge Start: May 9th, 2017 10am GMT + 1 (UK time zone) / 2am (PDT)

Challenge Finish: May 10th, 2017 10am GMT + 1 (UK time zone) / 2am (PDT)

Mark2Curathon: Join the search for clues on a rare disease by identifying genes, diseases, drugs, and the relationships between them, based on the literature surrounding NGLY1.

Challenge Start: May 11th, 2017 7pm GMT + 1 (UK time zone) / 11am (PDT)

Challenge Finish: May 12th, 2017 7pm GMT + 1 (UK time zone) / 11am (PDT)

Get ready to use your reading skills to make a difference in biomedical science and health!!!

Celebrating the application of citizen science towards biomedical literature

Posted by on Apr 14, 2017 in citizen science, mark2cure | 0 comments

Upcoming event — Med Lit Blitz!

Extracting information from biomedical literature is a huge problem that many researchers are trying to solve computationally. Mark2Cure approaches the biomedical literature problem with citizen science, in hopes of enhancing computational methods. Happily, we are no longer alone in this regard! In fact, a year after Mark2Cure officially launched, another citizen science project, Cochrane Crowd, officially launched in order to identify randomized controlled trial papers from the biomedical literature. Since both projects were launched in May, Mark2Cure and Cochrane Crowd will be celebrating our anniversaries together!

Join us in celebrating the project anniversaries and the amazing way citizen scientists and volunteers have been helping to address issues in biomedical literature. We will be having a slew of joint events with Cochrane Crowd during the week of May 8th, including a webinar, a Cochrane Crowd marathon, and a Mark2Curathon. Details on the webinar and Med Lit Blitz should be announced next week.

New Entity Recognition Mission now available:

Peripheral Myelin Protein 22 (PMP22) is an N-glycosylated transmembrane protein that is mainly found in the nervous system. It was identified by multiple users in many docs spread across several different missions. Perturbations in this protein's homeostasis have been linked to Charcot Marie Tooth (CMT) disease. Many cases of CMT are actually caused by a PMP22 gene duplication which results in the over expression of the gene. NGLY1 functions to de-glycosylate cytoplasmic proteins enabling them to be recycled. Can we learn about the mechanisms behind NGLY1's neurological symptoms from the literature on N-glycosylated proteins like PMP22? Help us explore the literature around an interesting clue that YOU found.

The Disease Ontology license converted to CC0

Posted by on Apr 12, 2017 in bio-ontologies, classification, disease ontology, Ontologies | 0 comments

[Editor note: This guest blog post is from Lynn Schriml, who is an Associate Professor at the University of Maryland School of Medicine, the PI of the Disease Ontology, and a close collaborator on the Gene Wiki project.]

Licensing bioinformatics resources through Creative Commons enables free distribution of a resource's content, thus enabling open sharing, use, and expansion (derivative works) of that content.

This month (as of April 5, 2017) the Disease Ontology (DO) project has updated its data content licensing from CC BY 3.0 (Attribution) to CC0 (the most open license) to enhance collaboration and sharing. While we will continue to encourage users of the DO to cite our publications (available on our DO website), broader licensing will encourage greater usage of this biomedical ontology. It is important to point out that attribution demonstrates usage of bioinformatic resources, which is critical for demonstrating utility and a broad user community in grant applications that fund project development. But we are convinced by the argument that requests for attribution encoded in the legal license are both ineffective and counterproductive, and other links in this discussion reinforced our conclusion.

With the development of CC0 licensing and its recent adoption by other important biomedical resources (e.g., CIViC, WikiPathways, ECO), it has become clear that for our biomedical ontology to be most useful, it has to be free of content licensing restrictions. The DO was created and shared in order to be used, so open content licensing is the most appropriate license for this project. Classification of human diseases is a complex endeavor, one that is best approached in an open, collaborative, and community-data-driven environment.