Celebrating the application of citizen science towards biomedical literature

Upcoming event — Med Lit Blitz!

Extracting information from biomedical literature is a huge problem that many researchers are trying to solve computationally. Mark2Cure approaches the biomedical literature problem with citizen science in hopes of enhancing computational methods. Happily, we are no longer alone in this regard! In fact, a year after Mark2Cure officially launched, another citizen science project, Cochrane Crowd, officially launched to identify randomized controlled trial papers in the biomedical literature. Since both projects were launched in May, Mark2Cure and Cochrane Crowd will be celebrating our anniversaries together!

Join us in celebrating the project anniversaries and the amazing way citizen scientists and volunteers have been helping to address issues in biomedical literature. We will be holding a slew of joint events with Cochrane Crowd the week of May 8th, including a webinar, a Cochrane Crowd marathon, and a Mark2Curathon. Details on the webinar and Med Lit Blitz should be announced next week.

New Entity Recognition Mission now available:

Peripheral Myelin Protein 22 (PMP22) is an N-glycosylated transmembrane protein that is mainly found in the nervous system. It was identified by multiple users in many docs spread across several different missions. Perturbations in this protein's homeostasis have been linked to Charcot-Marie-Tooth (CMT) disease. Many cases of CMT are actually caused by a PMP22 gene duplication, which results in overexpression of the gene. NGLY1 functions to deglycosylate cytoplasmic proteins, enabling them to be recycled. Can we learn about the mechanisms behind NGLY1's neurological symptoms from the literature on N-glycosylated proteins like PMP22? Help us explore the literature around an interesting clue that YOU found.

The Disease Ontology license converted to CC0

[Editor note: This guest blog post is from Lynn Schriml, who is an Associate Professor at the University of Maryland School of Medicine, the PI of the Disease Ontology, and a close collaborator on the Gene Wiki project.]

Licensing bioinformatics resources under Creative Commons licenses enables free distribution of their content, and thus open sharing, use, and expansion (derivative works) of that content.

This month (as of April 5, 2017), the Disease Ontology (DO) project has updated our data content licensing from CC BY 3.0 (Attribution) to CC0 (the most open license) to enhance collaboration and sharing. While we will continue to encourage users of the DO to cite our publications (available on our DO website), broader licensing will encourage greater usage of this biomedical ontology. It is important to point out that attribution demonstrates usage of bioinformatics resources, which is critical for demonstrating utility and a broad user community in the grant applications that fund project development. But we are convinced by the argument that requests for attribution encoded in the legal license are both ineffective and counterproductive. (Other links in this discussion reinforced our conclusion.)

With the development of CC0 licensing and its recent adoption by other important biomedical resources (e.g., CIViC, WikiPathways, ECO), it has become clear that for our biomedical ontology to be most useful, it has to be free of content licensing restrictions. The DO was created and shared so that it would be used; open content licensing is therefore the most appropriate license for this project. Classification of human diseases is a complex endeavor, one that is best approached in an open, collaborative and community-data-driven environment.


New Entity Recognition Mission Available

Pilocarpine is a drug that was frequently identified by our Mark2Cure citizen scientists and volunteers in the previous sets of biomedical literature. It was often found in the context of seizures or tears, both of which are symptoms associated with NGLY1 deficiency. In humans, pilocarpine is used to pre-operatively treat some forms of glaucoma and to treat the dry eyes associated with Sjögren's syndrome. Pilocarpine is also used to stimulate secretion of saliva and sweat, and a “paucity of sweating” was noted in one case of an NGLY1-deficient patient. Are there underlying mechanisms in the pilocarpine literature that might help elucidate the symptoms of NGLY1 deficiency? Help us find out.
Pilocarpine appeared in the previously completed missions ATGS, MATG, HSPS, Alacrima, HSPM, MATGS and HSPG, and was identified by more than 48 users, including AJEckhart, AnxietyAttacked, aprilwent, BridgetDS, cbologa, cburkham, cheryllaos, chu2k, CitizenSubflexa2, Ckrypton, Darkversev, Dmatsumoto, GaboGR, gajin4065, ggoom, GrantRVD, hallm21, hampton11235, HArielle, Isabelle, Jbm, JudyE, Klgmd, kuhno1980, LaraineAitken, lcb123, ldouglas5, Manabu, mariomar_it, metaphor, ok8080, rkaramch, sciguy29, skye, socalpam, sueandarmani, TAdams, Vsmalladi, Yanggan, and more!

Interested in viewing the knowledge networks for the missions (i.e., document sets) listed above? Just copy and paste this URL into your browser and append the mission’s symbol.

E.g., to view the description and network for the ATGS mission, go to:

An update on Greg’s side project, which was mentioned in the previous newsletter: Greg wasn’t able to release the preliminary interface before his trip to the Biocuration conference because of identifier import issues. Since his return, he has fixed the issue and is making some final adjustments before turning it loose.

Current progress in Mark2Cure

Thanks to our wonderful community of citizen scientists and volunteers, the NFE2L1 entity recognition mission is now 99% complete! If you haven’t already completed quests in this mission, please help us finish it! The other entity recognition missions are also over two-thirds complete and are in need of your reading skills. If entity recognition is too easy, please try out our relationship extraction module, which can be quite challenging and may require more critical reading skills.

New project in development needs Mark2Curators!

Greg Stupp, a research scientist in the Su Lab here at TSRI, is working on structuring clinical/drug indication data, which is currently in free-form, unstructured text. This data has important implications for bioinformatics research aimed at drug repurposing, but we can’t build this database without your help. Greg will be building a new MediaWiki-based platform for crowdsourced annotation of clinical/drug indication text, which has the added advantage of structuring information so that it can be widely (and openly) disseminated by importing it into Wikidata.

What needs to be done?

Greg is currently working on the interface and will need a few volunteers to provide feedback on the beta version as soon as it is available. After that, we will need our community of Mark2Curators to put what they’ve learned into practice. Annotating clinical/drug indication text will need Mark2Curators with BOTH entity recognition AND relationship extraction skills. The rules for entity recognition are expected to be very similar for this task; however, we expect that more detailed relationships will be available, so additional relationship extraction training may be necessary.
The first data set is expected to be generated from 1,100 paragraphs, which will be very short but densely packed with information.

An example text may look like this:

MEKINIST® is indicated, as a single agent or in combination with dabrafenib, for the treatment of patients with unresectable or metastatic melanoma with BRAF V600E or V600K mutations, as detected by an FDA-approved test [see Clinical Studies (14.1)].

How will this work?

The interface is still under construction, but it will primarily use queries and drop-down selection menus so that you can select the best representation of the concepts in the text and how they are related. As a Mark2Curator, you’ve already been trained to recognize most of the entities involved in this task; however, the interface may require a little more detail in your selection. We’ll delve into the interface a little more as it gets fleshed out (it’s still very preliminary at this point).

How can you help?

If you are interested in providing feedback for improving the interface, please contact me. I will be compiling a list of beta testers to help improve the user experience prior to the launch of the task. Given that the project is being built in the MediaWiki framework, there may be significant limitations on the features we can modify; however, your feedback will be crucial in ensuring that we provide users with the information they need to successfully complete the task in spite of any constraints or limitations imposed by the interface.

Upcoming Mark2Cure-related events

In September 2015, the Citizen Science Association announced the organization of a National Citizen Science Day, scheduled for April 16, 2016. Unfortunately, the announcement went unnoticed by many San Diego citizen science project groups, and we were only just able to assemble a San Diego Citizen Science Day Expo in time for the event. To avoid last-minute planning, we tried to start planning our next event back in September 2016, but there was a problem: we couldn't determine when (or even whether) there would be another Citizen Science Day. For this reason, we moved forward with plans for a San Diego Citizen Science Expo to be held during San Diego Festival of Science and Engineering Week, on March 11th.

Currently, we have more than 16 different projects exhibiting at this event, and we have a keynote speaker from the American Gut project, which uses citizen science and crowdfunding to catalog the American gut microbiome. If you are in San Diego and are interested in other ways you can contribute to science, mark your calendars! The event will be at the La Jolla Public Library and there is no cost to attend. If you have free time and would like to help us staff the Mark2Cure table, please get in touch! We could really use at least two volunteers for the event. Come hang out with us and meet other local Mark2Curators!

World Rare Disease Day is fast approaching (February 28th) and the theme this year is 'Research'. Because rare disease advocates, patients & families have been an important part of Mark2Cure's foundation, we will be running a week-long event in anticipation of Rare Disease Day. Mark4Rare will begin on February 20th and run through February 27th. During this time, we encourage Mark2Curators to announce the rare disease they care most about and complete quests in the name of that disease. We will aggregate the contributions made during the Mark4Rare campaign and leverage them to raise awareness of the diseases of interest to our Mark2Curators through our social media channels. To sign up for the Mark4Rare event, please fill out this form. Please note that only contributions between February 20th and February 27th will count towards Mark4Rare.

Mark2Cure Beta Study officially published

Mark2Cure’s first academic paper is finally published. Those of you who have been with us for a while may recall the posting of our preprint to bioRxiv last January, so why was it only ‘published’ recently? Since all of our contributors are helping to make sense of the biomedical literature, I’d like to share the experience of writing, submitting, and publishing an academic research article, and explain why preprints and open access matter.

Let’s start with some informal (and partially biased) definitions.

  • Academic Research Article – An unstructured text-based knowledge dump of what a researcher learned after conducting some experiments. Drafts (or manuscripts) of these are usually reviewed and improved by all the co-authors before being submitted to a journal. In highly competitive journals, there will be a first-pass review of the manuscript (pass/no pass type thing), and then the manuscript may be handed to an editor to be sent out for review by fellow researchers.
  • Abstract – A summary of the key findings of an academic research article. Abstracts are generally accessible by all, even if the research article is not. Mark2Cure ‘docs’ are based on abstracts because these are consistently openly accessible.
  • Metadata – information about information. If an academic research article is a knowledge dump, metadata is information about that knowledge dump. For example, the title, authors, keywords, etc. can be metadata for a research article.
  • Peer Reviewer – a fellow researcher who takes the time to read a manuscript that has been submitted to an academic journal, and critiques the content based on their expert knowledge. It’s not a perfect system and there’s a hilarious site dedicated to unhelpful reviews, but a good reviewer is very useful for improving a manuscript.
  • PubMed – An NIH-run indexing service that collects abstracts and the metadata about the associated research articles. Note that PubMed does NOT index every journal (since there are also garbage/predatory journals that don’t care about content). PubMed indexes the articles/abstracts from reputable journals, primarily in the biomedical research space. Hence, there may still be important papers holding clues for biomedical research that are published in reputable non-biomedical journals and not indexed in PubMed.
  • PubMed Central – A repository for published research articles. Since the NIH required that the public be able to access the research it helped to fund, research articles stemming from work sponsored by the NIH must be deposited into PubMed Central by the publishers. Awesome, right?
  • Preprint service – An online service that publishes unreviewed manuscripts. In the old paradigm of academic publishing, no one (save the reviewers) would be able to see or read a manuscript until it was published by the journal. I optimistically assume the rationale behind this was to prevent badly written or poorly conducted research from being headlined in the media. But there were major drawbacks to the old publication paradigm. Because of accessibility issues (many researchers could not access their own published papers), many researchers started posting their work as preprints, which makes it available and easy to share. Using a preprint service also helps disseminate research, especially since the peer review, revision, and publication process can be very slow.
  • Open Access – When the published academic research article can be read and accessed without some sort of expensive subscription, or paying money. (There’s more to open access than just the payment part, but this is good enough for now).
Why did it take almost two years for our Beta study to get published?
  1. Once we found that citizen scientists could do the task very well, our small team prioritized moving Mark2Cure forward over writing the manuscript for the Beta study. That is why the manuscript wasn’t prepared until the end of Mark2Cure’s launch year.
  2. We submitted to a brand new journal. In support of the citizen science field, we submitted to the Citizen Science Association’s brand new journal. This means the kinks in the process had not been worked out, and there probably wasn’t enough reviewer, editor, and staff support available to make the process happen quickly. Because the Su Lab follows and advocates open science, our manuscript was deposited in the bioRxiv preprint server, and the data shared in figshare.
A year in the publication process

As the proposal editor for GENE’s Gene Wiki Review series, I know how time-consuming it can be to find a reviewer. In some cases, you can go through a list of over 25 experts before you find one or two willing to review a paper. Researchers voluntarily review papers (often without any credit, though there are currently efforts to change this), so it may also take them time to review a paper. Although the Mark2Cure beta manuscript was submitted at the end of January, we did not get the reviews until the end of April.

Fortunately, the reviews were all very helpful critiques of our work. We revised our manuscript and resubmitted it in mid-June, and it was accepted for publication within a few days of the submission. From that point on, the manuscript was in the copy-editing and formatting process, which ended in December. For a well-established journal, this process is usually fairly fast; however, Citizen Science: Theory and Practice is new and probably does not have the level of staffing/support available to well-established journals.

TL;DR – Deposit in preprint servers, publish open access, and participate in the review process to help research move through the publication process efficiently. Mark2Cure beta study participants, you can view the fruits of your labor for free here.

A look back at 2016 for BioGPS

BioGPS opened 2016 with a publication in Nucleic Acids Research, right after the New Year holiday. Throughout the year, new designs for the site were created, reviewed, adjusted, reviewed, adjusted, and reviewed and adjusted some more in anticipation of a site redesign for 2017. A Plugin Registration Blitz was held in March and April, followed by a Plugin Review Blitz in May. The BioGPS spotlight series was also restarted, with spotlights on Bgee, InterMine, and other InterMine-related plugins.

There were ~910,000 requests made to BioGPS in 2016. Requests to BioGPS peaked in March and were at their lowest in December.

By far, most searches involved human and mouse genes, though thousands of requests were also made for other model organisms.

Discarding the default search gene, CDK2, the top 10 most frequently requested genes were:

  1. GAPDH
  2. TP53
  3. Trpm4
  4. EGFR
  5. CD274
  6. KLF6
  7. TNF
  8. ACTB
  9. MYC
  10. AKT1

Given their use as controls, the appearance of GAPDH and ACTB in the top 10 list wasn’t too surprising. Of the top 10 genes searched in BioGPS, seven had corresponding Gene Wiki articles that were not stubs. The three genes with incomplete Gene Wiki articles were ACTB, Trpm4, and KLF6. If you feel like sharing what you know about these three genes, consider editing their corresponding Wikipedia articles.

BioGPS users may not have made as many requests in 2016 as they did in 2015, but they still published about 72 articles in 2016 citing the 2009 paper on BioGPS.

A look back at 2016 with Mark2Cure

Happy New Year!!!

Thank you for contributing to Mark2Cure with your annotations, comments, questions, bug reports–and general feedback. You’ve made the Mark2Cure project an amazing project to work on and we are so grateful that you found us.

Many of you have been kind enough to report bugs, and we are definitely trying to fix them. However, we only have one research programmer working on Mark2Cure, so many of these issues will take time to resolve. Our programmer will be unavailable in January, so we very much appreciate your patience in the new year. If you have programming experience and would like to help tackle some of these issues, please check out our repository on GitHub (Mark2Cure is open source).

We have some new entity recognition (ER) missions available which you can directly access from the links below:

We are working to get the relationship module to properly display available tasks again, so hopefully you’ll see the queued relationship docs in your dashboard sometime in the near future.

An abbreviated timeline for Mark2Cure in 2016

-2016.01.27 – Beta experiment paper submitted
-2016.02.26 – The Mark2Cure team meets NGLY1 families.

-2016.04.15 – 2016.04.16: Mark2Curathon
-2016.04.16 – Citizen Science Day Expo organized by Mark2Cure and the La Jolla Public Library

-2016.05.21 – Mark2Cure Campaign for NGLY1 anniversary/launch of relationship app
-2016.06.01 – Mark2Curathon #2
-2016.06.07 – Andrew spoke in the San Diego citizen science lecture series at the library

-2016.06.09 – Andrew’s interview on Patient Empowered
-2016.06.16 – Beta experiment paper accepted for publication

-2016.06.21 – Jennifer (research programmer) leaves the Su Lab
-2016.06.22 – Mark2Cure added to NIH-sponsored biomedical citizen science site
-2016.08.15-2016.08.21 – Mark2Cure joins the #dazzle4rare campaign

-2016.09.22 – Andrew speaks at Personalized Health in the Digital Age conference
-2016.10.17 – Andrew presents about Gene Wiki/Wikidata and Mark2Cure at TSRI seminar

-2016.12.04-2016.12.09 – Mark2Cure hosts @IamCitSci on twitter
-2016.12.21 – Beta experiment paper proof submitted

-2016.12.22 – NIH NCATS featured story about Mark2Cure

I really look forward to hearing from, learning from, and growing together with you and the Mark2Cure community in 2017. Be safe and have a happy, healthy, and exciting new year!

Wrapping up 2016 in the Su Lab

There are a LOT of projects going on in the Su Lab–so many, that it’s hard to keep track of them all.

By now, I’ve posted year-end summaries for some of the major projects in the lab, including:
2016 Year-end review
2016–More incredible for than
Gene Wiki / WikiData: 2016–A busy year for Gene Wiki and WikiData

Additional summaries are expected soon. The year-end summary for Mark2Cure will be posted tomorrow, and the BioGPS 2016 summary will be shared early in January.

But those are our major projects. Our lab also has its fair share of collaborative projects and newer, not-yet-as-popular research tools. As with any academic research lab, we’ve also had our fair share of member changes in 2016.

Member changes in 2016
Moved on 2016.06.21 – Jennifer Fouquier
Moved on 2016.07.01 – Sandip Chatterjee
New member 2016.04.04 – Sebastien Lelong
New member 2016.05.01 – Mike Mayers
New member 2016.06.07 – Núria Queralt Rosinach
Member Upgrade 2016.12.01 – Greg Stupp has been promoted from postdoctoral research associate to staff scientist! Congratulations, Greg!
(plug alert) If you think what we do is cool, have some serious programming chops and want to join our team, ping us! (end plug)

Milestones reached by other projects in the lab
-2016.02.26: Implicitome paper accepted (The Implicitome: A Resource for Rationalizing Gene-Disease Associations)
-2016.04.04: Jake’s CryoEM project officially launches on the Zooniverse
-2016.05.26: Paper posted on bioRxiv: Knowledge.Bio: A Web application for exploring, building and sharing webs of biomedical relationships mined from PubMed
-2016.06.24: Branch paper published (Branch: an interactive, web-based tool for testing hypotheses and developing predictive models)
Science Game Lab
-2016.06.21 – Press release: Solving science questions by playing games
-2016.06.21 – News article: Science Game Lab to be a central hub for scientists and gamers
-2016.06.21 – Science News Release
-2016.06.23-24 – Ben at Games for Change Conference
-2016.09.16 – Margaret on Science Friday mentioned SciGameLab

Looking forward to how these projects and more develop in 2017. If you have time, try them out and let us know what you think! Feedback is a crucial part of the improvement process.

2016–More incredible for than

As mentioned in the companion year-end review, the team behind both services has been very productive. In addition to all their other work (including the back-end BioThings services behind both services), they made the following improvements this year:
-2016.01.06: ClinVar data structure overhauled

-2016.01.18: ClinVar data updated
-2016.07.18: Support for GRCh38/hg38 added

-2016.08.22: Python client updated to v0.3.1
-2016.08.25: Additional ExAC data added/updated

-2016.09: Starting this month, ClinVar data is updated on a regular basis.
-2016.10.03: https enabled

-2016.11.30: API status added to site

As also mentioned in the companion year-end post, there were two resources that we were particularly excited about:

First, CIViC, which had been using one of the two services but recently incorporated the other.

Second, MyGene2, which is one of six finalists for the Open Science Prize and could affect researchers' work and rare disease patients' lives alike. Note: voting for the Open Science Prize is open until January 6th, so you can still vote for MyGene2.

Before we get to the best thing about 2016, here’s what the MyVariant team has been up to in terms of spreading the word about this service.
-2016.01.22: Chunlei gave an update on the Heart BD2K technical conference call

-2016.03.29: The paper describing the services was accepted to Genome Biology
-2016.04.05: Kevin presented at ISCB’s NGS 2016 conference

-2016.05.10: Corresponding press releases were published to Eureka Alert and Science Daily
-2016.05.25: MyGene and MyVariant were featured in a post on the Genome Biology blog
-2016.05.30: Chunlei was interviewed for Rarecast podcast about the services

-2016.06.09: The MyGene/MyVariant team was invited to write a guest post for the Elasticsearch blog
-2016.07.10: Kevin did a poster presentation at ISMB 2016

-2016.07.12: Chunlei also presented at ISMB 2016
-2016.11.29: Chunlei presented at BD2K AHM 2016

-2016.11.28-30: Kevin presented a poster at the same meeting

Finally, usage broke records. In just two years, the service went from having just a fraction of MyGene’s usage to having about double MyGene’s traffic. For 2016 (as of December 22nd), it had almost 116 million sessions according to Google Analytics.