Current progress in Mark2Cure
Thanks to our wonderful community of citizen scientists and volunteers, the NFE2L1 entity recognition mission is now 99% complete! If you haven’t already completed quests in this mission, please help us finish it! The other entity recognition missions are also over two-thirds complete and are in need of your reading skills. If entity recognition is too easy, please try out our relationship extraction module, which can be quite challenging and may require more critical reading skills.
New project in development needs Mark2Curators!
Greg Stupp, a research scientist in the Su Lab here at TSRI, is working on structuring clinical/drug indication data, which is currently in free-form, unstructured text. This data has important implications for bioinformatics research aimed at drug repurposing, but we can’t build this database without your help. Greg will be building a new MediaWiki-based platform for crowdsourcing the annotation of clinical/drug indication text, which has the added advantage of structuring information so that it can be widely (and openly) disseminated by importing it into Wikidata.
What needs to be done?
Greg is currently working on the interface, and will need a few volunteers to provide feedback on the beta version as soon as it is available. After that, we will need our community of Mark2Curators to put what they’ve learned into practice. Annotating clinical/drug indication text will need Mark2Curators with BOTH the entity recognition AND relationship extraction skills. The rules for entity recognition are expected to be very similar for this task; however, we expect that more detailed relationships will be available so additional relationship extraction training may be necessary.
The first data set is expected to be generated from 1,100 paragraphs, which will be very short but densely packed with information.
An example text may look like this:
MEKINIST® is indicated, as a single agent or in combination with dabrafenib, for the treatment of patients with unresectable or metastatic melanoma with BRAF V600E or V600K mutations, as detected by an FDA-approved test [see Clinical Studies (14.1)].
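As a rough illustration of what "structuring" a sentence like this could mean, the sketch below captures the example as a small record and flattens it into simple statements. The class, field names, and relation labels are all hypothetical and hand-written for this example; they are not the schema the new platform or Wikidata will use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Indication:
    """Hypothetical record for one drug indication (illustrative field names only)."""
    drug: str
    conditions: List[str]
    combination_partners: List[str] = field(default_factory=list)
    biomarkers: List[str] = field(default_factory=list)


# Annotated by hand from the example sentence above; not output from any tool.
mekinist = Indication(
    drug="MEKINIST",
    conditions=["unresectable melanoma", "metastatic melanoma"],
    combination_partners=["dabrafenib"],
    biomarkers=["BRAF V600E", "BRAF V600K"],
)


def to_statements(ind):
    """Flatten a record into (subject, relation, object) triples."""
    triples = [(ind.drug, "treats", c) for c in ind.conditions]
    triples += [(ind.drug, "used in combination with", p) for p in ind.combination_partners]
    triples += [(ind.drug, "requires biomarker", b) for b in ind.biomarkers]
    return triples


for statement in to_statements(mekinist):
    print(statement)
```

Each triple pairs entities (the kind found in entity recognition) with a relationship between them (the kind found in relationship extraction), which is why this task is expected to draw on both skills.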
How will this work?
The interface is still under construction, but will primarily use queries and drop-down selection menus so that you can select the best representation of the concepts in the text and how they are related. As a Mark2Curator, you’ve already been trained to recognize most of the entities involved in this task; however, the interface may require a little more detail in your selection. We’ll delve into the interface a little more as it gets fleshed out (it’s still very preliminary at this point).
How can you help?
If you are interested in providing feedback for improving the interface, please contact me. I will be compiling a list of beta testers for improving the user experience prior to the launch of the task. Given that the project is being built in the MediaWiki framework, there may be huge limitations in the features we can modify; however, your feedback will be crucial in ensuring that we provide users with the information they need to successfully complete the task in spite of any constraints/limitations caused by the interface.
In September of 2015, the Citizen Science Association announced the organization of a National Citizen Science Day, which was scheduled for April 16, 2016. Unfortunately, the announcement went unnoticed by many San Diego citizen science project groups, and we were only just able to assemble a San Diego Citizen Science Day Expo in time for the event. To avoid another round of last-minute planning, we attempted to start planning our next event back in September of 2016, but there was a problem: we weren't able to ascertain when (or even if) there would be another Citizen Science Day. For this reason, we moved forward with our planning for a San Diego Citizen Science Expo to be held on March 11th, during the San Diego Festival of Science and Engineering Week.
Currently, we have over 16 different projects exhibiting at this event, and we have a keynote speaker from the American Gut project, which uses citizen science and crowdfunding to catalog the American gut microbiome. If you are in San Diego and are interested in other ways you can contribute to science, mark your calendars! The event will be at the La Jolla Public Library and there is no cost to attend. If you have free time and would like to help us staff the Mark2Cure table, please get in touch! We could really use at least two volunteers for the event. Come hang out with us and meet with other local Mark2Curators!
World Rare Disease Day is fast approaching (February 28th) and the theme this year is 'Research'. Because rare disease advocates, patients, and families have been an important part of Mark2Cure's foundation, we will be running a week-long event in anticipation of Rare Disease Day. Mark4Rare will begin on February 20th and run to February 27th. During this time, we encourage Mark2Curators to announce the rare disease they care most about and complete quests in the name of that disease. We will aggregate the contributions made during the Mark4Rare campaign and leverage them to raise awareness for the diseases of interest to our Mark2Curators through our social media channels. To sign up for the Mark4Rare event, please fill out this form. Please note that only contributions between February 20th and February 27th will count towards Mark4Rare.
Mark2Cure’s first academic paper is finally published. For those of you who have been with us for a while, you may recall the posting of our preprint to biorxiv last January, so why was it only ‘published’ recently? Since all of our contributors are helping to make sense of the biomedical literature, I’d like to share the experience of writing, submitting, and publishing an academic research article, and why preprints and open access matter.
Let’s start with some informal (and partially biased) definitions.
- Academic Research Article – An unstructured text-based knowledge dump of what a researcher learned after conducting some experiments. The drafts (or manuscripts) for these are usually reviewed and improved by all the co-authors before being submitted to a journal. In highly competitive journals, there will be a first-pass review of the manuscript (pass/no pass type thing), and then the manuscript may be handed to an editor to be sent out for review by fellow researchers.
- Abstract – A summary of the key findings of an academic research article. Abstracts are generally accessible by all, even if the research article is not. Mark2Cure ‘docs’ are based on abstracts because these are consistently openly accessible.
- Meta data – information about information. If an academic research article is a knowledge dump, meta data is information about that knowledge dump. For example, title, authors, key words, etc. can be meta data for a research article.
- Peer Reviewer – a fellow researcher who takes the time to read a manuscript that has been submitted to an academic journal, and critiques the content based on their expert knowledge. It’s not a perfect system and there’s a hilarious site dedicated to unhelpful reviews, but a good reviewer is very useful for improving a manuscript.
- Pubmed – An NIH-run indexing service that collects abstracts and the meta data about the associated research article. Note that Pubmed does NOT index every journal (since there are also garbage/predatory journals that don’t care about content). Pubmed indexes the articles/abstracts from reputable journals, primarily in the biomedical research space. Hence, there may still be important papers holding clues for biomedical research which are published in reputable non-biomedical journals and are not indexed in Pubmed.
- Pubmed Central – A repository for published research articles. Since the NIH required that the public be able to access the research they helped to fund, research articles stemming from work sponsored by the NIH must be deposited into Pubmed Central by the publishers. Awesome, right?
- Preprint service – An online service that publishes un-reviewed manuscripts. In the old paradigm of academic publishing, no one (save the reviewers) would be able to see or read the manuscript until it was published by the journal. I optimistically assume the rationale behind this was to prevent badly written-up or poorly conducted research from being headlined in the media. But there were major drawbacks to the old publication paradigm. Due to accessibility issues (many researchers could not access their own published papers), many researchers started posting their work to preprint services, which makes it available and easy to share. Also, using a preprint service is helpful for disseminating research, especially since the peer review, revision, and publication process can be very slow.
- Open Access – When the published academic research article can be read and accessed without some sort of expensive subscription, or paying money. (There’s more to open access than just the payment part, but this is good enough for now).
Why did it take almost two years for our Beta study to get published?
- Our small team prioritized moving Mark2Cure forward over writing the manuscript for the Beta study, once we found that citizen scientists could do the task very well. That was why the manuscript wasn’t prepared until the end of Mark2Cure’s launch year.
- We submitted to a brand new journal. In support of the citizen science field, we submitted to the Citizen Science Association’s brand new journal. This means the kinks in the process have not been worked out, and there probably isn’t enough reviewer, editor, and staff support available to make the process happen quickly. Because the Su Lab follows and advocates open science, our manuscript was deposited in the biorxiv preprint server, and the data was shared in figshare.
A year in the publication process
As the proposal editor for GENE’s Gene Wiki Review series, I know how time-consuming it can be to find a reviewer. In some cases, you can go through a list of over 25 experts before you find one or two willing to review a paper. Researchers voluntarily review papers (often without any credit, though there are currently efforts to change this), so it may also take them time to review a paper. Although the Mark2Cure beta manuscript was submitted at the end of January, we did not get the reviews until the end of April.
Fortunately, the reviews were all very helpful critiques of our work. We revised our manuscript and resubmitted it mid-June, and it was accepted for publication within a few days of the submission. From that point on, the manuscript was in the copy-editing and formatting process, which ended in December. For a well-established journal, this process is usually fairly fast; however, Citizen Science: Theory and Practice is new and probably does not have the level of staffing/support available to well-established journals.
TL;DR – Deposit in preprint servers, publish open access, and participate in the review process to help research move through the publication process efficiently. Mark2Cure beta study participants, you can view the fruits of your labor for free here.
BioGPS opened 2016 with a publication in Nucleic Acids Research, right after the New Year holiday. Throughout the year, new designs for the site were being created, reviewed, adjusted, reviewed, adjusted, and reviewed and adjusted some more in anticipation of a site redesign for 2017. A Plugin Registration Blitz was held in March and April, followed by a Plugin Review Blitz in May. The BioGPS spotlight series was also restarted, with spotlights on BGEE, Intermine, and other Intermine-related plugins.
Discarding the default search gene, CDK2, the top 10 most frequently requested genes were:
Given their use as controls, the appearance of GAPDH and ACTB in the top 10 list wasn’t too surprising. Out of the top 10 genes searched in BioGPS, seven had corresponding Gene Wiki articles that were not stubs. The three genes with incomplete Gene Wiki articles were ACTB, Trpm4, and KLF6. If you feel like sharing what you know about these three genes, consider editing their corresponding Wikipedia articles.
BioGPS users may not have been making as many requests in 2016 as they made in 2015, but in 2016, they still published about 72 articles citing the 2009 paper on BioGPS.
Happy New Year!!!
Thank you for contributing to Mark2Cure with your annotations, comments, questions, bug reports–and general feedback. You’ve made the Mark2Cure project an amazing project to work on and we are so grateful that you found us.
Many of you have been kind enough to report bugs and we are definitely trying to fix them. However, we only have one research programmer working on Mark2Cure; hence, many of these issues will take time to resolve. Our programmer will be unavailable in January, so we very much appreciate your patience in the new year. If you have programming experience and would like to help tackle some of these issues, please check out our repository on GitHub (Mark2Cure is open source).
We are working to get the relationship module to properly display available tasks again, so hopefully you’ll see the queued relationship docs in your dashboard sometime in the near future.
An abbreviated timeline for Mark2Cure in 2016
-2016.01.27 – Beta experiment paper submitted
-2016.02.26 – The Mark2Cure team meets NGLY1 families.
-2016.04.16 – Citizen Science Day Expo organized by Mark2Cure and the La Jolla Public Library
-2016.04.15 – 2016.04.16: Mark2Curathon
-2016.05.21 – Mark2Cure Campaign for NGLY1 anniversary/launch of relationship app
-2016.06.01 – Mark2Curathon #2
-2016.06.07 – Andrew spoke at the library for the SD citizen science lecture series
-2016.06.09 – Andrew’s interview on Patient Empowered
-2016.06.16 – Beta experiment paper accepted for publication
-2016.06.21 – Jennifer (research programmer) leaves the Su Lab
-2016.06.22 – Mark2Cure added to NIH-sponsored biomedical citizen science site
-2016.08.15-2016.08.21 – Mark2Cure joins the #dazzle4rare campaign
-2016.09.22 – Andrew speaks at Personalized Health in the Digital Age conference
-2016.10.17 – Andrew presents about Gene Wiki/Wikidata and Mark2Cure at TSRI seminar
-2016.12.04-2016.12.09 – Mark2Cure hosts @IamCitSci on Twitter
-2016.12.21 – Beta experiment paper proof submitted
-2016.12.22 – NIH NCATS featured story about Mark2Cure
I really look forward to hearing from, learning from, and growing together with you and the Mark2Cure community in 2017. Be safe and have a happy, healthy, and exciting new year!
By now, I’ve posted year-end summaries for some of the major projects in the lab including:
MyGene.info: MyGene.info 2016 Year-end review
MyVariant.info: 2016–More incredible for MyVariant.info than MyGene.info
Gene Wiki / WikiData: 2016–A busy year for Gene Wiki and WikiData
And additional summaries are expected to come soon. The year-end summary for Mark2Cure will be posted tomorrow, and the BioGPS 2016 summary will be shared early in January.
But those are our major projects. Our lab has its fair share of collaborative projects, newer & not-yet-as-popular research tools. As with any academic research lab, we’ve also had our fair share of member changes in 2016.
Member changes in 2016
Moved on 2016.06.21 – Jennifer Fouquier
Moved on 2016.07.01 – Sandip Chatterjee
New member 2016.04.04 – Sebastien Lelong
New member 2016.05.01 – Mike Mayers
New member 2016.06.07 – Núria Queralt Rosinach
Member Upgrade 2016.12.01 – Greg Stupp has been promoted from postdoctoral research associate to staff scientist! Congratulations, Greg!
(plug alert) If you think what we do is cool, have some serious programming chops and want to join our team, ping us! (end plug)
Milestones reached by other projects in the lab
-2016.06.24: Branch Paper published (Branch: an interactive, web-based tool for testing hypotheses and developing predictive models)
-2016.04.04: Jake’s CryoEM project officially launches on the Zooniverse
-2016.02.26: Implicitome paper accepted (The Implicitome: A Resource for Rationalizing Gene-Disease Associations)
-2016.05.26: Knowledge.bio paper on biorxiv: Knowledge.Bio: A Web application for exploring, building and sharing webs of biomedical relationships mined from PubMed
Science Game Lab
-2016.06.21 – Press release: Solving science questions by playing games
-2016.06.21 – News article: Science Game Lab to be a central hub for scientists and gamers
-2016.06.21 – Science News Release
-2016.06.23-24 – Ben at Games for Change Conference
-2016.09.16 – Margaret on Science Friday mentioned SciGameLab
Looking forward to how these projects and more develop in 2017. If you have time, try them out and let us know what you think! Feedback is a crucial part of the improvement process.
As mentioned in MyGene.info’s year-end review, the team behind both services has been very productive. In addition to all the work they did on MyGene.info (and the back-end/BioThings services behind both MyGene.info and MyVariant.info), they made the following improvements to MyVariant.info this year:
-2016.01.06: ClinVar data structure overhauled
-2016.09: Starting this month, ClinVar data is updated on MyVariant.info on a regular basis.
-2016.10.03: https enabled
-2016.11.30: API status added to site
Also as mentioned in the year-end post for MyGene.info, there were two particular resources that we were very excited about:
First, CIViC, which had been using MyGene.info and recently incorporated MyVariant.info as well.
Second, MyGene2, which is one of six finalists for the Open Science Prize and will potentially impact researcher work and rare disease patient lives alike. Note: voting for the Open Science Prize is open until January 6th, so you can still vote for MyGene2.
Before we get to the best thing about 2016 for MyVariant.info, here’s what the MyVariant team has been up to in terms of spreading the word about this service.
-2016.01.22: Chunlei gave an update for the Heart BD2K technical conference call
-2016.05.10: Corresponding press releases were published to Eureka Alert and Science Daily
-2016.05.25: MyGene and MyVariant were featured in a post on the Genome Biology blog
-2016.05.30: Chunlei was interviewed for Rarecast podcast about the services
-2016.11.28-30: Kevin presented a poster at the same meeting
Finally…MyVariant.info broke the record on usage. In just two years, MyVariant.info went from having just a fraction of the usage that MyGene.info had, to having about double MyGene’s traffic. For 2016 (as of December 22nd), MyVariant.info had almost 116 million sessions according to Google Analytics.
It has been a busy year for the MyGene.info team as they worked to apply the lessons they learned in building MyGene.info and MyVariant.info towards a BioThings framework, and the generation of additional services. At this rate, 2017 is on track to be a very interesting year.
In 2016, aside from all the BioThings improvements happening behind-the-scenes, MyGene.info saw a number of improvements including:
-2016.07.07: API updated to v3
-2016.11.30: API status added to site
Clinical Interpretations of Variants in Cancer (CIViC) is an open data resource which demonstrates the power/utility of open data and crowdsourcing for cancer variant annotations. In addition to having a user-friendly interface that encourages community cancer variant annotation, the brains behind this valuable bioinformatics resource also committed to open data by switching to public licensing.
MyGene2 is simultaneously a means for crowdsourcing information on gene variants and a valuable matchmaking tool for the rare disease community. The site’s ingenious use of open data for researchers and patients alike has made it a finalist for the Open Science Prize (and you can vote for it to win here).
2016 saw a large increase in MyGene.info usage, which happily means that other researchers are finding the service useful and hopefully making progress in their important work. According to Google Analytics, MyGene.info’s usage increased to ~51 million requests this year, over double the 19.2 million previously recorded.
Other than the incredible work (done by others) using MyGene.info, the team has been busy this year sharing about their efforts:
-2016.01.22: Chunlei presents update for Heart BD2K technical conference call
Here’s looking forward to an exciting 2017…we can’t wait to see what you do!
I knew it! Last year, I struggled with the year-end post for the Gene Wiki/WikiData (AKA the Gene WikiData) project because the Gene WikiData team was incredibly active. I learned my lesson, and attempted to better track their activity earlier this year with a running tally. Even so, the Gene WikiData team was so productive that I still had trouble keeping up with everything they did! They’ve been so busy on this project that it makes more sense to list their presentations, papers, and slides than to try to describe everything they’ve done this year. So here it is (and I’m probably missing things here and there): a non-exhaustive list of the Gene WikiData team’s activity (and this is only the activity of the project members here in La Jolla). I can’t even begin to cover the crazy amount of work the team members outside of La Jolla have put into this project.
You can see what they’ve done yourself by inspecting the linked papers, presentations, and slides:
- 2016.02.01: Sebastian’s paper accepted: Wikidata as a semantic framework for the Gene Wiki initiative
- 2016.02.19: Tim’s paper accepted: Centralizing content and distributing labor: a community model for curating the very long tail of microbial genomes
- 2016.04.04: BioHackathon proposal accepted. (one of 3 accepted out of 22 total submissions)
- 2016.04.11: Sebastian’s BioCuration 2016 poster
- 2016.04.13: Sebastian, Tim, and Ben’s BioCuration 2016 Workshop
- 2016.04.11: Tim’s BioCuration 2016 presentation
- 2016.04.18: Tim wins best poster award at Force11 2016
- 2016.05.20: Julia switched all gene infoboxes in Gene Wiki to the new wikidata-based infobox
- 2016.06.07: Gene Wiki Review Series editorial published
- 2016.06.08: Elsevier Connect post on Gene Wiki Review series editorial published
- 2016.06.12: Tim Putman Biohackathon 2016 presentation
- 2016.08.01: Ben’s Wikidata talk (slides here) for BioCreative. Paper here.
- 2016.08.23: Greg’s Cytoscape/Wikidata tool is released. View demo here.
- 2016.10.08: Tim’s Presentation at WikiConference North America, 2016. Video here.
- 2016.10.08: Sebastian’s Presentation at WikiConference North America, 2016. Video here.
- 2016.10.08: Greg’s Presentation at WikiConference North America, 2016.
- 2016.10.17: Andrew presents about Gene Wiki/Wikidata and Mark2Cure at TSRI seminar
Want to help? Here’s a really quick/easy way to do so: Join the WikiData property proposal discussion on Natural Science properties. The team has been requesting new properties in order to add/expand the information in Wikidata so it can become more useful. Properties need plenty of discussion and refinement in order to be approved, so chime in!
We just have three quick pieces of news to share:
- NIH/NCATS, which funds our research, has decided to do a feature story on Mark2Cure. This would NEVER have happened without your contributions to the project, so THANK YOU! You can find the feature here: https://ncats.nih.gov/pubs/features/citizen-science
(For those of you who have sent us bug reports or helped us with UX testing, THANK YOU many, many times over. We’re still trying to fix the issues you’ve reported, but we’re definitely working on them.)
- Mark2Cure has been listed on a new citizen science gaming directory website. The site is a nice new resource for information on citizen science games, and we’ve been very excited about its launch. Hopefully it will integrate with Science Game Lab in the future so that even the act of submitting reviews/sharing about citizen science games will be rewarded/gamified.
- A few pictures from our Secret Santa event have come in. If you’ve already received/opened your Secret Santa gift, please be sure to send a captioned picture so we can share it with your Secret Santa.