Blog 2016 Year-end review

Posted by on Dec 27, 2016 in BD2K, BioGPS,,, publications | 0 comments

It has been a busy year for the team as they worked to apply the lessons they learned in building and towards a BioThings framework, and the generation of additional services. At this rate, 2017 is on track to be a very interesting year.

In 2016, aside from all the BioThings improvements happening behind-the-scenes, saw a number of improvements including:
-2016.07.07: API updated to v3
-2016.08.18: Python Client Updated to v.3.0.0
-2016.08.25: ExAC functional constraint scores added
-2016.11.30: API status added to site

Even more exciting was the incredible progress made by other groups that use’s service, especially CIViC and MyGene2.

Clinical Interpretations of Variants in Cancer (CIViC) is an open data resource which demonstrates the power/utility of open data and crowdsourcing for cancer variant annotations. In addition to having a user-friendly interface that encourages community cancer variant annotation, the brains behind this valuable bioinformatics resource also committed to open data by switching to public licensing.

MyGene2 is simultaneously a means for crowdsourcing information on gene variants and a valuable matchmaking tool for the rare disease community. The site’s ingenious use of open data for researchers and patients alike has made it a finalist for the Open Science Prize (and you can vote for it to win here).

2016 saw a large increase in usage, which happily means that other researchers are finding the service useful and hopefully making progress in their important work. According to Google Analytics,’s user increased to ~51 million requests this year–over double the 19.2 million recorded.

Other than the incredible work (done by others) using, the team has been busy this year sharing about their efforts:
-2016.01.22: Chunlei presents update for Heart BD2K technical conference call
-2016.03.29: paper accepted to Genome Biology
-2016.04.05: Kevin presents at ISCB’s NGS 2016 conference
-2016.05.10: Corresponding press release to Eureka Alert and Science Daily
-2016.05.25: Genome Biology blog post
-2016.05.30: Rarecast podcast about
-2016.06.09: Elasticsearch blog post
-2016.07.10: Kevin’s poster presentation at ISMB 2016
-2016.07.12: Chunlei’s presentation at ISMB 2016
-2016.11.29: Chunlei’s presentation at BD2K AHM 2016
-2016.11.28-30: Kevin’s poster presentation at BD2K AHM 2016

Here’s looking forward to an exciting 2017…we can’t wait to see what you do!

2016 – A busy year for Gene Wiki and WikiData

Posted by on Dec 26, 2016 in big data, biocuration, conference, Gene Wiki, poster, presentation, publication, semantic wikipedia, wiki, Wikidata, wikipedia | 0 comments

I knew it! Last year, I struggled with the year-end post for the Gene Wiki/WikiData (AKA the Gene WikiData) project because the Gene WikiData team was incredibly active. I learned my lesson, and attempted to better track their activity earlier this year with a running tally. Still, the Gene WikiData team was so productive that I had trouble keeping up with everything they did! They’ve been so busy on this project that it makes more sense to list their presentations, papers, and slides than to try to describe everything they’ve done this year. So here it is (and I’m probably missing things here and there), a non-exhaustive list of the Gene WikiData team’s activity (and this is only the activity of the project members here in La Jolla). I can’t even begin to cover the crazy amount of work the team members outside of La Jolla have put into this project.

You can see what they’ve done yourself by inspecting the linked papers, presentations, and slides:

Want to help? Here’s a really quick/easy way to do so: Join the WikiData property proposal discussion on Natural Science properties. The team has been requesting new properties in order to add/expand the information in Wikidata so it can become more useful. Properties need plenty of discussion and refinement in order to be approved, so chime in!

Merry X-mas and Happy Holidays

Posted by on Dec 23, 2016 in citizen science, crowdsourcing, games, mark2cure, press | 0 comments

We just have three quick pieces of news to share:

  1. NIH/NCATS, which funds our research, has decided to do a feature story on Mark2Cure. This would NEVER have happened without your contributions to the project, so THANK YOU! You can find the feature here:
    (For those of you who have sent us bug reports or helped us with UX testing, THANK YOU many, many times over. We’re still trying to fix the issues you’ve reported, but we’re definitely working on them.)
  2. Mark2Cure has been listed on a new citizen science gaming directory website. The site is a nice new resource for information on citizen science games, and we’ve been very excited about its launch. Hopefully it will integrate with Science Game Lab in the future so that even the act of submitting reviews/sharing about citizen science games will be rewarded/gamified.
  3. A few pictures from our Secret Santa event have come in. If you’ve already received/opened your Secret Santa gift, please be sure to send a captioned picture so we can share it with your Secret Santa.

Birthday Wishes for Bertrand

Posted by on Dec 9, 2016 in citizen science, community intelligence, crowdsourcing, mark2cure | 0 comments

The Might family has been an incredibly moving proponent of precision medicine, citizen science, and rare disease. Thank you for what you’ve done for Mark2Cure and the fields of health and science. We wish Bertrand and his family health, happiness, and prosperity in the upcoming year.

In case you haven’t noticed, we’ve carved our blog out from our lab’s main blog and will serve it up directly on our site at If you would like to write a post to share with the Mark2Cure community, let us know! If you have recommendations, or if there are things you’d like to see discussed on our blog, contact us!

This week, Mark2Cure hosted the @IamCitSci Twitter account to share what we learned with the rest of the citizen science community. For details, see our Storify.

Lastly, thank you for your patience and a HUGE THANKS to those of you who have taken the time to report bugs in the system via email, posting on the talk pages, etc. We appreciate it very much, and are working to resolve them as quickly as possible.

Happy Thanksgiving, Mark2Curators!

Posted by on Nov 23, 2016 in citizen science, community intelligence, crowdsourcing, games, mark2cure | 0 comments

It’s still a little early, but we wanted to wish you well before the rest of your week got hectic with all the Turkey-day goodness. Since the project started, we’re grateful to have had the chance to learn and be inspired by you. For those of you who have taken the time to write to us, we’re awed by how driven, analytical, humorous, creative, and helpful you are–and we wanted some means of sharing that with you.

If only there was some way to send you a gift for the holiday season…
…given our experience with citizen science, maybe we can crowd-source it?…

And that’s how we came up with the Mark2Cure Secret Santa! Keep an eye out for our newsletter in your email inbox for more details on how to sign-up/participate.

New Mission Available!
When an NGLY1 researcher heard about what you all were doing, she wondered if the community had found anything on her gene of interest. Help search for clues on this gene in our new mission.

Your comments matter!
With complex programs, something is bound to break when code is updated, and it would be impossible to find and squash all the bugs if users didn’t report them. If Mark2Cure gets wonky for you, we’d LOVE to know! Post to the talk pages, or send us an email. We can’t improve without you!

Wikiconference North America

Posted by on Oct 8, 2016 in collective wisdom, community intelligence, conference, Gene Wiki, Wikidata, wikipedia | 0 comments

Wikipedia is one of the most widely used and freely accessible knowledgebases. Because it is one of the largest crowdsourced resources, many researchers are engaged in making the Wikimedia platform more useful in scientific research, including researchers from our lab here at The Scripps Research Institute. If you’re interested in how Wikipedia and Wikidata are being used in science, you should check out the Wikiconference, happening today through Monday. Registration is free (or $10-$25 with food included). More details here.

Three members of the Su Lab will be presenting their work on pushing the boundaries of how Wikidata can be used in academic research.

First up is Tim. Tim will present his work on developing a microbial-specific semantic data model in Wikidata, modeling bacterial species, genes, proteins, the diseases they cause, and the drugs that treat them. He’s part of the 1:00-3:00pm session today in the Clark Room at the San Diego Central Library. Learn more.
Tim’s presentation slides can be found on figshare.
Recording of Tim’s presentation on YouTube.

Both Greg and Sebastian are scheduled to present during the 3:30-5:00pm session today, at roughly the same time too! Fortunately, the conference organizers have plans to record the talks. Unfortunately, the quality of the video recordings tends to vary greatly.

Greg will present on a Cytoscape browser he developed using the Wikidata SPARQL endpoint. This allows him to combine a powerful way of pulling more complex information from Wikidata with a useful method of visualization. Greg will be presenting in the Shiley room on the 9th floor of the library. Learn more.
An early version of his slides is available on Google Drive.
Recording of Greg’s presentation on YouTube (pending)
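
For readers curious what pulling network-style data from the Wikidata SPARQL endpoint can look like, here is a minimal Python sketch. It is not Greg's code: the query, the function names, and the choice of properties (P703 "found in taxon", P688 "encodes", Q15978631 "Homo sapiens") are our own illustrative assumptions and should be double-checked against Wikidata before use. The sketch only builds the request URL and reshapes a result into edge pairs that a tool like Cytoscape could draw; it does not send the request itself.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

# Illustrative query (an assumption, not the query used in the talk):
# human genes and the proteins they encode.
QUERY = """
SELECT ?gene ?geneLabel ?protein ?proteinLabel WHERE {
  ?gene wdt:P703 wd:Q15978631 ;
        wdt:P688 ?protein .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""


def build_request_url(query, endpoint=WIKIDATA_SPARQL):
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def bindings_to_edges(results):
    """Turn SPARQL JSON results into (gene, protein) label pairs.

    Each pair can be treated as one edge of a network.
    """
    edges = []
    for row in results["results"]["bindings"]:
        edges.append((row["geneLabel"]["value"], row["proteinLabel"]["value"]))
    return edges
```

Fetching `build_request_url(QUERY)` with any HTTP client and passing the decoded JSON to `bindings_to_edges` would give a simple edge list to visualize.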

At the same time, Sebastian will be in the Clark room to present his work on adding drug and chemical compound items to Wikidata as a data source for Wikipedia infoboxes. Sebastian played an important role in adding gene and protein items into Wikidata, which allowed Gene Wiki infoboxes to pull from Wikidata. Learn more.
An early version of his slides is available on SlideShare.
Recording of Sebastian’s presentation on YouTube.
2016.10.09 update – links to the slides for their presentations have been added. Video of their talks is pending.
2016.10.12 update – link to recording of Tim’s presentation added.
2016.10.19 update – link to recording of Sebastian’s presentation added.

Mark2Cure Pro-tip #2- Verifying your hunch

Posted by on Oct 7, 2016 in citizen science, crowdsourcing, mark2cure | 0 comments

In Mark2Cure, it’s almost guaranteed that you’ll encounter words you’ve never seen before. Oftentimes, you can use the context to infer whether or not that term should be marked. Other times, the abstract may be just too jargon-laden or poorly written to make that determination. How do you determine whether or not to mark a term in this situation?
Aside from tips you may have picked up from other Mark2Curators via the talk pages (see pro-tip #1), here’s how a few of our Mark2Curators have approached this problem.

#1 – Revisit the rules: The rules for each concept/entity recognition task are linked via the colored boxes at the bottom of the task page.

Clicking on any of the colored boxes (‘Disease Concept’, ‘Genes Concept’, or ‘Treatments Concept’) will take you to the instructions page which details what you should mark

Because many things in biomedical research are related, it can be tempting to mark terms that are related to the concepts even if they’re not mentioned in the instructions. For example, specific gene variants may appear in the text but gene variants are not listed in the instructions. As tempting as it is, you should refrain from marking the gene variants and try to stay as close to the rules as possible.

#2 – Use the search button. The last term or phrase you highlight is captured and becomes the default term for the search button at the bottom of the task page. Once you’ve highlighted something, click on the blue ‘search’ button on the bottom to search the terms you just highlighted. You are also welcome to look up terms for the Concept/entity recognition task; in fact, we encourage you to do so! We’re big fans of learning, and you will often learn from whatever you bother to look up! The instructions page also contains links to concept-specific databases which may be helpful, but Wikipedia is also a great resource to search.

#3 – Reach out. Talk pages are a great place to have doc-specific questions answered, and posting your questions to the talk pages helps the entire community so we can learn together! You can also contact us by posting on Twitter, Facebook, Wikia, or via email. Please note, Facebook and Wikia are currently the slowest ways to reach us, though we’ll try to be better with them. Although we may not have definitive answers for all your questions (by nature, language has ambiguity, and sometimes that makes it possible for multiple choices to be correct), we’ll try to at least provide guidance wherever you’re stuck.

Mark2Cure Pro-tip #1

Posted by on Sep 23, 2016 in citizen science, collective intelligence, crowdsourcing, mark2cure | 0 comments

As a citizen science project, Mark2Cure has been very fortunate to recruit a number of volunpeers who have grown to become experts at the concept recognition task and provide a lot of useful feedback, comments, and suggestions. While we encourage you to explore and use the site in the ways that work best for you, we would like to share some of the great tips that users have shared with us. Without further ado, here is our first user pro-tip.

Pro-tip #1 – Using the talk pages to learn together
Upon the completion of a doc, you will be presented with a feedback screen which shows what your partner has marked, and the option to visit the talk page for that doc.

The talk page button is right next to the 'Next Doc' button on the feedback screen.

If there was anything in the document that you felt was confusing or unclear (in terms of whether or not it should be marked), someone else probably felt the same way you did. If you click on ‘Yes, Let’s Talk’, you may find that your question has been asked and learn from the answer; or you can post your question there so that others will learn from your question.

If you’re worried that your question has been asked elsewhere, DON’T worry about that! Each talk page is viewable only by users who have contributed to that doc. This means that duplicate questions are welcome as they increase the likelihood new users will encounter them.
On the talk page itself, there are several ways to check your work.

First is in the comments that users submitted:

Comments and questions submitted by users may have very useful info!

As seen here, this user has offered some very useful background information about some terms in the text.

Secondly, you can see how other users marked this doc and learn from them by scrolling over the numbers at the top of the talk page.

Different users may mark different terms in the text. Pay special attention to terms marked the same way by multiple users.

By looking at how other users marked the doc, you can get an idea of which annotations the community agrees should be marked.

If you dislike scrolling through the user annotations, you can also get a feel for what the community marked by looking at the frequency table below. The table shows the top 20-25 most frequently marked terms for each concept type.

The frequency table gives you a quick way to see the terms that were marked by people who have also completed this doc.
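
If you're curious how a table like this could be put together, here is a small Python sketch. It is only an illustration under our own assumptions, not the code that runs on the site: the `(user, concept_type, term)` tuple layout, the function name, and the rule that each user counts once per term are all hypothetical.

```python
from collections import Counter


def frequency_table(annotations, top_n=20):
    """Count how many distinct users marked each term, per concept type.

    annotations: iterable of (user, concept_type, term) tuples (assumed layout).
    Returns {concept_type: [(term, user_count), ...]}, most frequent first.
    """
    seen = set()   # (user, concept_type, term) combinations already counted
    counts = {}    # concept_type -> Counter of terms
    for user, concept_type, term in annotations:
        key = (user, concept_type, term.strip().lower())
        if key in seen:
            continue  # the same user marking the same term again adds nothing
        seen.add(key)
        counts.setdefault(concept_type, Counter())[key[2]] += 1
    return {ctype: counter.most_common(top_n) for ctype, counter in counts.items()}


# Made-up annotations for a doc about galactosemia.
SAMPLE = [
    ("user_a", "disease", "Galactosemia"),
    ("user_b", "disease", "galactosemia"),
    ("user_c", "disease", "galactosemia"),
    ("user_a", "disease", "galactosemia"),  # repeat by user_a, counted once
    ("user_b", "disease", "cataract"),
    ("user_a", "gene", "GALT"),
    ("user_c", "gene", "GALT"),
]

table = frequency_table(SAMPLE)
```

Terms are lowercased here so that differences in capitalization don't split the counts; a real system might want to keep the original casing for display.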

Lastly, if you’re wondering why the majority of users marked something differently than you did, or if you just have doubts about what you’re doing, you are welcome to post about it! We will always try to respond in a timely fashion, but an amazing Mark2Curator may just beat us to it!

A big thanks to everyone who has posted to the talk pages for helping to improve the learning opportunities in Mark2Cure

If you’d like to chime in on a discussion about a document you’ve contributed to before, click on the ‘Talk’ link at the top right corner of your dashboard header. This will show you the docs that you’ve done which are under discussion. If a doc you’ve done doesn’t appear, it’s likely because no one has discussed the doc. If another user begins a discussion on a doc that you’ve completed, it will appear in your list.

Special thanks to ckrypton for responding faster than I do on some of these, and for giving us the idea of this pro-tip!

BioGPS Spotlight on WormBase and WormMine

Posted by on Sep 2, 2016 in BioGPS, plugin, spotlight | 0 comments

If worms are your model organism of choice, you’re probably already familiar with WormBase. WormBase was founded in 2000 and is a multi-institutional consortium led by Paul Sternberg of the California Institute of Technology (Caltech), Paul Kersey of the European Bioinformatics Institute (EBI), Matt Berriman of the Wellcome Trust Sanger Institute, and Lincoln Stein of the Ontario Institute for Cancer Research (OICR). WormBase hosts a number of valuable tools for researchers, and is responsible for the development of WormMine, an InterMine-based tool that replaced WormMart. One of WormBase’s database curators, Ranjana Kishore, was kind enough to answer our questions for this series.

  1. In one tweet or less, introduce us to WormBase:
    WormBase is the central repository for data related to the genetics, genomics and biology of C. elegans and related nematodes.

  2. How did WormBase get its start? What was WormMart? At what point in WormBase’s history was InterMine integrated for the creation of WormMine?
    In the beginning was AceDB (A C. elegans database), a database developed by Richard Durbin and Jean Thierry-Mieg in 1989 for hosting genomic data, which included a graphical user interface with specific displays and tools for querying data. In early 2000, Paul Sternberg and colleagues at Caltech who worked with C. elegans as a model system realized that there was no online repository of information for this model organism that was increasingly being used, as a result of a fast-growing community of researchers. Fly researchers had FlyBase, but there was no online database for C. elegans. The C. elegans model system was drawing more and more researchers because of the several advantages it has–rapid generation time, simple nervous system, invariant cell lineage, transparent body, etc. Data was rapidly accumulating but there was no place to collect, organize or disseminate it for the good of the community. Thus began WormBase!

    WormMart was the WormBase implementation of BioMart for querying data in batch mode. WormBase switched to WormMine in 2013, which is based on InterMine, an open source data warehouse built specifically for the integration and analysis of complex biological data.

  3. Who is WormBase’s target audience?
    The C. elegans research community, the wider nematode research community, biomedical researchers and anyone interested in nematode model systems including college and high school teachers and students!

  4. WormBase seems to serve a thriving community of researchers. What is your greatest success story so far?
    WormBase serves a thriving and dynamic community of nematode researchers who use the C. elegans and other nematode systems to study diverse topics such as systems biology, signaling pathways, cell death, human diseases, drug efficacy and potential new drug screening. Research in C. elegans has won two Nobel prizes, in 2002 and 2006. WormBase itself has expanded to include diverse types of data including a portal, WormBase Parasite, which is dedicated to supporting helminth research. WormBase supports at least nine ‘core’ nematode species, hosting their genomes and providing tools such as genome browsers, including parasitic species that impact human health such as Onchocerca volvulus (causative agent of river blindness) and Strongyloides ratti (the rat laboratory analog of the causative agent of threadworm infection).

    Our user community not only includes Nobel prize winning researchers but also college and high school teachers who use C. elegans in their labs to teach basic biology and who use WormBase as an example of how to use a biological database. Our help-desk routinely fields questions from teachers and students who would like to obtain some worms and use WormBase to design simple experiments!

  5. WormBase actually encouraged users to sign a petition in support of the Model Organism Databases. What role does WormBase play in the Alliance of Genome Resources?
    WormBase is a member of the Alliance of Genome Resources (AGR) and is playing an active role in the Alliance’s plan to build an integrated data resource comprising data from several model organisms, including yeast, worm, fly, zebrafish, mouse and rat. WormBase will help in the integration of data from these model organism databases in order to increase accessibility for the biomedical researcher to model organism data, from a single integrated resource. AGR will also serve as a portal to the individual model organism databases.

  6. What improvements are coming in the future for WormBase? For WormMine?
    WormBase is increasing focus on curating and improving displays for data relevant to human health, which includes curating human gene orthologs in the worm that serve as genetic models for disease, and associating mutant alleles in C. elegans with orthologous genomic variants in human. In addition, WormBase Parasite will focus on features for the identification of putative targets for anti-helminthic drugs.

    WormBase is currently in the process of moving to a more flexible and dynamic database architecture. In the future we plan to do more frequent data updates (currently WormBase has a two month release cycle) and eventually updates in real time, so that users can access new data as soon as they are curated. We plan on even greater involvement of the community through contributions of data–not just large-scale data, but small scale data from authors of individual papers, and edits to existing data in WormBase. We plan to make these community curated data available on the website.

    For WormMine: In the future we plan to include several more data sets in WormMine that are currently in WormBase.

  7. Who is the team behind WormBase and WormMine?
    WormBase is an international consortium led by Paul Sternberg, Lincoln Stein, Paul Kersey and Matthew Berriman. It consists of three groups, located at the European Bioinformatics Institute (EBI) at Hinxton, UK, the Ontario Institute for Cancer Research (OICR) in Toronto, Canada, and the California Institute of Technology (Caltech) in Pasadena, USA. (See the WormBase 2016 reference below for the current list of people).

    The development of WormMine takes place at the Ontario Institute for Cancer Research under Lincoln Stein and Todd Harris. Currently, Paulo Nuin is the primary developer of WormMine.

Thanks to Ranjana for guiding us through this extremely useful and FREE tool. Be sure to check out their plugin in the plugin library. If you use WormBase in your research, be sure to cite their recent publication:

WormBase 2016: expanding to enable helminth genomic research.
Kevin L. Howe, Bruce J. Bolt, Scott Cain, Juancarlos Chan, Wen J. Chen, Paul Davis, James Done, Thomas Down, Sibyl Gao, Christian Grove, Todd W. Harris, Ranjana Kishore, Raymond Lee, Jane Lomax, Yuling Li, Hans-Michael Muller, Cecilia Nakamura, Paulo Nuin, Michael Paulini, Daniela Raciti, Gary Schindelman, Eleanor Stanley, Mary Ann Tuli, Kimberly Van Auken, Daniel Wang, Xiaodong Wang, Gary Williams, Adam Wright, Karen Yook, Matthew Berriman, Paul Kersey, Tim Schedl, Lincoln Stein, Paul W. Sternberg (2016). Nucleic Acids Res, 44, D774-80. PMID: 26578572. PMCID: PMC4702863. DOI: 10.1093/nar/gkv1217

New mission available in Mark2Cure

Posted by on Aug 26, 2016 in citizen science, crowdsourcing, mark2cure | 0 comments

Just a quick update: a new mission has been launched. This one is centered around galactosemia and oxidative stress. Check it out today!

A huge thanks to everyone who contributed to finishing the mission on stress response and muscle weakness. You can investigate the knowledge networks of any completed mission by clicking on the ‘toggle network’ link on the mission page.

Check out the knowledge network you’ve built by contributing to the stress response and muscle weakness mission below!

Stress response and muscle weakness network

In case you didn’t know, there are talk pages available for each doc. The talk page for a doc becomes available to you after you complete that doc. Please don’t hesitate to add your questions to the talk pages or to share your opinion on a talk page discussion. This way, we all learn together!