Blog



BioThings- Other Biothings in the works

Posted on May 21, 2018 in BioThings, mygene.info, myvariant.info

By now, you've probably seen the announcement that our renewal grant application has been approved. In addition to funding the improvement of MyGene.info and the extension of lessons learned from MyVariant.info, the renewal grant will fund the development of a BioThings Software Development Kit (SDK). To our knowledge, the BioThings SDK will be the first open-source bioinformatics SDK for building high-performance web services. We WANT people to be able to build high-performance web services for accessing important biological data so that everyone can get the most out of existing data.

Building the BioThings SDK

The development of the BioThings SDK will have three phases:

Phase 1- Abstraction--in this phase, we will extract the common codebase from MyGene and MyVariant to form the three core components (“databuild”, “web-API” and “cloud-deployment”) of the BioThings SDK.

Phase 2- Customization Tool Creation--in this phase, we will build customization mechanisms into the SDK. These mechanisms will include a project-specific configuration system, a scheduler (for harvesting data from data source specific parsers), and some generic interfaces for adding customization into each of the three core components (“databuild”, “web-API” and “cloud-deployment”).

Phase 3- Test and Improve--in this phase, we will test that the BioThings SDK works by converting the existing MyGene and MyVariant code bases to use the BioThings SDK. In converting the code bases, we will identify issues with the SDK and areas for improvement. Once the SDK is in place, we may iteratively improve it either by converting the code bases of other BioThings APIs (if they're created prior to the completion of the SDK) or by creating new BioThings APIs for chemicals or diseases.

Improving Data Provenance of BioThings APIs

One of the advantages of using BioThings APIs like MyGene or MyVariant is that they're always up to date. The BioThings team takes care of the data parsing and munging issues that can arise when individual resources are updated, and does so smoothly enough that users can sometimes be caught off guard when a data update changes their results.

To ensure that BioThings APIs handle data provenance well, we will build methods for data discrepancy detection and quality control, as well as data update log recording and reporting, directly into the BioThings SDK. Data provenance is important for reproducibility, especially when working with continuously updated data resources, and the BioThings SDK will include methods for this important aspect of data management.
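To make the idea of an update log a little more concrete, here is a minimal sketch of how two successive data builds could be compared and the differences recorded. This is only an illustration and not the BioThings SDK's actual implementation: the function names, the log entry layout, the source name, and the sample documents are all hypothetical.

```python
def diff_builds(old, new):
    """Compare two builds, each a dict mapping a document ID to its annotations.

    Returns the IDs that were added, removed, or changed between builds.
    """
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "changed": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }


def update_log_entry(source, old_version, new_version, old, new):
    """Build one (hypothetical) update log record for a data source."""
    entry = {"source": source, "from": old_version, "to": new_version}
    entry.update(diff_builds(old, new))
    return entry


# Illustrative documents only; the versions and fields are made up.
old_build = {"1017": {"symbol": "CDK2"}, "1018": {"symbol": "CDK3"}}
new_build = {"1017": {"symbol": "CDK2", "taxid": 9606}, "1019": {"symbol": "CDK4"}}

entry = update_log_entry("example_source", "20180415", "20180422",
                         old_build, new_build)
print(entry)
```

With a record like this published alongside each build, a user whose results changed could check whether the documents they care about appear in the "changed" or "removed" lists.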

Improving Utility of BioThings APIs

By building the BioThings SDK, we invite others to create APIs like MyGene and MyVariant. APIs like these make it easier for developers to create tools that use the annotation data they provide. This could lead to a growing number of APIs like MyGene and MyVariant, which creates a new and interesting quandary: how do you know when data resource updates affect something of interest to you? To address this quandary, the BioThings team proposes developing a tool, tentatively called BioReel, which will be discussed in our next post.

Sneak Peek at Changes Coming to BioThings.io

In addition to overhauling the MyGene and MyVariant websites, the BioThings website will be getting a cosmetic upgrade as well. Here's a little taste of what's to come!
[Image: new BioThings logo]

MyVariant- Lessons learned and what’s in store

Posted on May 14, 2018 in BioThings, JSON-LD, myvariant.info

In case you missed it, a sneak peek at what's in store for MyGene.info was posted last week, so it's only FAIR to share our plans for MyVariant. Although the development of MyVariant.info naturally followed that of MyGene.info, the scale of variant annotation data presented a difficult challenge that required additional architectural and performance considerations. At the time MyVariant.info was first being developed, there were about 18 million genes in the 6+ data resources of interest compared to 340 million variants in the 12+ data resources of interest--a ~20x scale up in the number of items to index.

For this reason, the development of MyVariant.info required a bit of tailoring to make it work. In overcoming the architectural and performance challenges, the MyVariant.info team members learned a lot of valuable lessons on abstracting and standardizing the creation of APIs like MyGene.info and MyVariant.info for different types (and scales) of biological entity data. To learn more about the general architecture behind MyGene and MyVariant services, check out the 2016 paper in Genome Biology.

The lessons learned on wrangling data at the scale handled by MyVariant.info will be valuable as the BioThings team looks towards incorporating data from Ensembl Genomes, which will drastically scale up the amount of data offered by MyGene.info. In abstracting the process of building MyGene and MyVariant, the BioThings team has laid the foundation for building additional APIs centered around biological entities like chemicals and diseases!

Furthermore, the BioThings team will take the lessons learned and incorporate them into their efforts to create a generic Software Development Kit (SDK) for generating APIs around biological entities like genes and variants. More on the BioThings SDK later.

Variant annotation data is much more valuable in the context of genes; hence, the MyVariant team has been exploring ways to increase interoperability of the MyGene and MyVariant services using JSON-LD.

Linking data with JSON-LD

Both MyVariant.info and MyGene.info store annotation data from different resources in JSON documents; however, differences in keys for the same data across the two services can make it challenging to obtain results for chained queries. JSON-LD provides a standard way to add semantic context to the existing JSON data structure, enhancing the interpretability and therefore the interoperability of the JSON data.

Basically, each API (like MyGene and MyVariant) specifies a JSON-LD context (i.e., a JSON document that can provide a Uniform Resource Identifier (URI) mapping for each key in the output JSON document). The use of URIs provides consistency when specifying subjects and objects, allowing the results of a multistep chained query to be obtained through a much simpler query.
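As a rough illustration of the idea, the sketch below uses two simplified contexts in which differently named keys map to the same URI. The key names (`entrezgene`, `gene_id`, `rsid`), the URI prefixes, the sample document, and the helper functions are assumptions made for this example; they are not the actual context files served by MyGene.info or MyVariant.info.

```python
# Hypothetical, simplified JSON-LD contexts: each maps an API's own key to a URI.
GENE_CONTEXT = {"@context": {
    "entrezgene": "http://identifiers.org/ncbigene/",
    "uniprot": "http://identifiers.org/uniprot/",
}}
VARIANT_CONTEXT = {"@context": {
    "gene_id": "http://identifiers.org/ncbigene/",
    "rsid": "http://identifiers.org/dbsnp/",
}}


def keys_by_uri(context):
    """Invert a context so that each URI lists the keys that map to it."""
    inverted = {}
    for key, uri in context["@context"].items():
        inverted.setdefault(uri, []).append(key)
    return inverted


def chained_query_terms(doc, source_ctx, target_ctx):
    """Turn values in `doc` into query terms for the API described by target_ctx.

    A value is carried over only when its key shares a URI with a target key.
    """
    target_keys = keys_by_uri(target_ctx)
    terms = []
    for key, value in doc.items():
        uri = source_ctx["@context"].get(key)
        for target_key in target_keys.get(uri, []):
            terms.append(f"{target_key}:{value}")
    return terms


# Made-up variant document for illustration.
variant_doc = {"rsid": "rs0000", "gene_id": 1017}
print(chained_query_terms(variant_doc, VARIANT_CONTEXT, GENE_CONTEXT))
# prints ['entrezgene:1017']
```

Because both keys resolve to the same URI, the gene identifier found in the variant document can be handed to the gene service without the user having to know that the two services name the field differently.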

Learn about how it works here and imagine the possibilities.

The anniversary of Mark2Cure’s official launch is coming soon

Posted on May 11, 2018 in Cochrane Crowd, mark2cure, publications

Mark2Cure's 3rd anniversary is coming up, and we are extremely grateful for the opportunity to have interacted with and learned from you over the last few years! You make this project interesting and exciting. You make this project educational and humbling. You make this project useful and valuable. Although our research team has shrunk to half of what it was when we first started, we have been able to continue to move forward only because of you! We cannot thank you enough!

As a citizen science effort, Mark2Cure is primarily driven by volunteers--and volunteers like you have brought us to where we are today. As of today, we have over 1.3 million annotations!!! We are currently busy with the analysis, so please accept my apologies for being a bit slower to respond to your inquiries. Fortunately, you and your fellow volunteers continue to help us move forward. In fact, we are excited to share a new preprint on aligning citizen science opportunities with the needs of students fulfilling community service and service learning requirements. The research on these requirements was primarily performed by a volunteer with a marketing/business background and was inspired by a few high school Mark2Curators who have been kind enough to share their experiences and needs as students and volunteers.

You can find the preprint on bioRxiv here, and it has been submitted for peer review to the journal Citizen Science: Theory and Practice.

A designer has also wrapped up her work on making Mark2Cure more intuitive and user-friendly. A huge thanks to those of you who took the time to provide feedback on individual parts of her designs--although not all of your feedback is incorporated in these designs, we will definitely take your detailed and valuable suggestions into consideration.

Since this is citizen science and your voices are important, I'd like to share the designs with all of you. You can find the wireframes here. Note that the actual wording/content is subject to change (especially since we've received detailed content recommendations from some of you), and that the focus is on the layout of the content. Please feel free to share your opinions about it with us!

Those of you who joined us in last year's #MedLitBlitz or this year's #CitSciMedBlitz may be familiar with our friends at Cochrane Crowd. Like Mark2Cure, Cochrane Crowd is a citizen science project where volunteers help inspect biomedical abstracts. Cochrane Crowd was also launched in May and is celebrating its anniversary with the #showyourscreen 2 Million annotations challenge. Learn more about the challenge here.

Sneak peek–what’s in store for MyGene.info

Posted on May 7, 2018 in mygene, mygene.info

As of the most recent data update on April 24th to build version 20180422, the MyGene.info db grew to contain 22,132,511 documents. As a valuable service that has seen over 20 million requests in the last 30 days, MyGene.info was fortunate enough to receive renewed support for improving its offerings.

The MyGene.info landing page will be overhauled with a sleeker, more attractive, intuitive, responsive, cohesive, and user-friendly design. The updated landing site will reflect the ongoing improvements that were made to the website's architecture in the last few months as well as the expected changes in store for MyGene.info.

What's in store for MyGene.info

MyGene.info will expand to include highly requested annotation sources such as the species and annotations available from Ensembl Genomes. Currently, Ensembl is already one of MyGene.info's ~7+ data resources, and contributes annotations for 1.6 million genes in >80 species. The inclusion of Ensembl Genomes can potentially add annotations for over 145 million genes from thousands of bacterial, fungal, plant, metazoan, and protist species!

In addition to this large and widely requested resource, MyGene.info will also import gene annotations from several smaller, more specialized data resources with the goal of making data from all of these resources more Findable, Accessible, Interoperable, and Reusable!

A FAIRly important note

Both the Su Lab and Wu Lab (to whom the renewal grant was awarded) are strong proponents of data re-use and have a strong interest in data FAIRness. How can MyGene.info, MyVariant.info, and both labs' related efforts make existing biological data more FAIR?

MyGene.info makes gene annotation data more Findable by providing a centralized resource that enables simple community contribution. All data sources included in MyGene.info are heavily indexed using existing identifiers, allowing that data to be accessed via our simple search API. Since the data sources are pre-integrated into gene-specific JSON objects in MyGene.info, data from the included resources are standardized in structure and Accessible via our REST-based JSON API.
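For readers who have not used the API, the sketch below shows how request URLs for the two kinds of access described above (searching, and retrieving a gene-specific JSON object) might be assembled. It assumes the v3 endpoints `/query` and `/gene/<id>` and the `fields` and `species` parameters; the helper function names are made up for this example, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed base URL for version 3 of the MyGene.info API.
BASE_URL = "https://mygene.info/v3"


def query_url(q, species=None, fields=None):
    """Build a search URL, e.g. to find genes matching a symbol."""
    params = {"q": q}
    if species:
        params["species"] = species
    if fields:
        params["fields"] = ",".join(fields)
    return f"{BASE_URL}/query?{urlencode(params)}"


def gene_url(gene_id, fields=None):
    """Build a URL for the JSON object of a single gene."""
    url = f"{BASE_URL}/gene/{gene_id}"
    if fields:
        url += "?" + urlencode({"fields": ",".join(fields)})
    return url


print(query_url("symbol:cdk2", species="human"))
print(gene_url(1017, fields=["symbol", "name"]))
```

The resulting URLs can be opened in a browser or passed to any HTTP client, and no API key or registration is needed.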

As part of the BioThings APIs, gene annotation data included in MyGene.info will be more Interoperable thanks to the compatibility with Linked Open Data resources using JSON-LD and standard vocabularies. Read more about the value of interoperable data in our recent paper on Cross-linking BioThings APIs via JSON-LD. By allowing community editing of the JSON-LD context files, we'll empower the community to iteratively improve the interoperability of the data.

Lastly, MyGene.info (and its sister BioThings APIs) will continue to help make data more Reusable by providing a high-performance, continuously updated API with no authentication, registration, or usage limits. By providing R and Python client libraries and encouraging the development of third-party clients, MyGene.info increases the accessibility and utility of the data.

Excited about what's in store for MyGene.info? If not, be sure to check out the BioThings paper for a taste of the possibilities that are coming!

Happy Citizen Science Day Hero Badge Hunting!

Posted on Apr 18, 2018 in challenges, citizen science

Citizen Science Day was April 14th this year, and Mark2Cure partnered with the San Diego Public Library to host a local Citizen Science Expo. Of course, many of our wonderful contributors are not in San Diego and could not attend the event. For those of you who wish to get in on the Citizen Science Day excitement, we've joined the EyesOnAlz Citizen Science Day Hero challenge.

The challenge will run until 9am ET, April 21st and anyone interested in the challenge will have the opportunity to earn digital badges for just trying out (ie- registering or logging into) different citizen science projects. As Mark2Curators, you only need to log into your Mark2Cure account to earn a badge.

Learn more about this fun challenge at https://blog.eyesonalz.com/citsciday-hero/.

Citizen Science Day 2018 is just around the corner!

Posted on Mar 30, 2018 in citizen science, events

San Diego Citizen Science Day Expo

Citizen Science Day is on April 14th this year, and many citizen science organizations (including yours truly) are hosting citizen science events. Here in San Diego, we've teamed up with the San Diego Public Library and the Wet Lab group to put on the 3rd annual San Diego Citizen Science Day Expo. There are a lot of exciting new entrants into the San Diego citizen science scene, and we hope you will join us in learning about them at the expo. If you're in San Diego, please join us! The details are as follows:

Who: Anyone who wants to do science
What: San Diego Citizen Science Day Expo
When: Saturday, April 14, 2018. 1:00 PM – 5:00 PM
Where: North University Community Library (8820 Judicial Dr, San Diego, CA 92122)

Please note that the location has changed from previous years due to the limited availability of parking spots at the La Jolla Library. The North University Community Library has plentiful free parking, so please come if you're in the area! For the most up-to-date information about this event, visit SDCitSci.net.

If you're not in San Diego, there is probably an exciting Citizen Science Day event happening near you! To find a Citizen Science Day Event near you, visit Scistarter.com.

San Diego March for Science

The March for Science is also happening on April 14th in San Diego. It starts at Waterfront Park at 10:00am and ends at 1:00pm (right before our event!). If you want to show your love for science, consider joining the march! If you want to DO science, be sure to join a Citizen Science Day event near you (or contribute to Mark2Cure, of course!).

Current Status of Mark2Cure

Development status and workarounds

Unfortunately, Mark2Cure no longer has a full time developer working on the project, so a lot of the issues and bugs that have been reported probably will not be fixed for a long, long time. We are very sorry for the frustration our system has caused our users and extremely grateful for the patience, graciousness, and encouragement our users have returned to us. Mark2Cure is really made up of a wonderful bunch of individuals, and we are thankful that this project has introduced us to you. Fortunately, many of you really put the science in the term citizen science and have systematically found ways to contribute productively in spite of all the issues in our system. You are all too amazing!

NER module issues: The most frustrating one has been the inability to highlight certain words, and the random highlighting/un-highlighting of words when users try to mark something. This has been reported by many users (many, many thanks to those of you who took the time to report this issue). Fortunately, one of your fellow volunpeers has found a workaround that appears to be quite robust. To get around a lot of these highlighting issues, AJ_Eckhart highlights the entire paragraph to remove the preannotations. These preannotations seem to be an important factor in this problem, and he has tested this workaround for the 'cannot-highlight-a-specific-term', the 'highlighting-a-term-un-highlights-something-else', and the 'highlighting-a-term-highlights-something-else' bugs.

RE module issues: A number of you have kindly taken the time to report issues with the RE module--the most common issue is the seemingly random inability to throw out an annotation. For this issue, two workarounds have been reported by our users. LadySteph has found that returning to the dashboard and then returning to the task will enable you to submit the response you wish (eg- throw out an annotation) and TAdams has reported that many of you have gravitated towards submitting 'Cannot be determined' in lieu of throwing out an annotation. We will take both workarounds into consideration when we analyze the data, so thank you all very much for contributing in spite of all these issues!

Data analysis and research status

Speaking of analyzing the data--we might not yet have enough abstracts annotated in order to generate ground-breaking, new hypotheses on NGLY1 deficiency, but we have enough for some initial analyses on the application of citizen science towards information extraction. We are working towards more scientific publications and look forward to sharing the results of your work and crediting you for your help. Note that many journal submission systems are not made to account for group names or a huge volume of names in the authorship; hence, we will continue to have our Mark2Cure contributors listed on a dedicated page which will be linked in the paper. As with our first paper, this will be an opt-in process because we respect your right to privacy. More details on opting-in will be sent via our mailing list.

CitSciMedBlitz Recap and Results

Posted on Mar 9, 2018 in CitSciMedBlitz, Cochrane Crowd, EyesOnAlz, mark2cure

Hopefully, you'll all forgive us for the delay in announcing the results of #CitSciMedBlitz. We and our partners at EyesOnAlz and Cochrane Crowd were a little tired after the blitz, and we gave ourselves a short break. Since the blitz, the EyesOnAlz team has generated the #CitSciMedBlitz digital badge for participants of all three challenges and has been working on creating the trophies; meanwhile, the Cochrane Crowd team has contacted the overall #CitSciMedBlitz event winners.

Look alive now! We're all up and active again and looking forward to sharing a recap and the results of #CitSciMedBlitz with you.

First, a Recap

On February 21st, CitSciMedBlitz was kicked off with a webinar to introduce the three platforms that would be participating in CitSciMedBlitz.
You can watch the webinar here.

The first challenge, the StallCatchers challenge, was launched on February 26th at 7am PST. Within minutes of the launch, StallCatchers were hammering away at StallCatchers, including Cochrane Crowd's Anna.
[Image: first 10 min]

Anna managed to get on the leaderboard but wasn't able to stay on there for very long because the competition was just too tough. At least she placed, though-- I never even made it!

By the end of the challenge, StallCatchers analyzed a whopping 18,348 real videos--the equivalent of two weeks' worth of laboratory analysis!
[Image: first 10 min]

While the EyesOnAlz team was still cooling off from their intense challenge, we were gearing up for the Mark2Cure arm--and NOT without a huge set of worries.
[Image: i am worried]

You can read more about the hiccups and snafus that happened during the Mark2Cure 24hr challenge here. By the end of the challenge, CitSciMedBlitzers managed to submit ~300 doc annotations and ~3000 relationship annotations--an impressive feat considering the increase in task difficulty, and the bugs and other technical issues that made the challenge--well, even more challenging!

While we were busy analyzing the results of our challenge, Cochrane Crowd was preparing the last challenge of CitSciMedBlitz. This challenge started off right with 50 assessments within the first 2 minutes, and over 5000 in just the first 4 hours! This arm of the challenge would determine which of the top contenders from the previous challenges would win the trophy, so the competition was intense! By the end of this challenge, over 46,000 classifications had been made--allowing our teams to determine the overall winner of #CitSciMedBlitz.

And now...the results of CitSciMedBlitz...

Of course, the biggest winner of the challenge is...

...biomedical and health evidence research! Everyone who has contributed in the past and continues to contribute to these efforts deserves a round of applause for being generous enough to donate their time towards helping with disease and health research.

Thank you for being amazing

The winner of the CitSciMedBlitz trophy is Michael Landau! Note, it was previously stated that Michael was also the top contributor to the EyesOnAlz challenge of CitSciMedBlitz. This is wrong.

Mike Capraro was the top contributor across all three measures of the EyesOnAlz challenge and the overall top contributor of that challenge for CitSciMedBlitz.

Editor's note #2: If you'd like to read more about CitSciMedBlitz from the EyesOnAlz team, check out their latest recap post here!

The top contributor to the Mark2Cure arm of CitSciMedBlitz was Kien Pong Yap, while the top contributor to the Cochrane Crowd arm of CitSciMedBlitz was Nikolaos. The top contributors to each platform will be receiving a platform-specific trophy.

Each platform will be awarding additional prizes to some of their top contributors and the notifications should be arriving via email (if they haven't already).

Editor's note #3: The recap from the Cochrane Crowd side of things is now available.

Now that all the excitement of CitSciMedBlitz is over, I'd like to thank ALL the citizen scientists who contribute to projects like ours and make these platforms great. In citizen science, the people make the platform. And, as you can see, the people contributing to these platforms do so with:

a collaborative spirit,
[Image: sharing is caring]

humor,
[Image: guilt to the rescue]

humility,
[Image: struggled and won]

good spirit,
[Image: positive attitude for the win]

and grace,
[Image: positive attitude for the win]

Thank you all! Much respect for the work that you've done and the character you've shown!

[Image: mad respects]

Note this post was updated on 2018.03.12 to correct an error and on 2018.03.19 to add a link

CitSciMedBlitz Update

Posted on Mar 2, 2018 in CitSciMedBlitz, mark2cure

The results for Mark2Cure's arm of the CitSciMedBlitz are now available! Kien Pong Yap took the top spot in this challenge (see ranking at bottom of post).

What's exciting is that several of these names were also in the top 10 for the StallCatchers challenge so it'll be a tough competition in the Cochrane challenge to see who actually wins the CitSciMedBlitz triple challenge trophy.

Here are some random tidbits about the Mark2Cure arm of the CitSciMedBlitz triple challenge:

  • The NER module was broken for the first few hours of the challenge, and we received emails and phone calls about it within the first hour of the challenge. THANK YOU for bringing it to our attention and providing such detailed information about the bugs. Max was able to solve it while on the airplane ride to San Diego for the Future of Genomic Medicine Conference.

  • In spite of the NER module issues (a HUGE thanks to TAdams, JudyE, AJEckhart, and Itwontalwaysbelikethis for helping us troubleshoot), users doing this task still managed to climb the ranks even with the delayed start.

  • A Mark2Curator created an introductory video about the relationship extraction task for the event while we were running around trying to ensure that everything was in place for the M2C challenge. THANK YOU for doing that, TAdams!

  • One of the Mark2Curators joined the challenge as a personal challenge to raise awareness for dystonia on Rare Disease Day. Dystonia = loss of muscle control. There are many types, but you can imagine just how much more challenging that makes things! Read about her efforts here. And if you're curious how well she did--she made it to the top 5!

  • I was worried about following StallCatchers because their task is so visually appealing while ours is text-based and quite difficult. Indeed, we got questions and suggestions-a-plenty about Mark2Cure on the StallCatchers forum, but there are plenty of very tenacious citizen scientists in StallCatchers and they managed to climb pretty high in the ranks.

  • The team behind StallCatchers is AMAZING!!!! Very generous with both support and humor! No wonder the StallCatchers are so ardent!

If you think it's all over--think again! The Cochrane Crowd arm of the challenge has just started, and CitSciMedBlitzers who participate in all THREE challenges will get a digital badge on their StallCatchers profile. It's an awesome badge!

Now go take the Cochrane challenge!

[Image: CitSciMedBlitz Mark2Cure rankings]

CitSciMed Blitz has started!

Posted on Feb 26, 2018 in citizen science, Cochrane Crowd, events, EyesOnAlz, mark2cure, Rare Disease Day

It's on! The CitSciMedblitz week of challenges has started!

If you missed the webinar detailing the three biomedical/health citizen science research projects, it is available for viewing on YouTube.
[Video: CitSciMedblitz webinar]

You are welcome to participate in as many or as few of the challenges as you'd like, but a trophy will be awarded to the highest ranking participant across all THREE challenges. Read more about CitSciMedblitz from this post at citscibio.org

With regard to the challenges, up first (and going on now!) is the EyesOnAlz 24hr Catchathon. EyesOnAlz is an Alzheimer's disease-focused citizen science project investigating stalled blood in brain images. It has a lot of cool images/videos in need of review by citizen scientists and a lot of fun features. The challenge has only just started and will run to 7am PST (3pm GMT) tomorrow (Feb. 27th), so get in on it ASAP!

The Mark2Cure challenge will start at 7am PST (3pm GMT) on Wednesday, February 28th. It is a doubly special day because the 28th is Rare Disease Day and we have had an incredibly inspirational weekend at the Sanford Burnham Prebys Rare Disease Day Symposium. We look forward to sharing rare disease stories from Mark2Curators and bringing awareness about these diseases as we tackle the literature around NGLY1 during this 24hr challenge.

Speaking of literature, our old friends at Cochrane Crowd are back with a lot of new features which you can explore during the Cochrane Screening Challenge. This challenge starts at 7am PST (3pm GMT) on Friday, March 2nd and runs for 24hrs.

CitSciMed Blitz, Rare Disease Day, and more

Posted on Feb 2, 2018 in citizen science, Cochrane Crowd, events, EyesOnAlz, mark2cure, Rare diseases

It's finally February, which means it's time to prepare for Rare Disease Day 2018 and CitSciMedBlitz! This year's theme for Rare Disease Day continues last year's theme: research. According to RareDiseaseDay.org, patients are not only subjects but also proactive actors in research--and we couldn't agree more! Mark2Cure would not be where it is now without the inspiration, contributions, and drive from our partners and contributors in the rare disease community. Mark2Curators have inspired us with their generosity, perseverance, curiosity, and overall intellectual voraciousness--and for us, Rare Disease Day is an opportunity to share about the diseases that the Mark2Cure community cares about, and not just NGLY1-deficiency. If there is a disease that you care about that you'd like us to highlight for Rare Disease Day, please get in touch.

Patients are not only subjects but also proactive actors in research.
Patients kick start research
Patients drive research
Patients organize research
Patients proactively provide data

The increasing role of patients in research is not limited to Rare Disease
As citizen science becomes increasingly popular in biomedical research, patients and care providers are becoming increasingly important partners for disease research in general. And, as many of you have pointed out--we will all be patients at some point in our lives so it's nice to be able to actively contribute to disease research.

In addition to helping to organize the knowledge surrounding NGLY1-deficiency, patients and citizen scientists have been making important contributions to Alzheimer's disease research and contributing to health evidence--all of which brings us back to CitSciMed Blitz!

CitSciMed Blitz is coming

Similar to last year's MedLitBlitz, there will be prizes for the top contributors to all THREE platforms. Only participation during the 24hr challenges will count towards the prize; however, you are welcome to register and complete the training for the other platforms prior to the event if you'd like. Learn more about the event and the other platforms here.