Blog



CitSciMedBlitz Recap and Results

Posted on Mar 9, 2018 in CitSciMedBlitz, Cochrane Crowd, EyesOnAlz, mark2cure

Hopefully, you'll all forgive us for the delay in announcing the results of #CitSciMedBlitz. We and our partners at EyesOnAlz and Cochrane Crowd were a little tired after the blitz, so we gave ourselves a short break. Since the blitz, the EyesOnAlz team has generated the #CitSciMedBlitz digital badge for participants of all three challenges and has been working on creating the trophies; meanwhile, the Cochrane Crowd team has contacted the overall #CitSciMedBlitz event winners.

Look alive now! We're all up and active again and looking forward to sharing a recap and the results of #CitSciMedBlitz with you.

First, a Recap

On February 21st, CitSciMedBlitz kicked off with a webinar introducing the three participating platforms.
You can watch the webinar here.

The first challenge, the StallCatchers challenge, was launched on February 26th at 7am PST. Within minutes of the launch, StallCatchers were hammering away at StallCatchers, including Cochrane Crowd's Anna.
first 10 min

Anna managed to get on the leaderboard but wasn't able to stay on there for very long because the competition was just too tough. At least she placed, though-- I never even made it!

By the end of the challenge, StallCatchers had analyzed a whopping 18,348 real videos--the equivalent of two weeks' worth of laboratory analysis!
first 10 min

While the EyesOnAlz team was still cooling off from their intense challenge, we were gearing up for the Mark2Cure arm--and NOT without a huge set of worries.
i am worried

You can read more about the hiccups and snafus that happened during the Mark2Cure 24hr challenge here. By the end of the challenge, CitSciMedBlitzers had managed to submit ~300 doc annotations and ~3000 relationship annotations--an impressive feat considering the increase in task difficulty, and the bugs and other technical issues that made the challenge--well, even more challenging!

While we were busy analyzing the results of our challenge, Cochrane Crowd was preparing the last challenge of CitSciMedBlitz. This challenge started off right with 50 assessments within the first 2 minutes, and over 5000 in just the first 4 hours! This arm of the challenge would determine which of the top contenders from the previous challenges would win the trophy, so the competition was intense! By the end of this challenge, over 46,000 classifications had been made--allowing our teams to determine the overall winner of #CitSciMedBlitz.

And now...the results of CitSciMedBlitz...

Of course, the biggest winner of the challenge is...

...biomedical and health evidence research! Everyone who has contributed (in the past) and continues to contribute to these efforts deserves a round of applause for being generous enough to donate their time towards helping with disease and health research.

Thank you for being amazing

The winner of the CitSciMedBlitz trophy is Michael Landau! Note, it was previously stated that Michael was also the top contributor to the EyesOnAlz challenge of CitSciMedBlitz. This is wrong.

Mike Capraro was the top contributor across all three measures of the EyesOnAlz challenge and the overall top contributor of that challenge for CitSciMedBlitz.

Editor's note #2: If you'd like to read more about CitSciMedBlitz from the EyesOnAlz team, check out their latest recap post here!

The top contributor to the Mark2Cure arm of CitSciMedBlitz was Kien Pong Yap, while the top contributor to the Cochrane Crowd arm of CitSciMedBlitz was Nikolaos. The top contributors to each platform will be receiving a platform-specific trophy.

Each platform will be awarding additional prizes to some of their top contributors and the notifications should be arriving via email (if they haven't already).

Editor's note #3: The recap from the Cochrane Crowd side of things is now available.

Now that all the excitement of CitSciMedBlitz is over, I'd like to thank ALL the citizen scientists who contribute to projects like ours and make these platforms great. In citizen science, the people make the platform. And, as you can see, the people contributing to these platforms do so with:

a collaborative spirit,
sharing is caring

humor,
guilt to the rescue

humility,
struggled and won

good spirit,
positive attitude for the win

and grace,
positive attitude for the win

Thank you all! Much respect for the work that you've done and the character you've shown!

mad respects

Note: this post was updated on 2018.03.12 to correct an error and on 2018.03.19 to add a link.

CitSciMedBlitz Update

Posted on Mar 2, 2018 in CitSciMedBlitz, mark2cure

The results for Mark2Cure's arm of CitSciMedBlitz are now available! Kien Pong Yap took the top spot in this challenge (see ranking at bottom of post).

What's exciting is that several of these names were also in the top 10 for the StallCatchers challenge so it'll be a tough competition in the Cochrane challenge to see who actually wins the CitSciMedBlitz triple challenge trophy.

Here are some random tidbits about the Mark2Cure arm of the CitSciMedBlitz triple challenge:

  • The NER module was broken for the first few hours of the challenge, and we received emails and phone calls about it within the first hour of the challenge. THANK YOU for bringing it to our attention and providing such detailed information about the bugs. Max was able to solve it while on the airplane ride to San Diego for the Future of Genomic Medicine Conference.

  • In spite of the NER module issues (a HUGE thanks to TAdams, JudyE, AJEckhart, and Itwontalwaysbelikethis for helping us troubleshoot), users doing this task still managed to climb the ranks even with the delayed start.

  • A Mark2Curator created an introductory video about the relationship extraction task for the event while we were running around trying to ensure that everything was in place for the M2C challenge. THANK YOU for doing that, TAdams!

  • One of the Mark2Curators joined the challenge as a personal challenge to raise awareness for dystonia on Rare Disease Day. Dystonia = loss of muscle control. There are many types, but you can imagine just how much more challenging that makes things! Read about her efforts here. And if you're curious how well she did--she made it to the top 5!

  • I was worried about following StallCatchers because their task is so visually appealing while ours is text based and quite difficult. Indeed, we got questions and suggestions-a-plenty about Mark2Cure on the StallCatchers forum, but there are plenty of very tenacious citizen scientists in StallCatchers and they managed to climb pretty high in the ranks.

  • The team behind StallCatchers is AMAZING!!!! Very generous with both support and humor! No wonder the StallCatchers are so ardent!

If you think it's all over--think again! The Cochrane Crowd arm of the challenge has just started, and CitSciMedBlitzers who participate in all THREE challenges will get a digital badge on their StallCatchers profile. It's an awesome badge!

Now go take the Cochrane challenge!

citscimedblitz mark2cure rankings

CitSciMed Blitz has started!

Posted on Feb 26, 2018 in citizen science, Cochrane Crowd, events, EyesOnAlz, mark2cure, Rare Disease Day

It's on! The CitSciMedBlitz week of challenges has started!

If you missed the webinar detailing the three biomedical/health citizen science research projects, it is available for viewing on YouTube.
CitSciMedblitz webinar

You are welcome to participate in as many or as few of the challenges as you'd like, but a trophy will be awarded to the highest ranking participant across all THREE challenges. Read more about CitSciMedBlitz in this post at citscibio.org.

With regard to the challenges, up first (and going on now!) is the EyesOnAlz 24hr Catchathon. EyesOnAlz is an Alzheimer's disease-focused citizen science project investigating stalled blood in brain images. It has a lot of cool images/videos in need of review by citizen scientists and a lot of fun features. The challenge has only just started and will run to 7am PST (3pm GMT) tomorrow (Feb. 28th) so get in on it ASAP!

The Mark2Cure challenge will start at 7am PST (3pm GMT) on Wednesday, February 28th. It is a doubly-special day because the 28th is Rare Disease Day and we have had an incredibly inspirational weekend at the Sanford Burnham Prebys Rare Disease Day Symposium. We look forward to sharing rare disease stories from Mark2Curators and bringing awareness about these diseases as we tackle the literature around NGLY1 during this 24hr challenge.

Speaking of literature, our old friends at Cochrane Crowd are back with a lot of new features which you can explore during the Cochrane Screening Challenge. This challenge starts at 7am PST (3pm GMT) on Friday, March 2nd and runs for 24hrs.

CitSciMed Blitz, Rare Disease Day, and more

Posted on Feb 2, 2018 in citizen science, Cochrane Crowd, events, EyesOnAlz, mark2cure, Rare diseases

It's finally February which means it's time to prepare for Rare Disease Day 2018 and CitSciMedBlitz! This year's theme for Rare Disease Day continues off of last year's theme--research. According to RareDiseaseDay.org, patients are not only subjects but also proactive actors in research--and we couldn't agree more! Mark2Cure would not be where it is now without the inspiration, contributions, and drive from our partners and contributors in the rare disease community. Mark2Curators have inspired us with their generosity, perseverance, curiosity, and overall intellectual voraciousness--and for us, Rare Disease Day is an opportunity to share about the diseases that the Mark2Cure community cares about--and not just NGLY1-deficiency. If there is a disease that you care about that you'd like us to highlight for Rare Disease Day, please get in touch.

Patients are not only subjects but also proactive actors in research.
Patients kick start research
Patients drive research
Patients organize research
Patients proactively provide data

The increasing role of patients in research is not limited to Rare Disease
As citizen science becomes increasingly popular in biomedical research, patients and care providers are becoming increasingly important partners for disease research in general. And, as many of you have pointed out--we will all be patients at some point in our lives so it's nice to be able to actively contribute to disease research.

In addition to helping to organize the knowledge surrounding NGLY1-deficiency, patients and citizen scientists have been making important contributions to Alzheimer's disease research and contributing to health evidence--all of which brings us back to CitSciMed Blitz!

CitSciMed Blitz is coming

Similar to last year's MedLitBlitz, there will be prizes for the top contributors to all THREE platforms. Only participation during the 24hr challenges will count towards the prize; however, you are welcome to register and complete the training for the other platforms prior to the event if you'd like. Learn more about the event and the other platforms here.

Gene Wiki Data, BioThings, Mark2Cure, & more! A review of 2017 in the Su Lab

Posted on Jan 1, 2018 in annual summary, BD2K, BioGPS, BioThings, mark2cure, Su lab, sulab, tsri

Rather than summarize the 2017 progress on each project in separate, project-specific posts, I’m putting it all in one post this year for one important reason–recruitment! As with any academic research group, we expect to see a number of our talented team members move on to bigger and better things. 2017 did not disappoint (well, it was disappointing to lose such talent, but we’re also happy that these awesome people are finding amazing new opportunities).

In saying farewell to 2017, we also bid farewell and best wishes to team members who have really pushed the boundaries of science during their time here including:

  • Ramya Gamini–who takes her bioinformatics expertise to Pfizer where she will continue as a postdoctoral associate.
  • Tim Putman–who takes his bioinformatics chops to Oregon Health & Science University (OHSU) as a Bioinformatics Software Developer. There, he will continue to push for open data and data reuse as he works on developing the Monarch web application, aggregating and structuring microbial model organism data in The Monarch Initiative Infrastructure, and developing tools and aggregating data in support of the NCATS Translator Project.
  • Benjamin Good–the talented Assistant Professor who drove the Gene Wiki / Wiki Data projects in the Su Lab as well as a number of citizen science projects such as Science Game Lab, theCure, Mark2Cure, and more!
  • Max Nanis–research programmer, developer, artist. His twitter feed is like a tribute to chaos. His success anywhere is a certainty.
  • Sebastian Burgstaller-Muehlbacher–his profile on the Su Lab site will forever remain a fictional heavy metal tribute to science–Rock on!!!

Despite losing so much talent in 2017, we were fortunate enough to recruit some new R&D rock stars to the lab and are happy to welcome:

  • Byung Ryul Jeon–a visiting scientist, accomplished doctor, and scholar interested in broadening his knowledge and honing his research skills
  • Laura Hughes–data scientist and web developer with a knack for purposeful and deliberate data visualization. Laura came to us after applying her skills to make the world a better place at her previous job with USAID.
  • Alejo Covian–a full stack python developer shrouded in mystery.

Team members like Associate Professor Chunlei Wu not only provide friendly expertise and helpful guidance to new recruits–he’s also a leader when it comes to holiday fashion.

If you have some computational skills and would like to use them to push research forward, consider joining our lab! We have a lot of great projects which could use your help! Speaking of great projects, let’s start with the ones led by Chunlei (pictured left).

Chunlei has been the driving force behind BioGPS, MyGene.info, MyVariant.info, BioThings.io, and more. As recently announced, he will be the TSRI site principal investigator for the National Center for Data to Health (CD2H).

The newly created center will be led by researchers from OHSU (Tim has moved on from the Su Lab, but he hasn’t escaped our reach!!!!), Northwestern University, University of Washington, Johns Hopkins University School of Medicine, and Sage Bionetworks, together with TSRI, Washington University in St. Louis, the University of Iowa and The Jackson Laboratory.

For their part, Chunlei and his team will do what they do best–building high-performance and scalable data access infrastructure and defining community best practices for data processing and software implementations.
 
 

In 2017, BioGPS saw the launch of a new sheep atlas portal along with a corresponding research paper, and BioGPS made it to the top of the weekly list on Labworm. After demonstrating that the MyGene.info framework could be retooled for variants (MyVariant.info), the framework has been abstracted into the BioThings SDK. 2017 was a busy year for these related projects:

  • Kevin presented MyVariant.info at the Heart BD2K site visit (2017.04.20)
  • Chunlei presented to the Global Alliance for Genomics and Health (GA4GH) Variant Interpretation for Cancer Consortium (VICC) working group (2017.05.09)
  • Chunlei presented a poster on the BioThings SDK at ISMB/BOSC (2017.07.21 – 2017.07.25)
  • Kevin presented a poster on the BioThings Explorer at ISMB/BOSC
  • And the BioThings Explorer manuscript, titled “Cross-linking BioThings APIs through JSON-LD to facilitate knowledge exploration”, has been submitted

2017 has also been busy for the Gene Wiki / Wiki Data team here in the Su Lab as:

  • the Gene Wiki / Wiki Data Team shared their work at the Biocuration 2017 conference (2017.03.26 – 2017.03.29). Both Sebastian & Tim gave talks, while Greg and Nuria had poster presentations.
  • Greg also presented at the Heart BD2K site visit (2017.04.20)
  • Andrew presented at the Bioinformatics Open Source Conference (2017.07.22 – 2017.07.23)
  • and Andrew was also featured on the Wikimedia Research Showcase Webcast (2017.08.23)

In case you missed it, the renewal grant application for this project was also shared in a blog post this year which gives the big picture idea of the project moving forward.

While we’re on the subject of crowdsourced science, Mark2Cure had a busy year as well.

  • Mark2Cure hosted the #Mark4Rare event for Rare Disease Day (during the week prior to Rare Disease Day)
  • Mark2Cure organized/hosted the Citizen Science Expo at La Jolla Library (2017.03.11)
  • Ginger presented Mark2Cure at the Heart BD2K site visit (2017.04.20)
  • Mark2Cure was featured on SciStarter (2017.04.27)
  • Mark2Cure had a joint webinar with Cochrane Crowd (2017.05.08) followed by a joint event (#MedLitBlitz) which included a Mark2Curathon (2017.05.11)
  • Ginger presented Mark2Cure at the 2017 Citizen Science Association Conference during the citizen science for biomedical research panel (2017.05.18)
  • Ginger presented a poster on Mark2Cure at the CSA 2017 conference (2017.05.18)
  • Max delivered a Project Slam about Mark2Cure at the CSA 2017 conference (2017.05.18)
  • Mark2Cure joined the CitSciBio table at the St. Paul Science Museum for the Citizen Science Festival (2017.05.20)
  • Mark2Cure paneled for #CitSciChat (2017.07.19)
  • Mark2Cure joined #Dazzle4rare (2017.08.13 – 2017.08.19)

In addition to these projects, members of the Su Lab have been busy advancing research in protein folding/microscopy analysis methods, osteoarthritis, and data-driven drug re-purposing methodologies–all while managing to have an epic time.

May the FAIR-principles-of-open-data be with you

New default behavior for ‘species’ parameter

Posted on Nov 24, 2017 in mygene.info, species

The MyGene.info API supports both "/gene" and "/query" endpoints. On the /query endpoint, an optional species parameter lets users pass one or more species (as common species names or taxonomy ids) to filter the query results.
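As a rough illustration of how the species parameter fits into a request, here is a minimal Python sketch that only assembles the query URL (it does not call the service). The helper name build_query_url is hypothetical and not part of any MyGene.info client; the base URL is the one used in the example queries in this post.

```python
from urllib.parse import urlencode

# Base URL copied from the example queries in this post.
BASE_URL = "http://mygene.info/v3/query"


def build_query_url(q, species=None):
    """Build a /query URL (hypothetical helper, for illustration only).

    `species` may be None (rely on the server default), a single string
    such as "human" or "all", or an iterable of common names and/or
    taxonomy ids.
    """
    params = {"q": q}
    if species is not None:
        if not isinstance(species, str):
            species = ",".join(str(s) for s in species)
        params["species"] = species
    # Keep commas literal so a species list stays readable in the URL.
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_query_url("cdk2"))
print(build_query_url("cdk2", species=["human", 10090]))
```

Leaving species out of the URL means the result depends on whatever default the server applies, which is exactly what this post is about.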

Previously, the default species was set to "human,mouse,rat". This meant that, unless you explicitly specified other values for the species parameter, your query results (e.g. "q=cdk2") might look like this:

http://mygene.info/v3/query?q=cdk2

{
  "max_score": 457.24393,
  "took": 11,
  "total": 32,
  "hits": [
    {
      "_id": "1017",
      "_score": 457.24393,
      "entrezgene": 1017,
      "name": "cyclin dependent kinase 2",
      "symbol": "CDK2",
      "taxid": 9606
    },
    {
      "_id": "12566",
      "_score": 329.98914,
      "entrezgene": 12566,
      "name": "cyclin-dependent kinase 2",
      "symbol": "Cdk2",
      "taxid": 10090
    },
    {
      "_id": "362817",
      "_score": 279.2216,
      "entrezgene": 362817,
      "name": "cyclin dependent kinase 2",
      "symbol": "Cdk2",
      "taxid": 10116
    },
    {
      "_id": "143384",
      "_score": 22.91444,
      "entrezgene": 143384,
      "name": "CDK2 associated cullin domain 1",
      "symbol": "CACUL1",
      "taxid": 9606
    },
    {
      "_id": "52004",
      "_score": 20.558783,
      "entrezgene": 52004,
      "name": "CDK2-associated protein 2",
      "symbol": "Cdk2ap2",
      "taxid": 10090
    },
    {
      "_id": "78832",
      "_score": 17.98903,
      "entrezgene": 78832,
      "name": "CDK2 associated, cullin domain 1",
      "symbol": "Cacul1",
      "taxid": 10090
    },
    {
      "_id": "365493",
      "_score": 14.489841,
      "entrezgene": 365493,
      "name": "CDK2-associated, cullin domain 1",
      "symbol": "Cacul1",
      "taxid": 10116
    },
    {
      "_id": "13445",
      "_score": 13.166027,
      "entrezgene": 13445,
      "name": "CDK2 (cyclin-dependent kinase 2)-associated protein 1",
      "symbol": "Cdk2ap1",
      "taxid": 10090
    },
    {
      "_id": "690181",
      "_score": 8.355364,
      "entrezgene": 690181,
      "name": "similar to S-phase kinase-associated protein 1A (Cyclin A/CDK2-associated protein p19) (p19A) (p19skp1)",
      "symbol": "LOC690181",
      "taxid": 10116
    },
    {
      "_id": "690646",
      "_score": 7.2449207,
      "entrezgene": 690646,
      "name": "similar to S-phase kinase-associated protein 2 (F-box protein Skp2) (Cyclin A/CDK2-associated protein p45) (F-box/WD-40 protein 1) (FWD1)",
      "symbol": "LOC690646",
      "taxid": 10116
    }
  ]
}

With no species parameter specified in the query, 32 hits were returned, corresponding to all genes from human, mouse, and rat with a match to cdk2 in some field (such as the symbol or name field). You could return the matched genes from all species by specifying species=all in the query.

While "human,mouse,rat" was a useful default for users who only need to query genes in these common species, it could cause confusion for query terms that are only relevant to other species. For example, a query like q=F1RW06 previously returned no hits instead of the matching pig CDK3 gene, unless you added "species=pig" or "species=all".
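To make the pig example concrete, the following Python sketch mimics the effect of a species filter on a list of hits. It is a client-side illustration only, not how the MyGene.info server implements filtering. The function name filter_hits and the stand-in hit are hypothetical, and the pig taxonomy id (9823) is an assumption that does not appear in this post.

```python
# Taxonomy ids for human, mouse and rat, as they appear in the sample output above.
OLD_DEFAULT_TAXIDS = {9606, 10090, 10116}
PIG_TAXID = 9823  # assumption: NCBI taxonomy id for pig (Sus scrofa)


def filter_hits(hits, taxids=None):
    """Keep hits whose taxid is in `taxids`; None means no species filter ("all")."""
    if taxids is None:
        return list(hits)
    return [hit for hit in hits if hit["taxid"] in taxids]


# A stand-in for the pig CDK3 match; only the fields needed here are included.
pig_only_hits = [{"symbol": "CDK3", "taxid": PIG_TAXID}]

print(filter_hits(pig_only_hits, OLD_DEFAULT_TAXIDS))  # old default: the pig hit is dropped
print(filter_hits(pig_only_hits, {PIG_TAXID}))         # like species=pig
print(filter_hits(pig_only_hits))                      # like species=all
```

With the old default, a term that matches only a pig gene passes through a human/mouse/rat filter and nothing survives, which is why the query appeared to return no hits.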

Now, based on feedback from many users, the default "species" behavior has been set to "all". The same "q=cdk2" query will now return matched genes from all species:

http://mygene.info/v3/query?q=cdk2

{
  "max_score": 393.0346,
  "took": 115,
  "total": 611,
  "hits": [
    {
      "_id": "1017",
      "_score": 393.0346,
      "entrezgene": 1017,
      "name": "cyclin dependent kinase 2",
      "symbol": "CDK2",
      "taxid": 9606
    },
    {
      "_id": "12566",
      "_score": 327.42117,
      "entrezgene": 12566,
      "name": "cyclin-dependent kinase 2",
      "symbol": "Cdk2",
      "taxid": 10090
    },
    {
      "_id": "362817",
      "_score": 270.2593,
      "entrezgene": 362817,
      "name": "cyclin dependent kinase 2",
      "symbol": "Cdk2",
      "taxid": 10116
    },
    {
      "_id": "100925631",
      "_score": 268.31903,
      "entrezgene": 100925631,
      "name": "cyclin dependent kinase 2",
      "symbol": "CDK2",
      "taxid": 9305
    },
    {
      "_id": "100981695",
      "_score": 268.31903,
      "entrezgene": 100981695,
      "name": "cyclin dependent kinase 2",
      "symbol": "CDK2",
      "taxid": 9597
    },
    {
      "_id": "105864946",
      "_score": 268.31903,
      "entrezgene": 105864946,
      "name": "cyclin dependent kinase 2",
      "symbol": "CDK2",
      "taxid": 30608
    },
    {
      "_id": "ENSMEUG00000005552",
      "_score": 268.31903,
      "name": "cyclin dependent kinase 2",
      "symbol": "CDK2",
      "taxid": 9315
    },
    {
      "_id": "103465316",
      "_score": 268.31903,
      "entrezgene": 103465316,
      "name": "cyclin dependent kinase 2",
      "symbol": "cdk2",
      "taxid": 8081
    },
    {
      "_id": "100117828",
      "_score": 268.31903,
      "entrezgene": 100117828,
      "name": "cyclin dependent kinase 2",
      "symbol": "Cdk2",
      "taxid": 7425
    },
    {
      "_id": "101544122",
      "_score": 268.31903,
      "entrezgene": 101544122,
      "name": "cyclin dependent kinase 2",
      "symbol": "CDK2",
      "taxid": 42254
    }
  ]
}

We think this new default behavior for the "species" parameter will give more intuitive results for most users. You can easily mimic the old behavior by explicitly specifying species=human,mouse,rat in the query. It's also worth mentioning that, as before, our customized weighting function makes sure that human, mouse, and rat genes with the same match (e.g. the same symbol match of "cdk2") always appear before those from other species.
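The ordering described above can be checked against the sample response in this post. The Python sketch below is a simplified check, not the weighting function itself: it ignores match type and scores and only tests that no human, mouse, or rat hit appears after a hit from another species. The function name preferred_species_first is hypothetical.

```python
PREFERRED_TAXIDS = {9606, 10090, 10116}  # human, mouse, rat

# taxid values of the ten hits in the all-species response above, in order.
TAXIDS_IN_RESPONSE = [9606, 10090, 10116, 9305, 9597, 30608, 9315, 8081, 7425, 42254]


def preferred_species_first(taxids):
    """Return True if no human/mouse/rat hit appears after a hit from another species."""
    seen_other = False
    for taxid in taxids:
        if taxid in PREFERRED_TAXIDS:
            if seen_other:
                return False
        else:
            seen_other = True
    return True


print(preferred_species_first(TAXIDS_IN_RESPONSE))
```

A check like this can be handy in a regression test if your code relies on the first few hits being from human, mouse, or rat.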

As always, let us know if you have any comments or concerns via help@mygene.info or @mygene.info.

BioGPS Spotlight on the Sheep Gene Expression Atlas

Posted on Nov 6, 2017 in BioGPS, data release, spotlight

In BioGPS, there are a number of InterMine-based model organism plugins (and to a lesser extent model organism data sets) which allow users to explore gene expression in organisms typically studied in biomedical research. Model organisms such as mice, rats, flies, worms, zebrafish, etc. have well-annotated genomes and a lot of well-established tools for further exploring and contributing to the knowledgebase around those animals. In contrast, valuable agricultural animals do not have this degree of data, tools, and resource development. This may change as the biomedical and agricultural research domains blur thanks to the movement of medication and infectious disease from farm animals into humans. In this spotlight, we’re happy to introduce a new data set that’s been added to BioGPS–the Sheep Gene Expression Atlas. Emily Clark, a researcher and Chancellor’s Fellow from The Roslin Institute, University of Edinburgh, kindly answered our questions.

  1. In one tweet or less, introduce us to the Sheep Gene Expression Atlas:
    A high resolution atlas of gene expression across tissues and cell types in sheep.

  2. Who is your target audience? How big is the community studying sheep genetics?
    Our target audience is the livestock research community, particularly those working on small ruminants. There is a large research community studying sheep genetics with research groups across the globe and an International Sheep Genomics Consortium (ISGC). The project is also a valuable resource for the Functional Annotation of Animal Genomes Consortium (FAANG) and represents the largest RNA-Seq FAANG dataset to date. Sheep are also an important non-human model and we hope the data will be useful for the mammalian genomics community more generally.

  3. It looks like the academic article on the Sheep Gene Expression Atlas was published a little more than a month ago in PLOS Genetics. How long had the team been working on the atlas before reaching this point?
    The sheep gene expression atlas was initiated in 2013, so we have been working on it for approximately 4 years. The first year involved tissue collection, then the following years, library preparation and data analysis.

  4. In your paper, you illustrate the value of the Sheep Gene Expression Atlas by looking at Innate Immunity genes and the advantages of crossbreeding. What other types of research could this atlas contribute to? Antibody development for immunological assays? Prion disease research? Antibiotic use in animal husbandry?
    We hope that the atlas will now be used by researchers working in livestock genetics and genomics to link genotype to phenotype. It has potential uses in identifying targets for novel therapeutics; some of the dataset from the sheep expression atlas project has been used to identify genes relevant to resistance to mastitis, for example (Banos et al. 2017 The Genomic Architecture of Mastitis Resistance, BMC Genomics). Researchers at the Roslin Institute, interested in prion disease, are also looking at the expression of the gene PRNP (prion protein) across tissues using the sheep atlas dataset. The scale and scope of the dataset is such that it should contribute and provide information for multiple research projects and different fields in sheep but also other ruminants and livestock.

  5. Who is the team behind the Sheep Gene Expression Atlas?
    The sheep atlas project was led by David Hume and Alan Archibald who initiated the work. It was coordinated by Emily Clark, with bioinformatic support from Stephen Bush. The project involved a large team of people for sample collection at The Roslin Institute including farm technicians who also managed the animals for the project. We are also very grateful to Chunlei Wu and Cyrus Afrasiabi for their help making the sheep atlas dataset visualisable on the BioGPS platform.

  6. What is in store for the Sheep Gene Expression Atlas?
    Next we hope to use the data set for a global analysis of allele specific expression across tissues and cell types in sheep and we also have a comparative analysis of gene expression from a smaller subset of tissues in goat which we hope to release soon.

Thanks to Emily Clark, and the rest of the Sheep Gene Expression Atlas team, for sharing their high resolution Sheep Gene Expression Atlas with BioGPS. If you use the Sheep Gene Expression Atlas data set in your research, be sure to cite their publication:

Clark EL, Bush SJ, McCulloch MEB, Farquhar IL, Young R, Lefevre L, et al. (2017) A high resolution atlas of gene expression in the domestic sheep (Ovis aries). PLoS Genet 13(9): e1006997. https://doi.org/10.1371/journal.pgen.1006997

To search for your favorite genes in the Sheep Gene Expression Atlas, visit the sheep-specific portal at: http://biogps.org/sheepatlas/#goto=welcome

Prion protein expression in the Sheep Gene Expression Atlas in BioGPS

Happy Birthday Wikidata!

Posted on Oct 26, 2017 in Wikidata

On Wikidata’s fifth birthday, we (the Gene Wiki team) offer our hearty congratulations!! It is amazing what has been achieved in such a short timespan. Wikidata has basically given us – and the larger research community – the gift of not having to maintain a core knowledge infrastructure. It has been taken care of (i.e. millions of SPARQL queries daily), so the research community can now focus on its core task, doing research.

Our project – the Gene Wiki project – started in 2008 with the objective of seeding Wikipedia with high quality basic biomedical facts, with the goal of crowdsourcing a gene-specific review article for every human gene. Shortly after the birth of Wikidata in 2012, we shifted our focus from Wikipedia to Wikidata. On Oct 6, 2014, we had our first milestone: all human genes had entities in Wikidata.

Since then, we have continued enriching Wikidata, not only with gene annotations from other species but also by extending the coverage to related concepts such as diseases, drugs, and chemical compounds. We have developed a Python library (Wikidata Integrator), which started as a biomedical library but is now applied in other domain areas.

We view the current landscape of biomedical data in Wikidata as basically consisting of three layers. The first layer is those resources which our team has directly loaded. We have focused on resources that are the most commonly used by researchers to form a solid foundation of biomedical knowledge. The second layer is formed by partner organizations with whom we’ve collaborated to help bring their resources into Wikidata. These partners bring key new data types, including information on genetic variants (from CIViC) and on biological pathways (from WikiPathways and Reactome). And finally, we are perhaps most excited when we discover efforts that are completely independent in origin but highly synergistic with our mission. This group includes James Hare’s effort to load environmental exposures from the CDC, and the amazing WikiCite team for loading bibliographic data from the scientific literature.

The sum total of all this work is a richly interconnected network of open biomedical knowledge. And this network enables us to ask and answer an impressively diverse set of biomedical questions (a growing list is documented at https://www.wikidata.org/wiki/User:ProteinBoxBot/SPARQL_Examples).
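For readers who have not queried Wikidata before, here is a minimal Python sketch of what such a question can look like. It only builds the request URL and does not send it. Everything specific in it is an assumption rather than something stated in this post: the public Wikidata Query Service endpoint, the property ids P351 (Entrez Gene ID) and P703 (found in taxon), and the item Q15978631 (Homo sapiens). The helper name sparql_request_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers: P351 = Entrez Gene ID, P703 = found in taxon,
# Q15978631 = Homo sapiens. The query asks for five human genes with labels.
HUMAN_GENES_QUERY = """
SELECT ?gene ?geneLabel ?entrez WHERE {
  ?gene wdt:P351 ?entrez ;
        wdt:P703 wd:Q15978631 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
""".strip()


def sparql_request_url(query):
    """Return a GET URL asking the endpoint for JSON results (hypothetical helper)."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(sparql_request_url(HUMAN_GENES_QUERY))
```

The same query text can be pasted into the query service's web interface, which is usually the easier way to experiment before scripting anything.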

The genewiki landscape with its three layers.

Looking ahead and as a birthday present, we can lift a corner of the veil on our imminent developments.

To improve robustness, we are developing stronger feedback loops to the experts curating primary sources. These feedback loops are based on validation reports such as the already existing constraint violations, but we are also looking into more complex constraint patterns where multiple statements are validated together using Shape Expressions. Currently, our bots run on a continuous integration platform called Jenkins, and we are working towards more automation of our efforts, such as driving the feedback loops and quality control.

We are excited to continue our work to make Wikidata the most comprehensive hub for open and linked biomedical data!

New MyVariant.info data release log and new data updates

Posted by on Sep 25, 2017 in clinvar, data release, dbnsfp, myvariant.info, snpeff, uniprot | 0 comments

Don't want to look through our blog posts to find previous information about data updates on MyVariant.info? Now you don't have to! Metadata about our data updates is now logged in our docs at http://docs.myvariant.info/en/latest/doc/release_changes.html. From here on out, you can find the most up-to-date metadata on our data releases there, in the same easy-to-compare tables that you've seen in our blog posts. If you'd like the most recent metadata in JSON, you can get it from our metadata endpoint. Furthermore, you can obtain the most recent assembly-specific metadata by specifying assembly=hg38 or assembly=hg19, as in this example for hg38.
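If you'd rather pull that metadata from a script, here is a minimal Python sketch. It assumes the metadata endpoint lives at http://myvariant.info/v1/metadata (the original links are not reproduced in this text, so treat that URL as an assumption), and the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed location of the MyVariant.info metadata endpoint.
METADATA_ENDPOINT = "http://myvariant.info/v1/metadata"

# The two assemblies mentioned in this post.
SUPPORTED_ASSEMBLIES = ("hg19", "hg38")


def metadata_url(assembly=None):
    """Build the metadata URL, optionally restricted to one assembly."""
    if assembly is None:
        return METADATA_ENDPOINT
    if assembly not in SUPPORTED_ASSEMBLIES:
        raise ValueError("assembly must be one of %s" % (SUPPORTED_ASSEMBLIES,))
    return METADATA_ENDPOINT + "?" + urllib.parse.urlencode({"assembly": assembly})


def fetch_metadata(assembly=None):
    """Download the metadata and return it as a Python dictionary."""
    with urllib.request.urlopen(metadata_url(assembly)) as response:
        return json.load(response)
```

For example, fetch_metadata("hg38") would request the hg38-specific metadata; it needs network access, and the exact layout of the returned JSON is best checked against the docs.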

Data updates as of September 7, 2017

While we're on the topic of new data releases, here are the most recent updates for GRCh37/hg19 variants:

Source    Last release   New release   # of variants in last release   # of variants in new release
ClinVar   2017-08        2017-09       310,349                         316,940
dbnsfp    3.4a           3.5a          82,366,524                      82,366,524
grasp     2.0.0.0        2.0.0.0       2,473,750                       2,651,542
snpeff    4.3k           4.3k          424,568,367                     581,983,125

And here are the updates for GRCh38/hg38 variants:

Source    Last release   New release   # of variants in last release   # of variants in new release
ClinVar   2017-08        2017-09       310,539                         317,142
dbnsfp    3.4a           3.5a          82,443,748                      82,443,748
snpeff    4.3k           4.3k          413,236,533                     413,237,509
uniprot   2017-03        2017-07       527,607                         527,607

As you can see, we've updated data from ClinVar, dbnsfp, grasp, snpeff, and uniprot. Visit and bookmark the MyVariant.info data release log and stay current on the newest MyVariant.info data releases.
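To see at a glance how much each source grew, the counts from the two tables can be compared with a few lines of Python. This is just a quick sketch: the numbers are copied from the tables above, and the variable and function names are our own.

```python
# (variants in last release, variants in new release), copied from the tables.
HG19_COUNTS = {
    "ClinVar": (310_349, 316_940),
    "dbnsfp": (82_366_524, 82_366_524),
    "grasp": (2_473_750, 2_651_542),
    "snpeff": (424_568_367, 581_983_125),
}

HG38_COUNTS = {
    "ClinVar": (310_539, 317_142),
    "dbnsfp": (82_443_748, 82_443_748),
    "snpeff": (413_236_533, 413_237_509),
    "uniprot": (527_607, 527_607),
}


def count_changes(counts):
    """Return the change in variant count (new minus last) per source."""
    return {source: new - old for source, (old, new) in counts.items()}


for assembly, counts in (("hg19", HG19_COUNTS), ("hg38", HG38_COUNTS)):
    for source, change in count_changes(counts).items():
        print(f"{assembly} {source}: {change:+,}")
```

A change of zero means only the release version moved while the number of variants stayed the same.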

The Sammies award and why it matters to Mark2Cure

Posted by on Sep 8, 2017 in citizen science, GenBank, mark2cure | 0 comments

In case you haven't heard, David Lipman and the GenBank team are in the running for the People's Choice Award of the Samuel J. Heyman Service to America Medals (#Sammies2017). Although Lipman and the GenBank team weren't featured in Medium.com or other news sources, they still made it to the final four.

At this point, many of you may be wondering why we're even talking about Lipman and the GenBank team on a discussion venue meant for Mark2Cure. Mark2Cure is a citizen science project that deals in biomedical literature, and doesn't involve BLAST or Lipman or GenBank, right?

But, when you think about how much of scientific progress is incremental, you begin to appreciate the impressive volume of preceding work. This is especially true if you work on a project like Mark2Cure.

Mark2Cure aims to enable citizen scientists to help mine information from the biomedical literature, which means that Mark2Cure would NOT exist if there weren't a massive volume of preceding and ongoing work in biomedical research. We've been able to build Mark2Cure because key information infrastructure was already in place--PubMed. Lipman launched PubMed in 1997 followed by PubMed Central in 2000. Without PubMed and the subsequent tools built for utilizing PubMed, identifying abstracts and pulling them into Mark2Cure would be more difficult.

As expected, PubMed now has over 27 million articles, up from over 26 million earlier this year. Interestingly enough, the 2017 Sammies nomination for Lipman and the GenBank team mentions PubMed Central only cursorily, focusing instead on GenBank and his contributions to infectious disease surveillance. Perhaps describing their work this way made it more accessible to anyone not in biomedical research. Unfortunately, their profile description doesn't adequately convey how important the infrastructure they've built is to modern biomedical research in the US, open science, and Mark2Cure.

Because the Mark2Cure community consists of people who've been impacted by Lipman and the GenBank team's work, I'll spell it out here:

For members of our community who like science and like being able to read scientific articles: PubMed Central (PMC) has been a central repository for research articles that ANYONE can access and read. Thanks to NIH leadership, publications resulting from research supported by the NIH must be deposited to PMC.

For members of our community who are afflicted or know someone who is afflicted by a rare genetic disorder: GenBank has been a central repository for DNA sequences and BLAST has been an important means of searching those sequences. Without a central repository for DNA sequences, it would be a lot more difficult for researchers to map and annotate functionality associated with those sequences, to draw comparisons on protein function across the different model organisms, and most importantly, to build on each other's work. Much of what we know (or will know) about rare disease genes or proteins comes from (or will come from) expanding on the work of researchers studying worms (or flies, mice, frogs, fish, and more) thanks to the knowledge sharing enabled by PubMed and GenBank.

For the members of our community who just like to help: Mark2Cure exists because of the sheer volume of incremental progress that is represented by the publication of biomedical research articles. Incremental progress isn't as exciting or fun to talk about as scientific 'breakthroughs', but in science a lot of incremental progress had to happen in order for these 'breakthroughs' to follow.

There is so much to sift through, and every contribution from our citizen scientists unlocks a bit more information buried in the text. The Mark2Cure dream is that in unlocking information from the text, you will be able to help with 'breakthroughs' in disease research.

Although I've been rambling about the importance of Lipman and the GenBank team's work to modern biomedical research, Mark2Cure would be nothing without the community of citizen scientists that contribute to it. In no way should this discussion of Lipman and team detract from this fact.