Gene Wiki Data, BioThings, Mark2Cure, & more! A review of 2017 in the Su Lab

Posted on Jan 1, 2018 in annual summary, BD2K, BioGPS, BioThings, mark2cure, Su lab, sulab, tsri | 0 comments

Rather than summarize the 2017 progress on each project in separate, project-specific posts, I’m putting it all in one post this year for one important reason–recruitment! As with any academic research group, we expect to see a number of our talented team members move on to bigger and better things. 2017 did not disappoint (well, it was disappointing to lose such talent, but we’re also happy that these awesome people are finding amazing new opportunities).

In saying farewell to 2017, we also bid farewell and best wishes to team members who have really pushed the boundaries of science during their time here including:

  • Ramya Gamini–who takes her bioinformatics expertise to Pfizer where she will continue as a postdoctoral associate.
  • Tim Putman–who takes his bioinformatics chops to Oregon Health & Science University (OHSU) as a Bioinformatics Software Developer. There, he will continue to push for open data and data reuse as he works on developing the Monarch web application, aggregate and structure microbial model organism data in The Monarch Initiative Infrastructure, and develop tools and aggregate data in support of the NCATS Translator Project.
  • Benjamin Good–the talented Assistant Professor who drove the Gene Wiki / Wikidata projects in the Su Lab as well as a number of citizen science projects such as Science Game Lab, The Cure, Mark2Cure, and more!
  • Max Nanis–research programmer, developer, artist. His twitter feed is like a tribute to chaos. His success anywhere is a certainty.
  • Sebastian Burgstaller-Muehlbacher–his profile on the Su Lab site will forever remain a fictional heavy metal tribute to science–Rock on!!!

Despite losing so much talent in 2017, we were fortunate enough to recruit some new R&D rock stars to the lab and are happy to welcome:

  • Byung Ryul Jeon–a visiting scientist, accomplished doctor, and scholar interested in broadening his knowledge and honing his research skills
  • Laura Hughes–data scientist and web developer with a knack for purposeful and deliberate data visualization. Laura came to us after applying her skills to make the world a better place at her previous job with USAID.
  • Alejo Covian–a full stack python developer shrouded in mystery.

Team members like Associate Professor Chunlei Wu not only provide friendly expertise and helpful guidance to new recruits–Chunlei is also a leader when it comes to holiday fashion.

If you have some computational skills and would like to use them to push research forward, consider joining our lab! We have a lot of great projects which could use your help! Speaking of great projects, let’s start with the ones led by Chunlei.

Chunlei has been the driving force behind BioGPS and more. As recently announced, he will be the TSRI site principal investigator for the National Center for Data to Health (CD2H).

The newly created center will be led by researchers from OHSU (Tim has moved on from the Su Lab, but he hasn’t escaped our reach!!!!), Northwestern University, University of Washington, Johns Hopkins University School of Medicine, and Sage Bionetworks, together with TSRI, Washington University in St. Louis, the University of Iowa and The Jackson Laboratory.

For their part, Chunlei and his team will do what they do best–build high-performance and scalable data access infrastructure and define community best practices for data processing and software implementations.

As of 2017, BioGPS has seen the launch of a new sheep atlas portal along with a corresponding research paper, and BioGPS made it to the top of the weekly list on Labworm. After demonstrating that the framework could be retooled for variants, the framework has been abstracted into the BioThings SDK. 2017 was a busy year for these related projects:

  • Kevin presented at the Heart BD2K site visit (2017.04.20)
  • Chunlei presented to Global Alliance for Genomics and Health (GA4GH), Variant Interpretation for Cancer Consortium (VICC) working group (2017.05.09)
  • Chunlei presented a poster on the BioThings SDK at ISMB/BOSC (2017.07.21 – 2017.07.25)
  • Kevin presented a poster on the BioThings Explorer at ISMB/BOSC
  • And the BioThings Explorer manuscript titled, “Cross-linking BioThings APIs through JSON-LD to facilitate knowledge exploration”, has been submitted

2017 has also been busy for the Gene Wiki / Wikidata team here in the Su Lab as:

  • the Gene Wiki / Wikidata Team shared their work at the Biocuration 2017 conference (2017.03.26 – 2017.03.29). Both Sebastian & Tim gave talks, while Greg and Nuria had poster presentations.
  • Greg also presented at Heart BD2K site visit (2017.04.20)
  • Andrew presented at Bioinformatics Open Source Conference (2017.07.22 – 2017.07.23)
  • and Andrew was also featured on the Wikimedia Research Showcase Webcast (2017.08.23)

In case you missed it, the renewal grant application for this project was also shared in a blog post this year which gives the big picture idea of the project moving forward.

While we’re on the subject of crowdsourced science, Mark2Cure had a busy year as well.

  • Mark2Cure hosted the #Mark4Rare event for Rare Disease Day (during the week prior to Rare Disease Day)
  • Mark2Cure organized/hosted the Citizen Science Expo at La Jolla Library (2017.03.11)
  • Ginger presented about Mark2Cure at Heart BD2K site visit (2017.04.20)
  • Mark2Cure was featured on SciStarter (2017.04.27)
  • Mark2Cure had a joint webinar with Cochrane Crowd (2017.05.08) followed by a joint event (#MedLitBlitz) which included a Mark2Curathon (2017.05.11)
  • Ginger presented Mark2Cure at the 2017 Citizen Science Association Conference during the citizen science for biomedical research panel (2017.05.18)
  • Ginger presented a poster on Mark2Cure at the CSA 2017 conference (2017.05.18)
  • Max delivered a Project Slam about Mark2Cure at the CSA 2017 conference (2017.05.18)
  • Mark2Cure joined CitSciBio at a table in the St. Paul Science Museum for the Citizen Science Festival (2017.05.20)
  • Mark2Cure paneled for #CitSciChat (2017.07.19)
  • Mark2Cure joined #Dazzle4rare 2017.08.13 – 2017.08.19

In addition to these projects, members of the Su Lab have been busy advancing research in protein folding/microscopy analysis methods, osteoarthritis, and data-driven drug re-purposing methodologies–all while managing to have an epic time.

May the FAIR-principles-of-open-data be with you

BioGPS Spotlight on the Sheep Gene Expression Atlas

Posted on Nov 6, 2017 in BioGPS, data release, spotlight | 0 comments

In BioGPS, there are a number of Intermine-based model organism plugins (and to a lesser extent model organism data sets) which allow users to explore gene expression in organisms typically studied in biomedical research. Model organisms such as mice, rats, flies, worms, zebrafish, etc. have well-annotated genomes and a lot of well-established tools for further exploring and contributing to the knowledgebase around those animals. In contrast, valuable agricultural animals do not have this degree of data, tool, and resource development. This may change as the biomedical and agricultural research domains blur thanks to the movement of medication and infectious disease between farm animals and humans. In this spotlight, we’re happy to introduce a new data set that’s been added to BioGPS–the Sheep Gene Expression Atlas. Emily Clark, a researcher and Chancellor’s Fellow from The Roslin Institute, University of Edinburgh, kindly answered our questions.

  1. In one tweet or less, introduce us to the Sheep Gene Expression Atlas:
    A high resolution atlas of gene expression across tissues and cell types in sheep.

  2. Who is your target audience? How big is the community studying sheep genetics?
    Our target audience is the livestock research community, particularly those working on small ruminants. There is a large research community studying sheep genetics with research groups across the globe and an International Sheep Genomics Consortium (ISGC). The project is also a valuable resource for the Functional Annotation of Animal Genomes Consortium (FAANG) and represents the largest RNA-Seq FAANG dataset to date. Sheep are also an important non-human model and we hope the data will be useful for the mammalian genomics community more generally.

  3. It looks like the academic article on the Sheep Gene Expression Atlas was published a little more than a month ago in PLOS Genetics. How long has the team been working on the atlas before reaching this point?
    The sheep gene expression atlas was initiated in 2013, so we have been working on it for approximately 4 years. The first year involved tissue collection; the following years, library preparation and data analysis.

  4. In your paper, you illustrate the value of the Sheep Gene Expression Atlas by looking at Innate Immunity genes and the advantages of crossbreeding. What other types of research could this atlas contribute to? Antibody development for immunological assays? Prion disease research? Antibiotic use in animal husbandry?
    We hope that the atlas will now be used by researchers working in livestock genetics and genomics to link genotype to phenotype. It has potential uses in identifying targets for novel therapeutics, some of the dataset from the sheep expression atlas project has been used to identify genes relevant to resistance to mastitis, for example (Banos et al. 2017 The Genomic Architecture of Mastitis Resistance, BMC Genomics). Researchers at the Roslin Institute, interested in prion disease, are also looking at the expression of the gene PRNP (prion protein) across tissues using the sheep atlas dataset. The scale and scope of the dataset is such that it should contribute and provide information for multiple research projects and different fields in sheep but also other ruminants and livestock.

  5. Who is the team behind the Sheep Gene Expression Atlas?
    The sheep atlas project was led by David Hume and Alan Archibald who initiated the work. It was coordinated by Emily Clark, with bioinformatic support from Stephen Bush. The project involved a large team of people for sample collection at The Roslin Institute including farm technicians who also managed the animals for the project. We are also very grateful to Chunlei Wu and Cyrus Afrasiabi for their help making the sheep atlas dataset visualisable on the BioGPS platform.

  6. What is in store for the Sheep Gene Expression Atlas?
    Next we hope to use the data set for a global analysis of allele specific expression across tissues and cell types in sheep and we also have a comparative analysis of gene expression from a smaller subset of tissues in goat which we hope to release soon.

Thanks to Emily Clark, and the rest of the Sheep Gene Expression Atlas team, for sharing their high resolution Sheep Gene Expression Atlas with BioGPS. If you use the Sheep Gene Expression Atlas data set in your research, be sure to cite their publication:

Clark EL, Bush SJ, McCulloch MEB, Farquhar IL, Young R, Lefevre L, et al. (2017) A high resolution atlas of gene expression in the domestic sheep (Ovis aries). PLoS Genet 13(9): e1006997.

To search for your favorite genes in the Sheep Gene Expression Atlas, visit the sheep-specific portal.

Prion protein expression in the Sheep Gene Expression Atlas in BioGPS

Happy Birthday Wikidata!

Posted on Oct 26, 2017 in Wikidata | 0 comments

On Wikidata’s fifth birthday, we (the Gene Wiki team) offer our hearty congratulations!! It is amazing what has been achieved in such a short timespan. Wikidata has basically given us – and the larger research community – the gift of not having to maintain a core knowledge infrastructure. That infrastructure is taken care of (and serves millions of SPARQL queries daily), so the research community can now focus on its core task: doing research.

Our project – the Gene Wiki project – started in 2008 with the objective of seeding Wikipedia with high quality basic biomedical facts, with the goal of crowdsourcing a gene-specific review article for every human gene. With the birth of Wikidata in 2012, we soon shifted our focus from Wikipedia to Wikidata. On Oct 6, 2014, we reached our first milestone: all human genes had entities in Wikidata.

Since then, we have continued enriching Wikidata with gene annotations from other species, and have extended the coverage to related concepts such as diseases, drugs, and chemical compounds. We have developed a python library (Wikidata Integrator), which started as a biomedical library but is now applied in other domain areas.

We view the current landscape of biomedical data in Wikidata as basically consisting of three layers. The first layer is those resources which our team has directly loaded. We have focused on resources that are the most commonly used by researchers to form a solid foundation of biomedical knowledge. The second layer is formed by partner organizations with whom we’ve collaborated to help bring their resources into Wikidata. These partners bring key new data types, including information on genetic variants (from CIViC) and on biological pathways (from Wikipathways and Reactome). And finally, we are perhaps most excited when we discover efforts that are completely independent in origin but highly synergistic in our mission. This group includes James Hare’s effort to load environmental exposures from the CDC, and the amazing Wikicite team for loading bibliographic data from the scientific literature.

The sum total of all this work is a richly interconnected network of open biomedical knowledge. And this network enables us to ask and answer an impressively diverse set of biomedical questions (a growing list of which is publicly documented).
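As one illustration, a question like "which drugs target the products of genes genetically associated with a disease?" can be answered in a single query against the Wikidata Query Service. The sketch below uses the property IDs from the Gene Wiki data model and an illustrative disease item; both should be verified against Wikidata before use:

```sparql
# Illustrative sketch: drugs that physically interact with the protein
# products of genes associated with a disease. Verify IDs on Wikidata.
SELECT DISTINCT ?drugLabel ?geneLabel WHERE {
  wd:Q12152 wdt:P2293 ?gene .   # disease (here: myocardial infarction) -> genetic association -> gene
  ?gene wdt:P688 ?protein .     # gene -> encodes -> protein
  ?drug wdt:P129 ?protein .     # drug -> physically interacts with -> protein
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```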

The Gene Wiki landscape with its three layers.

Looking ahead and as a birthday present, we can lift a corner of the veil on our imminent developments.

To improve robustness, we are developing stronger feedback loops to the experts curating primary sources. These feedback loops are based on validation reports such as the already existing constraint violations, but we are also looking into more complex constraint patterns where multiple statements are validated together using Shape Expressions. Our bots currently run on a continuous integration platform (Jenkins), and we are working towards more automation of our efforts, such as driving the feedback loops and quality control.

We are excited to continue our work to make Wikidata the most comprehensive hub for open and linked biomedical data!

Upcoming #CitSciChat on Biomedical Citizen Science

Posted on Jul 14, 2017 in #citscichat, biomedical research, citizen science, mark2cure, presentation | 0 comments

New Mark2Cure Video added to our youtube playlist!

The Citizen Science Conference in May was very productive, and the last of Mark2Cure's recorded talks is now available on our youtube channel. As previously mentioned, Max delivered the project slam for Mark2Cure and was selected as one of the top three to deliver an abbreviated version during the 'Night in the Clouds' event.

View the two-minute talk here:

Biomedical CitSciChat on Wed. July 19th, at 11:00am PT

Speaking of the conference, we were able to connect in person with a lot of lovely people in the citizen science arena, especially the amazing people from @EyesOnAlz, @CitSciBio, and @CochraneCrowd. Because we're all passionate about bringing citizen science to biomedical research, we organized a panel for a biomedical #citscichat. Caren Cooper (@CoopSciScoop) kindly agreed to moderate the chat as usual, and Pietro (@pmichelu, @EyesOnAlz) was able to convince @foldit's Seth Cooper to join the panel.

What: Hour long chat on biomedical citizen science (#CitSciChat)

Where: online via twitter

When: Wed July 19 2:00pm ET (11:00am PT)

Why: Because citizen science is used in biomedical research too

Who: Everyone interested in citizen science is welcome to join this chat which will be moderated by citizen science expert and author, Caren Cooper. The panel so far includes:

  • Mark2Cure of course! Mark2Cure is a citizen science project for addressing the big data issue of biomedical literature. Citizen scientists help look for clues about NGLY1-deficiency in curated literature. (@Mark2Cure/@gtsueng, @x0xmaximus, @AndrewSu)
  • Cochrane Crowd is a citizen science project from the Cochrane Collaborative, and also looks to make biomedical literature more useful. Citizen Scientists help identify randomized controlled trials so that Cochrane Reviewers can use them to answer important medical questions. (@Cochrane_Crowd, @annanoelstorr)
  • EyesOnAlz/Stall Catchers is a citizen science project from the Human Computation Institute to identify blood blockages in short videos of the brain. Their game is super fun, helps with Alzheimer's research AND they have a major event (Catchathon) coming up. If you would like to host a local catchathon, check out this post. (@EyesOnAlz, @seplute, @Clair_csg, @pmichelu)
  • CitSciBio is NIH's new biomedical citizen science hub. It is sponsored by the Division of Cancer Biology at the National Cancer Institute. There are tools for collaborating, creating projects, and now you can login via your scistarter account. (@citscibio)
  • Foldit is a long-standing and very successful citizen science game which empowers gamers and volunteers to help determine the structure of proteins important to biomedical research. Seth Cooper from Northeastern University has agreed to join the panel to share about this wildly successful project. (@UWGameScience)
Beat the heat and help science!

Need an excuse to stay indoors, avoid chores, and avoid the summer heat? Look no further! One of our current missions is over 80% complete. Help us finish it!

    Integrating Wikidata and other linked data sources – Federated SPARQL queries

    Posted on Jul 13, 2017 in semantic web, SPARQL, Wikidata | 0 comments

    This blog post is about running federated SPARQL queries on Wikidata. A federated query is a special type of SPARQL query that runs on more than one SPARQL endpoint. It allows access to multiple linked data resources in a single query. Below is a template of a federated query.

    Structure of a federated query: it contains query patterns for both the local endpoint and a remote endpoint, where the address of the remote SPARQL endpoint is expressed with the SERVICE keyword.
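Schematically, that structure looks like this (the remote endpoint URL and the triple patterns are placeholders):

```sparql
SELECT ?item ?value WHERE {
  # query patterns evaluated on the local endpoint
  ?item ?localProperty ?intermediate .

  # query patterns evaluated on the remote endpoint,
  # addressed with the SERVICE keyword
  SERVICE <https://example.org/sparql> {
    ?intermediate ?remoteProperty ?value .
  }
}
```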

    The Wikidata Query Service (WDQS) now supports federated SPARQL queries on a limited number of endpoints. Remote SPARQL endpoints can be added to the list of supported endpoints by nomination.

    Wikidata aims to provide the sum of all knowledge to the world at large. To fulfil this, it needs to be a hub to the total knowledge space; after all, various technical, legal, or social limitations prevent including everything in a single repository. Through federation, or distributed querying, local and remote data can be combined. Here we explore three ways to apply this type of querying to Wikidata content.


    From Wikidata to an external SPARQL endpoint (Wikipathways)

    The following query uses federation to integrate a pathway from Wikipathways with Wikidata. Wikidata contains items on human pathways from Wikipathways, but metabolic interactions are not yet captured in Wikidata; through federation, these metabolic interactions can be obtained. In the reverse direction, it is possible to obtain properties of pathway elements from Wikidata. Take for example the “Sudden Infant Death Syndrome (SIDS) Susceptibility Pathways (Homo sapiens)” pathway. It contains various biological interactions. Using federated queries, we can get properties such as the mass of a given pathway element.

    The “Sudden Infant Death Syndrome (SIDS) Susceptibility Pathways (Homo sapiens)” pathway.

    Using this pathway as input, a federated query makes it possible to enrich it with properties not captured in Wikipathways. One example would be a query that takes interactions from the above pathway and combines them with the mass of the individual pathway parts.
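A sketch of such a query, run from the Wikidata Query Service (the Wikipathways vocabulary terms and the pathway identifier WP706 are illustrative and should be checked against the live endpoints):

```sparql
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?interaction ?participant ?mass WHERE {
  # Remote side: interactions and their participants in the SIDS pathway
  SERVICE <http://sparql.wikipathways.org/> {
    ?pathway dcterms:identifier "WP706" .
    ?interaction dcterms:isPartOf ?pathway ;
                 a wp:Interaction ;
                 wp:participants ?participant .
    # cross-reference from the pathway element to its Wikidata item
    ?participant wp:bdbWikidata ?item .
  }
  # Local (Wikidata) side: the mass (P2067) of each participant
  ?item wdt:P2067 ?mass .
}
```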

    You can run this query here or watch it run on youtube

    From a remote SPARQL endpoint to Wikidata

    If a remote SPARQL endpoint is not (yet) eligible to be used in the WDQS, it is possible to run the query from the external endpoint instead–provided the external endpoint accepts federated SPARQL queries. The SPARQL endpoint of UniProt is a nice example. UniProt includes many more properties for proteins than are currently captured in Wikidata–properties that, due to the more restrictive nature of the applicable license, can’t be included in Wikidata. The following federated SPARQL query runs on the SPARQL endpoint of UniProt. It selects all human UniProt entries with a sequence variant that leads to a loss of function, and that also physically interact with a drug used as an enzyme inhibitor.

    Integrating Wikidata content with data from UniProt using a federated query submitted at the UniProt endpoint.

    Try it…
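A rough sketch of what such a query could look like when submitted at the UniProt endpoint. The UniProt core vocabulary terms and Wikidata property IDs below are illustrative, not a verified reproduction of the original query:

```sparql
PREFIX up:  <http://purl.uniprot.org/core/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?protein ?drug WHERE {
  # UniProt side: reviewed human proteins
  ?protein a up:Protein ;
           up:reviewed true ;
           up:organism <http://purl.uniprot.org/taxonomy/9606> .
  # derive the bare accession to join against Wikidata
  BIND(STRAFTER(STR(?protein), "uniprot/") AS ?accession)
  # Wikidata side: items carrying that UniProt ID (P352) that a drug
  # physically interacts with (P129)
  SERVICE <https://query.wikidata.org/sparql> {
    ?item wdt:P352 ?accession .
    ?drug wdt:P129 ?item .
  }
}
```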


    From a local SPARQL endpoint to Wikidata (Data from a database that will never go into WD)

    Another interesting use case where federation can be quite handy is in the context of local data. Epidemiological data on, for example, Zika outbreaks can contain large sets of measurements spread over multiple time frames. Loading those measurements into Wikidata can be difficult, especially if the outbreak is ongoing, with new data arriving at rapid intervals. One solution to enable integration of that data with other resources like Wikidata is running distributed queries from a local SPARQL endpoint. The local SPARQL endpoint has two roles: first, it collects the measurements from the different Zika studies; second, federated queries can be executed to enrich these measurements with knowledge from Wikidata. We have created an example script that takes data on Zika outbreaks, converts it to linked data as RDF, and loads it into a local SPARQL endpoint. This prototype is available on github.
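A minimal sketch of such a query, run on the local endpoint holding the outbreak measurements. The `ex:` vocabulary is hypothetical; only the SERVICE block touches Wikidata:

```sparql
PREFIX ex:   <http://example.org/zika#>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?date ?caseCount ?countryLabel ?population WHERE {
  # Local side: the Zika measurements loaded from the study data
  ?measurement ex:date ?date ;
               ex:caseCount ?caseCount ;
               ex:country ?country .       # modeled as a Wikidata item IRI
  # Remote side: enrich each measurement with knowledge from Wikidata
  SERVICE <https://query.wikidata.org/sparql> {
    ?country rdfs:label ?countryLabel ;
             wdt:P1082 ?population .       # P1082 = population
    FILTER(LANG(?countryLabel) = "en")
  }
}
```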

    This approach also works when one would like to integrate sensitive data (e.g. clinical patient data) with external Wikidata knowledge, provided the local endpoint is maintained within a secure infrastructure that allows getting data from outside the infrastructure but prevents exports.


    Indeed SPARQL has a steep learning curve.

    Although writing SPARQL queries can be perceived as quite intimidating, the ability to run federated queries on Wikidata content is very valuable, and needed to make Wikidata the central hub of research data in the life sciences. The effort to learn SPARQL is worth it. Fortunately, Wikidata provides a large set of example queries, useful either for inspiration or as learning material.
    There are also quite a few developments underway to make writing queries easier.

    There is also an R package that integrates SPARQL with R scripts, through which the example queries from Wikidata can be scraped. This means one can use the advantages SPARQL offers without writing a SPARQL query, simply by building on what others have already made.

    Finally, there is always the help of the Twitter space. Many are quite eager to share SPARQL knowledge.

    The Gene Wiki project: Looking to the future v.2017

    Posted on Jul 7, 2017 in Gene Wiki, GeneWikiRenewal, proposal, Wikidata | 3 comments

    The Gene Wiki project has been generously funded by the National Institute of General Medical Sciences (NIGMS) since 2009. As the second funding period is wrapping up early next year, it was time to once again look forward and think about our vision for the next 4-5 years. Posted below is what we came up with, submitted earlier this week to the NIH as a competing renewal proposal with Lynn Schriml and Kristian Andersen as co-Investigators. Fingers crossed!

    Also a fine time to recognize that this proposal resulted from the direct and indirect contributions of so many people — postdocs, grad students, staff, past and current collaborators, Wikidata and Wikipedia communities, etc etc — far too many to name individually here. For a mostly comprehensive list, please see our grant-related publications.

    Happy Fathers Day!

    Posted on Jun 18, 2017 in biocuration, citizen science, conference, mark2cure, presentation | 0 comments

    A HUGE thanks to all the dads (and EVERYONE) who have been contributing to make a difference for the NGLY1 families.

    Shipping delays: Apologies to international prize and drawing winners who were waiting for their prizes. Most of the international packages that we shipped out in May/June have been returned to us due to customs issues (fortunately, this happened at some point prior to final shipping, so the postage on these is still good; unfortunately, it took a long time for these to get back to us so we could address the issue). We’ll be trying again to get these out ASAP.

    Max’s original project slam now online: As mentioned in our previous newsletter, Max delivered the project slam for Mark2Cure at the Citizen Science Conference in Minnesota. The project slam talks were supposed to have been recorded and still may be released by the Citizen Science Association someday, but we couldn’t wait. Here’s our recording of Max’s project slam. He finished within his allotted four minutes, and was engaging enough to win one of three invitations to deliver an even shorter version of the slam at an event the following day.
    You can check it out here:

    You be the scientist! One thing we’ve heard (and quite agree with) at the Citizen Science Conference is that trained volunteers are capable of doing more than simple tasks. Mark2Curators have very much fed into the tutorial process, and played an important role in testing and improving the design of the interface. The entities our users have identified from the text have already yielded interesting clues which we’ve used to expand the set of documents to investigate, and by now, there are users who have read a lot of abstracts—A LOT! If you’ve read something that sticks out in your mind as being potentially related to NGLY1-deficiency, share it with us! We’d love to hear YOUR hypothesis on what might be an interesting term to explore and why.

    Science Game Lab: tool for the unification of biomedical games with a purpose

    Posted on Jun 16, 2017 in games, genegames, gwaps, Science Game Lab, SGL, sulab | 0 comments

    Scripps team: Benjamin M. Good, Ginger Tsueng, Andrew I Su
    Playmatics Team: Sarah Santini, Margaret Wallace, Nicholas Fortugno, John Szeder, Patrick Mooney, 
    With helpful ideas from: Jerome Waldispuhl, Melanie Stegman
    Games with a purpose and other kinds of citizen science initiatives demonstrate great potential for advancing biomedical science and improving STEM education. Articles documenting the success of projects such as Foldit and Eyewire in high impact journals have raised wide interest in new applications of the distributed human intelligence that these systems have tapped into. However, the path from a good idea to a successful citizen science game remains highly challenging. Apart from the scientific difficulties of identifying suitable problems and appropriate human-powered solutions, the games still need to be created, need to be fun, and need to reach a large audience that remains engaged for the long term. Here, we describe Science Game Lab (SGL), a platform for bootstrapping the production, facilitating the publication, and boosting both the fun and the value of the user experience for scientific games with a purpose.
    Ever since Foldit famously demonstrated that teams of human game players could often outperform supercomputers at the challenging problem of 3D protein structure prediction, so-called ‘games with a purpose’ have seen increasing attention from the biomedical research community. A few other games in this genre include: Phylo for multiple sequence alignment, EteRNA for RNA structure design, Eyewire for mapping neural connectivity, The Cure for breast cancer prognosis prediction, Dizeez for gene annotation, and MalariaSpot for image analysis. Apart from tapping into human intelligence at scale, these efforts have also produced valuable educational opportunities. Many of these games are now used to introduce their underlying concepts in classroom settings, where games in all forms are increasingly working their way into curricula. Concomitant with the rise of these ‘serious games’, citizen science efforts such as the Zooniverse and Mark2Cure have sought similar aims but have packaged their work as volunteer tasks, analogous to unpaid crowdsourcing tasks, rather than as elements of games.
    Many of these initiatives have succeeded in independently addressing challenging technical problems through human computation, improving science education, and generally raising scientific awareness.  However, with so much interest from the scientific community and a booming ecosystem of game developers, there are actually relatively few of these games in operation now.  Recognizing the opportunity, various groups have attempted to push the area forward through new funding opportunities and through various ‘game jams’ such as the one that produced the game ‘genes in space’ for use in analyzing microarray data in cancer.  Here, we take a different approach towards expanding the ecosystem of games with a scientific purpose.  Rather than attempting to seed the genesis of specific new game-changing games, we hope to lower the barrier to entry for new games and related citizen science tasks to generally promote the development of the entire field.  With this high-level aim in mind, we developed Science Game Lab (SGL) to make it easier for developers to create successful scientific games or game-like learning and volunteer experiences.  Specifically, SGL is intended to address the challenges of recruiting players and volunteers, keeping them engaged for the long term, and reducing the development costs associated with creating a scientific gaming experience.
    The Science Game Lab Web application
    SGL is a unique, open-source portal supporting the integration of games and volunteer experiences meant to advance science and science education. Unlike other related sites that act more like volunteer management and/or project directory services, such as SciStarter and Science Game Center, SGL is not simply a listing of related websites. Rather, it is an attempt to create a user experience that takes place directly within the SGL context yet still incorporates content from third parties. The system is largely inspired by game industry portals such as Kongregate that enable developers to incorporate their games directly into a unified metagame experience.
    Players can use the portal to find and play games with their achievements within the games tracked on site-wide high score lists and achievement boards (Figure 1).  Players can earn the SGL points that drive these leaderboards for actions taken in different games.  In this way, SGL provides developers with access to a metagame that can be used to encourage players in addition to the incentives offered within individual games (Figure 2).  This metagame can also be used by the system administrators to help direct the player community’s attention to particular games or particular tasks within games.  For example, actions taken on new games might earn more points than actions taken on more established games as a way to ‘spread the wealth’ generated by successful games.    

    Figure 1.  SGL home page demonstrating site-wide high score list, game listing, and links to achievements, help, and user profile information.
    Figure 2.  Badges displayed on user’s profile page.  Available badges not yet achieved are greyed out.
     Developers interact with SGL by incorporating a small JavaScript library into their application and using the SGL ‘developer dashboard’ to pair events in their game with points, badges, and quests managed by the SGL server.  At this time, SGL only supports games that operate online as Web applications.  The games are hosted by the developers and rendered in the SGL context within an iframe.  The SGL iframe provides a ‘heads up display’ that gives game players real-time feedback on events sent back to the SGL server, such as earning points, gathering badges, or progressing through the stages of a quest (Figure 3).  This display lets developers add game mechanics to sites that are not overtly games.  For example, Wikipathways incorporated a pathway editing tutorial into SGL, using the heads up display to reward users with SGL points and badges for completing various stages of the tutorial.  The tutorial also took advantage of the SGL quest-building tool (Figure 4).  Games are submitted by developers for approval by SGL administrators.  Once approved, the games appear in the public view and can be accessed by any player.
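    As a rough sketch of the developer-side flow just described, a game might build an event payload and hand it to the SGL library for delivery to the server. The function names and event fields below are assumptions for illustration, not the actual SGL API.

```javascript
// Hypothetical sketch of reporting a game event to SGL.
// Field and function names are illustrative assumptions.
function makeSglEvent(gameId, playerId, action, points) {
  return {
    game: gameId,
    player: playerId,
    action: action,   // e.g. 'tutorial_step_complete'
    points: points,   // points for the SGL metagame to award
    sentAt: new Date().toISOString()
  };
}

// In a real integration, the bundled SGL library would POST the payload
// to the SGL server; `transport` stands in for that network layer so
// the sketch stays self-contained.
function sendToSgl(event, transport) {
  return transport(JSON.stringify(event));
}
```

    A tutorial like the Wikipathways example could call something like `makeSglEvent` once per completed step, and the heads up display would then show the awarded points, badges, or quest progress.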

    Figure 3.  The heads up display provided by the SGL iframe.  Shows events captured by the API and provides users with immediate feedback.   

    Figure 4.  Tasks in SGL can be grouped into quests.  The figure shows a particular user’s progress through various quests available within the system.

    If a critical initial mass of effective games can be integrated, SGL could strongly benefit new developers by providing immediate access to a large player population.  Site-level status, identity, and community features can help with the even greater challenge of long-term player engagement, a noted problem in the field.  Within the context of science-related gaming, such status icons might eventually be used as practically useful, real-world marks of achievement in line with the notion of ‘Open Badges’.  As demonstrated by the Wikipathways tutorial application, SGL can replace the need for developers to host their own login systems, user tracking databases, and reward systems – all of which can be accomplished using the SGL developer tools.  Citizen scientists are not homogeneous in their motivations, and designing to be inclusive of both gamers and non-gamers can be challenging.  By offering an alternative means of experiencing a web-based citizen science application, SGL allows developers to cater to both their gaming and non-gaming contributor audiences.  Together, these features raise the overall potential for growth within the world of citizen science and scientific gaming.
    Future directions
    SGL is currently functional, but so far has attracted only a small number of developers willing to integrate their content into the portal.  Future work would need to address the challenge of raising the perceived value of integration with the site while lowering the perceived difficulty.  Looking forward, key challenges for the future of SGL include better support for:
    • games meant for mobile devices
    • development of quests that span multiple games
    • teachers to build SGL-focused lesson plans and track student progress
    • creating new ‘SGL-native’ games
    • integration with external authentication systems
    None of these are insurmountable challenges, but they all require significant continued investment in software development.  As an open-source project, we encourage contributions from anyone who shares our vision of spreading and doing science through the grand unifying principle of fun.

    Building communities of knowledge with Wikidata

    Posted by on Jun 16, 2017 in crowdsourcing, Gene Wiki, semantic wikipedia, sulab, wiki, Wikidata, wikipedia | 0 comments

    As the Wikimedia Movement works to define its strategy for the next fifteen years, it is worthwhile to consider how its recent product Wikidata may fit into that strategy.  As its homepage states,
    “Wikidata is a free and open knowledge base that can be read and edited by both humans and machines.”
    Wikidata is a particular kind of database designed to capture statements about items in the world, along with references that support those statements.  Because Wikidata is a database, its contents are meant to be viewed in the context of software that retrieves the data through queries and then renders it to meet the needs of a user in a certain context.  The same data can thus be viewed on Wikidata-specific item pages and in the infoboxes of Wikipedia articles.  Importantly, Wikidata content can also be used in applications outside of the Wikimedia family.
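    To make the statement-plus-reference model concrete, here is a simplified sketch using Wikidata's canonical example item. The identifiers are real Wikidata IDs (Q42 is Douglas Adams, P69 is 'educated at', Q691283 is St John's College, and P854 is 'reference URL'), but the object shape and the source URL are teaching simplifications, not Wikidata's actual JSON format.

```javascript
// A simplified model of a Wikidata statement with a supporting reference.
const statement = {
  subject: 'Q42',     // Douglas Adams
  property: 'P69',    // educated at
  value: 'Q691283',   // St John's College, Cambridge
  references: [
    // P854 = 'reference URL'; the URL itself is a hypothetical placeholder
    { property: 'P854', value: 'https://example.org/source' }
  ]
};

// Different applications render the same statement differently;
// an infobox row and a plain sentence are two possible views.
function renderAsSentence(s, labels) {
  return `${labels[s.subject]} ${labels[s.property]} ${labels[s.value]}.`;
}
```

    The point of the `renderAsSentence` toy is that the statement itself carries no presentation: each consuming application supplies its own labels and layout for the same underlying data.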
    The molecular biology community (and in particular the Gene Wiki group) has embraced Wikidata as a global platform for knowledge integration and distribution.  To help envision how Wikidata may fit into the strategic vision of the WMF movement, it is worth taking a look at how and why this particular community is using Wikidata.  
    History of the Gene Wiki initiative
    The sequencing of the human genome at the beginning of this century, and the consequent rush of data and new technology for producing even more data, fundamentally changed how research in biology is conducted.  Before the year 2000, research typically proceeded with a single-gene focus; a typical PhD thesis would entail the analysis of the genetics or function of one gene or protein at a time.  A few years after the first genome, however, it became possible to measure the activity of tens of thousands of genes at once, resulting in an omnipresent problem: generating interpretations of experimental results containing hundreds of genes.  While a scientist may come to grasp the literature surrounding a single gene quite well, it is not possible to know everything there is to know about all 20,000+ genes in the genome – particularly when this knowledge is expanding on a minute-by-minute basis.  As a consequence, there arose a need to produce summaries of what was known about each gene so that researchers could quickly grasp its nature and easily find links to more detailed references as needed.  By 2008, many different research groups had published wikis attempting to allow the scientific community to generate the required articles, e.g. WikiProteins, WikiGenes, and the Gene Wiki.  The Gene Wiki project was unique among this group as it anchored itself directly to Wikipedia and, likely as a result of that decision, has enjoyed long-term success.  This initiative works within the English Wikipedia community to encourage and support the collection of articles about human genes.  Its main contributions are the infobox seen on the right-hand side of these articles and software for generating new article stubs using that template.
    Wikidata and the Gene Wiki project
    For the past several years, the Gene Wiki core team (funded by an NIH grant) has focused primarily on seeding Wikidata with biomedical knowledge.  In comparison to managing data via direct inclusion and parsing of infobox templates as before, this makes the data much easier to maintain automatically and, importantly, opens it up for use by other applications.  As a result, Wikipedia isn’t the only application that can use this structured information.   One of the first products of that process was a new module (Infobox_gene) that draws all the needed data to render the gene infobox dynamically from Wikidata, greatly reducing the technical challenge of keeping the data presented there in sync with primary sources.  
    In addition to the relatively simple collection of gene identifiers and links out to key public databases that are presented in the infoboxes, Wikidata now has an extensive and growing network of knowledge linking genes to proteins, proteins to drugs, drugs to diseases, diseases to pathogens, pathogens to places, places to events, events to people, and so on.  This unique, open, referenced knowledge graph may eventually become the closest thing to ‘the sum of all human knowledge’.  Capturing knowledge in this structured form makes it possible to use it in all kinds of applications, each with its own community-specific user experience.  As a case in point, the Gene Wiki group created Wikigenomes based primarily on data loaded into Wikidata.  This was followed quickly by Chlambase, an application specifically focused on distributing and collecting knowledge about different Chlamydia genomes.  These applications provide domain-specific user interface components, such as genome browsers, that are needed to present the relevant information effectively and thereby attract the attention of specialist users.  These users, in turn, have the opportunity to contribute their knowledge back to the broader community through contributions to Wikidata that can be mediated by the same software.
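    This knowledge graph is queryable through the public Wikidata Query Service (https://query.wikidata.org).  As a hedged sketch, the snippet below builds a SPARQL query following one hop of the chain described above, from a gene to the protein it encodes.  P351 (Entrez Gene ID) and P688 ('encodes') are real Wikidata properties; the helper function itself is just an illustration, not an official API.

```javascript
// Build a SPARQL query for the protein(s) encoded by a gene,
// looked up by its Entrez Gene ID. Illustrative helper only.
function geneToProteinQuery(entrezId) {
  return `
    SELECT ?gene ?protein WHERE {
      ?gene wdt:P351 "${entrezId}" .  # gene identified by this Entrez Gene ID
      ?gene wdt:P688 ?protein .       # the protein it encodes
    }`;
}
```

    Sending `geneToProteinQuery('7157')` (7157 is the Entrez Gene ID for TP53) to the query service endpoint should return the Wikidata item for the p53 protein; further hops in the graph (protein to drug, drug to disease) follow the same pattern with other properties.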
    Wikidata and the world
    The molecular biology research community, as represented by the Gene Wiki project, is an early adopter of Wikidata as a community platform for the collaborative curation and distribution of structured knowledge, but it is not alone.  The same fundamental patterns are already being applied by other communities, e.g. those interested in digital preservation and open bibliography.  In each case, we see communities working to transition from the current dominant paradigm of private knowledge management towards the knowledge commons approach made possible by Wikidata.  This is not unlike the transition from the world of the Encyclopedia Britannica to the world of Wikipedia.  The only important difference is that the knowledge in question is structured in a way that makes it easier to reuse in different ways and in different applications.

    Wikidata provides a mechanism for massively increasing the global good generated by the Wikimedia Foundation’s work by capturing knowledge in a form that can be agilely used to empower all manner of software with the sum of human knowledge.  

    Happy Memorial Day weekend!

    Posted by on May 26, 2017 in citizen science, CitSci2017, Cochrane Crowd, conference, mark2cure, MedLitBlitz, poster, presentation | 0 comments

    The last few weeks have been a bit hectic, so we've got plenty of news and info to share with you.

    First of all, if you haven't seen it yet, Cochrane Crowd has posted about our joint webinar and the #MedLitBlitz. If you missed the webinar or had technical difficulties/time zone issues with it, it's available on YouTube. The prize packages for the top three participants of #MedLitBlitz are packed and will be shipped either today or early next week (depending on whether today's shipments have already been picked up).

    Secondly, Mark2Cure was at the Citizen Science Association conference from 2017.05.15-2017.05.20, and was fortunate enough to share YOUR work with an audience of scientists who LOVE citizen science! More than a few researchers stopped to introduce themselves to me and spoke highly of our community! Although it's always weird to hear a recording of your own voice, I recorded my presentation because it wouldn't be fair to talk about the amazing work you've done without sharing it with you! You can find my presentation for the biomedical session on our YouTube channel. On a side note, I know the audio quality isn't the best, which is why I've transcribed it using YouTube's captioning software. If you have trouble hearing the presentation, please turn on the closed captions.

    Max also delivered two lightning talks for the event, which I hope to upload soon.
    Not available yet, but will be soon

    In addition to the talks, we had a poster for Mark2Cure and a table at two public events.
    Max spreading the love for Mark2Cure

    We were especially pleased to be so close to our buddy at Cochrane Crowd for this event
    Cochrane Crowd looking good

    Lastly, it looks like one of the missions was completed just as I was settling back in after the conference. A HUGE thanks to everyone who helped complete the 'carpingly' mission. A new mission has been launched in its place, so check it out if you have some free time.