This is an archive of past project ideas.


Check the main GSoC page for the active list.


Ideas List – 2012 (past ideas, kept here for the record)

  • Idea 0. Create your own!
  • Idea 1. Gene Wiki+: Add write-back capabilities to meta-wiki
  • Idea 2. Gene Wiki+: User interface for identifying gene-disease links – (accepted project, see code!)
  • Idea 3. Gene Wiki+: Build human-powered triple validation service
  • Idea 4. Gene Wiki: extend gene wiki bot to add gene-disease infobox to Gene Wiki articles
  • Idea 5. BioGPS: A web interface for webpage annotation – (accepted project, see code!)
  • Idea 6. BioGPS: An enhanced HTML5 canvas-based data-chart viewer – (accepted project)
  • Idea 7. BioGPS: jQuery-based BioGPS gene-report layout canvas
  • Idea 8. BioGPS: MongoDB-based Gene-List storage and query API and an efficient enrichment-score computation engine – (accepted project)
  • Idea 9. Biological Games: BioESP framework
  • Idea 10. Über Wiki
  • Idea 11. Open Source Hardware: Game Kiosk
  • Idea 12. Collaborative knowledge engineering with Twitter
  • Idea 13. Gene Wiki: user encouragement bot
  • Idea 14. Mobile interface for Gene-centric information
  • Idea 15. BioGPS: cloud-based uptime monitor for plugins
  • Idea 16. Biological games: Asymmetric version of Dizeez – (accepted project, see code!)


Idea 0. Create your own!

Feel free to propose your own idea.  As long as it fits within the general space of ‘crowdsourcing biology’, we would be more than happy to consider it.  Please be detailed about what the specific goal would be, why it is important and how you plan to achieve it.

Idea 1. Gene Wiki+: Add write-back capabilities to meta-wiki

The Gene Wiki+ is a Semantic MediaWiki installation that integrates and presents public data about human gene function and variation.  The underlying data is gathered dynamically from resources such as SNPedia and the Gene Wiki (Wikipedia) and enhanced with semantic links using natural language processing.

At the moment, the Gene Wiki+ is a one-way mirror of its underlying data sources (primarily SNPedia and Wikipedia).  In this project, the student will write the code necessary to transfer edits made on Gene Wiki+ back to the underlying data sources.  A key challenge with this project will be to smoothly manage editor authentication, so that edits arriving in primary wikis (e.g. Wikipedia) are properly attributed to the editor who made them within Gene Wiki+.

Expected results: By the end of the summer, we expect Gene Wiki+ edits to be transferred to the underlying wiki pages with correct attribution given to the responsible editor.

Knowledge prerequisite: It would be useful if the student had some experience with PHP and the MediaWiki API, and it is vital that they have at least some experience with Python or Java.  It would also be beneficial if the student had experience with the Semantic MediaWiki framework, though this can be learned during the course of the project.

Mentor(s): Ben, Andrew, Erik

Idea 2. Gene Wiki+: User interface for identifying gene-disease links

While the Gene Wiki+ framework provides a useful way to link data together and allows advanced users to perform powerful queries, it is somewhat lacking with respect to interfaces that take advantage of structured data.  In this project, the student will create a novel interface that highlights connections between genes and diseases accessible within the Gene Wiki+.

Expected results: By the end of the summer, we expect to have a functioning prototype of this interface up and running on the Gene Wiki+ website.

Knowledge prerequisite: Knowledge of some form of visual Web programming is vital (any or all of JavaScript, HTML5, Flash, etc.).

Mentor(s): Ben, Erik, Salvatore, Andrew

Idea 3. Gene Wiki+: Build human-powered triple validation service

Much of the Gene Wiki+ system depends on automatically identified associations between concepts, such as links between genes and diseases.  While generally valid, automatically mined associations may sometimes contain errors or need additional specification.  In this project, the student will create a service that manages a ‘triple verification queue’.  The service would encompass a database of possible facts (e.g. defects in gene X cause disease Y), simple interfaces for rating these statements (true, false, maybe), and algorithms for verifying and making use of these collected judgments.  This would be similar to the RABJ engine from Freebase, but would be a general-purpose tool that could be used by any similar project.
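The judgment-collection core of such a queue could start as small as the Python sketch below. It assumes a simple agreement rule (at least three votes, with 70% of them agreeing) as the verification algorithm; RABJ or a final design may well use something more sophisticated, and the class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Allowed judgments, as listed in the idea description.
VALID_RATINGS = {"true", "false", "maybe"}


@dataclass
class CandidateTriple:
    """A machine-mined statement awaiting human review (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str
    ratings: Counter = field(default_factory=Counter)

    def rate(self, rating: str) -> None:
        if rating not in VALID_RATINGS:
            raise ValueError(f"unknown rating: {rating!r}")
        self.ratings[rating] += 1

    def verdict(self, min_votes: int = 3, min_agreement: float = 0.7) -> str:
        """Return 'true', 'false' or 'pending' under an assumed agreement threshold.

        'maybe' votes count toward the total but never toward a decision,
        so ambiguous statements stay in the queue for more judgments.
        """
        total = sum(self.ratings.values())
        if total < min_votes:
            return "pending"
        for decision in ("true", "false"):
            if self.ratings[decision] / total >= min_agreement:
                return decision
        return "pending"


def review_queue(triples):
    """Triples that still need judgments, least-reviewed first."""
    pending = [t for t in triples if t.verdict() == "pending"]
    return sorted(pending, key=lambda t: sum(t.ratings.values()))
```

For example, three "true" votes and one "maybe" on `CandidateTriple("GENE_X", "causes", "DISEASE_Y")` give 75% agreement, so the statement is accepted and drops out of the queue.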

Expected results: By the end of the summer, we expect to have a functioning prototype of this service up and running on the Gene Wiki+ website.

Knowledge prerequisite: Knowledge of databases (e.g. MySQL) and Web programming (HTML, JavaScript) is vital.  Code should be developed in either Java or Python.

Mentor(s): Ben, Salvatore, Andrew

Idea 4. Gene Wiki: extend gene wiki bot to add gene-disease infobox to Gene Wiki articles

Articles in the Gene Wiki are enhanced with infoboxes that contain structured data drawn from primary databases like NCBI Gene.  This is accomplished with a Wikipedia ‘bot’ that is activated when new articles are created and when edits are made to the source databases.  In this project, the student will extend the bot to add a new table with links between genes and diseases.  These connections are of fundamental interest to both scientists and the general public yet are currently not well structured.  Note that Gene Wiki articles are viewed millions of times per year.  This project will have a major positive impact on the spread of knowledge through society regarding the genetic underpinnings of human disease.
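The table-building step of such an extension might look like the Python sketch below, assuming the bot already holds a list of (disease, source) pairs for a gene. It emits standard MediaWiki table markup; the function name, the two-column layout, the caption wording and the "OMIM" source label in the usage line are illustrative assumptions, and fetching and saving articles is left out.

```python
def disease_table(gene_symbol, links):
    """Render gene-disease links as MediaWiki table markup.

    links: iterable of (disease_name, source_name) pairs.
    Disease names become wiki links; duplicate pairs are dropped and rows sorted,
    so re-running the bot on unchanged data produces identical wikitext.
    """
    rows = sorted(set(links))
    lines = [
        '{| class="wikitable"',
        f"|+ Diseases linked to {gene_symbol}",
        "! Disease !! Source",
    ]
    for disease, source in rows:
        lines.append("|-")
        lines.append(f"| [[{disease}]] || {source}")
    lines.append("|}")
    return "\n".join(lines)


# Usage: print(disease_table("CFTR", [("Cystic fibrosis", "OMIM")]))
```

Sorting and de-duplicating the rows keeps the output stable between runs, which makes it easy for a bot to skip saving an article whose table has not changed.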

Expected results: We expect the bot extension described above to be operational by the end of the summer.

Knowledge prerequisite: Python

Mentor(s): Erik, Salvatore, Ben

Idea 5. BioGPS: A web interface for webpage annotation

Currently, the biological resources (so-called “plugins”) integrated in BioGPS are essentially rendered in iframes, which display gene-specific content directly from external websites. Typically, these websites are built by biologists with limited coding skills, so very few of these sites are marked up with semantic content. We would like the accepted student to develop a web interface that allows end users to mark up the web content and ultimately generate an annotated RDFa or microformat wrapper around unstructured HTML.  This wrapper could then be applied to all pages in a given site to reveal structured data for all genes.

Expected results: A web interface that will allow any user:

  • to display a live site (e.g. a page for gene A)
  • to select/highlight any block of content (“blocked” by any HTML tags)
  • to add annotation to selected content using either free text or an ontology
  • to save users’ annotations as a set of rules (e.g. auto-generated JavaScript code) that add semantic markup to the selected content
  • to validate user-specified “rules” on a page for other genes (e.g. for gene B)

Knowledge prerequisite: Knowledge of JavaScript/HTML, and some knowledge of the Semantic Web

Mentor(s): Chunlei, Ben, Andrew

Idea 6. BioGPS: An enhanced HTML5 canvas-based data-chart viewer

One of the most popular biological resources in BioGPS is the data-chart plugin, with which biologists can visualize the expression levels of a given gene in various experimental samples. The current version of the data-chart viewer is HTML5 canvas-based, with useful features like dynamic scaling and sample labelling/highlighting. We want to enhance it to provide more flexible customization features, e.g. supporting multi-factor grouping, aggregation and coloring of the experimental samples.
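The grouping and aggregation logic is independent of the canvas drawing, so it can be sketched on its own. The Python sketch below assumes each sample is a dict of factor values plus one expression value; the field names (`tissue`, `sex`, `expression`) and the color-by-first-factor rule are hypothetical, and the real sample data structure may differ. The same logic would port directly to the JavaScript viewer.

```python
from collections import defaultdict
from statistics import mean


def group_samples(samples, factors, value_key="expression", aggregate=mean):
    """Group samples by a tuple of factor values and aggregate each group.

    samples: list of dicts, e.g. {"tissue": "liver", "sex": "F", "expression": 7.2}
    factors: factor names to group on, outermost first, e.g. ["tissue", "sex"]
    Returns {(factor values...): aggregated value} in first-seen order,
    which a chart can use directly as bar order.
    """
    groups = defaultdict(list)
    for sample in samples:
        key = tuple(sample[f] for f in factors)
        groups[key].append(sample[value_key])
    return {key: aggregate(values) for key, values in groups.items()}


def assign_colors(group_keys, palette):
    """Color groups by their first factor, cycling through the palette."""
    colors = {}
    for key in group_keys:
        if key[0] not in colors:
            colors[key[0]] = palette[len(colors) % len(palette)]
    return {key: colors[key[0]] for key in group_keys}
```

Passing `aggregate=max` or `statistics.median` instead of `mean` changes the summary without touching the grouping code.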

(This post provides more information about the underlying data structure.)

Expected results: A new version of the HTML5 canvas-based data-chart viewer that supports multi-factor grouping, aggregation, and coloring.

Knowledge prerequisite: Knowledge of HTML5/JavaScript; some biological knowledge is a plus.

Mentor(s): Chunlei, Andrew, Ian

Idea 7. BioGPS: jQuery-based BioGPS gene-report layout canvas

The current canvas for rendering a BioGPS gene-report layout is based on EXTJS. As part of migrating from EXTJS to jQuery, we would like the accepted student to develop a jQuery-based layout canvas with features similar to those of the current version.

This is an example of the current gene-report page. The “layout canvas” refers to the content under “Gene Report” tab.

(This post provides more explanation, and this post provides the links to the existing EXTJS-based code.)

Expected results: A jQuery-based layout canvas system to replace the current version.

Knowledge prerequisite: Knowledge of JavaScript, particularly jQuery, and HTML/CSS.

Mentor(s): Chunlei, Andrew

Idea 8. BioGPS: MongoDB-based Gene-List storage and query API and an efficient enrichment-score computation engine

BioGPS currently provides very basic functionality for the concept of a “gene list”, a list of related genes saved by users. We would like to develop a robust and scalable backend (based on MongoDB) to support a large number of public and user-contributed gene lists, and an efficient computational engine to calculate statistical enrichment across all saved gene lists, an operation that answers the question “find me all the gene lists similar to mine”.
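For the enrichment score, one common choice (assumed here; the idea does not specify a statistic) is a one-sided hypergeometric test on the overlap between two lists drawn from a shared gene universe. The Python sketch below scores a query list against every saved list by brute force. The efficient engine the project asks for would need to avoid this full scan, storage in MongoDB is omitted, and the function names and `universe_size` parameter are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    upper = min(n, N)
    # math.comb returns 0 when asked for more items than exist, so impossible
    # terms drop out of the sum automatically.
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def enrichment_p(query, other, universe_size):
    """One-sided overlap p-value between two gene lists.

    universe_size must be at least as large as either list (e.g. the number
    of genes that could have appeared on any list).
    """
    q, o = set(query), set(other)
    overlap = len(q & o)
    return hypergeom_sf(overlap, universe_size, len(o), len(q))


def most_similar(query, saved_lists, universe_size, top=5):
    """Rank saved gene lists by enrichment p-value, smallest (most similar) first."""
    scored = [
        (enrichment_p(query, genes, universe_size), name)
        for name, genes in saved_lists.items()
    ]
    return sorted(scored)[:top]
```

Python's integer arithmetic keeps the binomial sums exact, so only the final division rounds; a production engine would more likely precompute per-list sizes and use an inverted gene index to skip lists with no overlap.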

Expected results: A REST interface for CRUD operations on gene lists (based on Python/MongoDB), and an efficient computation engine for enrichment-score calculation.

Knowledge prerequisite: Knowledge of Python/MongoDB.

Mentor(s): Chunlei, Andrew

Idea 9. Biological Games: BioESP framework

We would like to build an open source implementation of the ESP game that could be adapted to suit gene annotation tasks.  The basic mechanics of the ESP game are as follows: 1) two players are paired at random, 2) they are both shown the same stimulus (an image, for example), 3) both players attempt to type words that they think the other player is typing to describe the stimulus (this is the ‘ESP’ part), 4) when the words match, both players are awarded points and they move on to the next stimulus.
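The matching rule in steps 3 and 4 can be sketched in a few lines of Python: each player's guesses are kept in a set, and a round ends as soon as one player types a word the partner has already typed. Lowercasing guesses and the flat 10-point award are assumptions, and the class and function names are hypothetical; pairing, timing and persistence are left out.

```python
class EspRound:
    """One stimulus shown to two paired players (players are numbered 1 and 2)."""

    def __init__(self, stimulus):
        self.stimulus = stimulus
        self.guesses = {1: set(), 2: set()}
        self.agreed = None  # the matching word, once there is one

    def guess(self, player, word):
        """Record a guess; return the agreed word if it matches a partner guess."""
        if self.agreed is not None:
            return self.agreed  # round already decided
        word = word.strip().lower()
        if not word:
            return None
        self.guesses[player].add(word)
        partner = 2 if player == 1 else 1
        if word in self.guesses[partner]:
            self.agreed = word
        return self.agreed


def run_round(stimulus, events, points_per_match=10):
    """Replay (player, word) events in arrival order; return (agreed_word, points)."""
    rnd = EspRound(stimulus)
    for player, word in events:
        if rnd.guess(player, word) is not None:
            return rnd.agreed, points_per_match
    return None, 0
```

Because guesses accumulate in sets, order does not matter: a match is detected whichever player types the shared word second.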

While, to our knowledge, this game pattern has only been successfully applied to image annotation, we think that it might be successful in a number of different contexts.  We are particularly interested in gene annotation applications (e.g. link genes to related genes), but many more could be imagined.  This project has the potential to expand into a popular open source initiative.

Expected results: By the end of the summer we expect to have a working prototype that would: pair live players, pair single players with a robot player when no one else was playing, display images or other stimuli (e.g. gene/disease names) for game events, manage player login, track high scores, store collected information, and generally be fun to play.

Knowledge prerequisite: Web programming (HTML5/JavaScript, WebSockets) and databases.

Mentor(s): Ben, Salvatore, Andrew

Idea 10. Über Wiki

The Gene Wiki+ uses the Semantic MediaWiki framework and programs that work with the MediaWiki API to provide a unified view over the Gene Wiki (a portion of Wikipedia) and SNPedia. This integrated view is facilitated by the tools and ontologies from the National Center for Biomedical Ontology. Yet there are many other wikis out there with relevant content! See, for example, PDBwiki (a wiki for protein structure information), WikiPathways (a wiki for biological pathway information, and a longtime GSoC mentor), or any of the other BioWikis.

In this project, the student will build upon the Gene Wiki+ infrastructure to build one wiki to rule them all (at least those relevant to biology).

Expected results: By the end of the summer we expect to see an extension to the MediaWiki-sync framework underlying the Gene Wiki+ that will incorporate data from at least two additional bio-wikis.

Knowledge prerequisite: Java

Mentor(s): Erik, Ben, Salvatore, Andrew

Idea 11. Open Source Hardware: Game Kiosk

In this project (which may or may not be compatible with GSoC interests), the student will supply the design documents that describe the construction of a light, inexpensive, portable computer kiosk for playing games at scientific conferences. We have found conferences to be a great venue for collecting feedback on scientific games, but it is often difficult to produce a great user experience with the materials typically on hand. The scientific gaming kiosk should have the following properties:

  • it should allow the user to stand while playing
  • it should support a flat-screen monitor attached to a laptop and/or an eye-level iPad or other touchscreen tablet-type device
  • it needs to be lightweight and decomposable, so that it can be transported in a suitcase and assembled on location

See here for an example of a great solution that simply costs too much.

Expected results: We would like to see design documents and pictures/video of a working prototype. (If needed, we will supply some funding for materials, but the whole thing needs to cost less than $100.)

Knowledge prerequisite: None except creativity

Mentor(s): Ben, Salvatore

Idea 12. Collaborative knowledge engineering with Twitter

In this project, the student will implement the HyperTwitter concept as an open source project focused on assembling biological knowledge bases. The basic idea is that Twitter users can author triples: subject-predicate-object statements about the world that can then be integrated into coherent, queryable knowledge bases.

As one example use case, it is now common for some percentage of attendees at scientific conferences to tweet key points of presentations while they are in progress. Using a HyperTwitter approach, these tweets could be assembled in real time into a database that captures the information being distributed at the conference in a manner that is globally accessible and non-transient. Imagine the following stream of tweets from Twitter users @asu and @jsmith as they listen to a presentation about a protein network:

  @asu tweets: #cdk1 interacts_with #LATS1 : #cellbioconf2012
  @jsmith tweets: #cdk1 is_a #gene : #cellbioconf2012
  @asu tweets: #LATS1 interacts_with #Zyxin : #cellbioconf2012

The HyperTwitter implementation could catch these tweets, translate them into RDF, and another service could then display the emerging graph as it forms. During and after the conference, interested parties could make use of this knowledge, extend it, refine it etc. in ways that would otherwise be impossible.
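The translation step could start with something like the Python sketch below. The tweet grammar ("#subject predicate #object : #context") is inferred from the three example tweets, the URIs use a placeholder `example.org` namespace, and the output format is N-Quads so that the conference hashtag can name the graph; reading the Twitter stream and loading the result into a triple store (e.g. with Jena) are left out.

```python
import re

# Hypothetical namespace: a real service would map hashtags to ontology terms
# rather than minting URIs from the raw tag text.
BASE = "http://example.org/hypertwitter/"

# Matches "#subject predicate #object : #context", the shape of the example tweets.
TRIPLE_TWEET = re.compile(r"#(\w+)\s+(\w+)\s+#(\w+)\s*:\s*#(\w+)")


def parse_tweet(text):
    """Return (subject, predicate, object, context), or None for ordinary tweets."""
    match = TRIPLE_TWEET.search(text)
    return match.groups() if match else None


def to_nquad(text):
    """Encode one triple tweet as an N-Quads line; the conference tag names the graph."""
    parsed = parse_tweet(text)
    if parsed is None:
        return None
    s, p, o, graph = parsed
    return f"<{BASE}{s}> <{BASE}{p}> <{BASE}{o}> <{BASE}{graph}> ."
```

Tweets that do not follow the pattern simply return `None`, so the monitor can run over the full hashtag stream and keep only the statements.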

Expected results: By the end of the summer, the student should deliver an application that monitors a hashtag-based Twitter stream for HyperTwitter content and transfers the encoded triples into an RDF knowledge base (a triple store).

Knowledge prerequisite: Some knowledge of semantic web tools such as the Jena Java framework for RDF is desirable.

Mentor(s): Ben

Idea 13. Gene Wiki: user encouragement bot

One of the best ways to reward people for contributing to a wiki is to let them see that other people notice and react to their edits. If you edit an article and then someone else quickly comes along and fixes one of your typos, adds a sentence to one of your paragraphs or comments on your contribution, the act of editing becomes much more fun and much less lonely. To encourage editor contribution and retention, the student will design and implement a social encouragement bot. The duties of the bot will be to:

  1. notice when a new editor makes an edit,
  2. notice when a non-active editor makes an edit,
  3. identify editors that are likely to be interested in the article that was edited,
  4. alert these related editors that a new editor has made a contribution and provide them with a very simple way to say ‘welcome’.
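The four duties above could be prototyped over a plain edit log before the bot is wired to a live wiki. In this Python sketch, "new" means no earlier edits, "non-active" means no edits in the last 90 days, and "likely to be interested" means having edited the same article; all three rules, the (editor, article, timestamp) tuple layout and the function names are assumptions, and actually posting the alert is left out.

```python
from collections import Counter
from datetime import datetime, timedelta

# Each edit is a hypothetical (editor, article, timestamp) tuple; a real bot
# would read these from the wiki's recent-changes feed.


def classify_editor(history, editor, now: datetime,
                    inactive_after: timedelta = timedelta(days=90)) -> str:
    """Return 'new', 'returning' (previously inactive) or 'active'."""
    past = [ts for who, _, ts in history if who == editor and ts < now]
    if not past:
        return "new"
    if now - max(past) > inactive_after:
        return "returning"
    return "active"


def related_editors(history, article, exclude, limit=3):
    """Editors who edited the same article most often (a simple interest proxy)."""
    counts = Counter(who for who, art, _ in history if art == article and who != exclude)
    return [who for who, _ in counts.most_common(limit)]


def welcome_alerts(history, editor, article, now: datetime):
    """Build one (recipient, message) alert per related editor for new or returning editors."""
    status = classify_editor(history, editor, now)
    if status == "active":
        return []
    return [
        (peer, f"{editor} ({status} editor) just edited {article}. Say welcome?")
        for peer in related_editors(history, article, exclude=editor)
    ]
```

Keeping the rules as pure functions over an edit list makes them easy to tune (for example, the inactivity window) before the bot goes live.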

Expected results: The bot should be able to perform the 4 tasks listed above.

Knowledge prerequisite: Java

Mentor(s): Ben

Idea 14. Mobile interface for Gene-centric information

Biologists are swamped with information about genes spread across the thousands of research articles indexed in PubMed. This project will be to build a service that helps them track recent advances in a user-centric, gene-centric manner on their mobile devices.
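The server-side matching behind such alerts could be as simple as the Python sketch below, which checks new article text against each user's watched gene symbols. Whole-word, case-sensitive symbol matching is a deliberately naive assumption (real gene-mention detection needs synonyms and disambiguation), the PMIDs are made up, and retrieving new articles (e.g. via NCBI's E-utilities) and sending push notifications are left out.

```python
import re


def genes_mentioned(text, watched_genes):
    """Return the watched gene symbols that appear as whole words in text.

    Matching is case-sensitive on purpose: many gene symbols (e.g. "CAT") are
    also ordinary words, so case-insensitive matching would cause false alerts.
    """
    found = set()
    for symbol in watched_genes:
        # Reject matches embedded in longer tokens such as "BRCA10" or "TP53-like".
        pattern = rf"(?<![\w-]){re.escape(symbol)}(?![\w-])"
        if re.search(pattern, text):
            found.add(symbol)
    return found


def alerts_for_user(new_articles, watched_genes):
    """new_articles: iterable of (pmid, title_and_abstract). Yields (pmid, genes) to push."""
    for pmid, text in new_articles:
        hits = genes_mentioned(text, watched_genes)
        if hits:
            yield pmid, sorted(hits)
```

Running one pass per user over each day's new articles keeps the logic simple; at scale, an index from gene symbol to subscribed users would avoid rescanning every article for every user.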

Expected results: A working iPhone or Android application that sends push alerts to the user whenever an article in PubMed is published that mentions any of a user-defined set of genes.

Knowledge prerequisite: mobile application development

Mentor(s): Chunlei, Ian, Andrew

Idea 15. BioGPS: cloud-based uptime monitor for plugins

The biology research community has registered ~250 online, third-party websites as “plugins” in BioGPS. To encourage additional contributions by the community, and to reward those developers who have already registered plugins, we plan to build an integrated uptime monitor and downtime notification system in BioGPS. To reduce dependence on the connectivity of any single server, this system will be built on geographically distributed cloud-computing resources. This project will involve the construction of both the back-end infrastructure and a simple front-end interface.
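The core check could look like the Python sketch below, which uses only the standard library. Reporting a plugin as down only when every vantage point fails is one assumed way to reduce dependence on any single server; the region names, the 10-second timeout and the function names are hypothetical, and scheduling, storage and notifications are left out.

```python
import urllib.request


def http_probe(url, timeout=10.0):
    """Return True if the URL responds without an HTTP or network error."""
    try:
        # urlopen raises HTTPError (a URLError, itself an OSError) for 4xx/5xx
        # responses, and ValueError for malformed URLs.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError):
        return False


def plugin_status(url, probes):
    """Combine checks from several vantage points (e.g. cloud regions).

    probes maps a region name to a callable(url) -> bool. The plugin is
    reported 'down' only when every region fails, so a network problem near
    one region does not trigger a false alarm.
    """
    results = {region: probe(url) for region, probe in probes.items()}
    status = "up" if any(results.values()) else "down"
    return status, results
```

Usage: in each region, a scheduled job would call `http_probe` on every registered plugin URL and report results to a central service that applies `plugin_status`; injecting the probes as callables also makes the aggregation rule testable without network access.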

Expected results: A stand-alone uptime monitoring system with web-service hooks for integration with BioGPS.

Knowledge prerequisite: Python (preferred) or Java, cloud computing a plus

Mentor(s): Chunlei, Andrew

Idea 16. Biological games: Asymmetric version of Dizeez

We have created a game called Dizeez that aims to extract links between human genes and diseases. In this game, two players are randomly paired and shown a disease. Each player names genes related to that disease, and when a guess matches, both players get points and are shown another disease. Dizeez is a “symmetric” game, in that both players do the same thing.

We would like to create an “asymmetric” version of Dizeez. In this mode, Player 1 is shown a disease and names genes related to that disease. Player 2 is shown the genes listed by Player 1 and tries to guess the disease. When Player 2 guesses correctly, both players get points, and roles are reversed for a new disease.
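One way to model the turn logic of a single asymmetric match is the Python sketch below, which ignores networking and player pairing. Case-insensitive exact matching of disease names, the 10-point award, and recording the describer's genes as confirmed gene-disease links after a correct guess are all assumptions, and the class name is hypothetical.

```python
class AsymmetricDizeez:
    """Turn logic for the asymmetric mode: one describer, one guesser."""

    def __init__(self, players, diseases, points=10):
        self.describer, self.guesser = players
        self.queue = list(diseases)
        self.points = points
        self.scores = {p: 0 for p in players}
        self.clues = []      # genes named for the current disease
        self.collected = []  # (disease, genes) pairs confirmed by a correct guess
        self.current = self.queue.pop(0) if self.queue else None

    def give_gene(self, gene):
        """Player 1 (the describer) names a gene related to the current disease."""
        self.clues.append(gene)

    def guess_disease(self, guess):
        """Player 2 guesses; on a match both score and the roles are reversed."""
        if self.current is None or guess.strip().lower() != self.current.lower():
            return False
        self.collected.append((self.current, list(self.clues)))
        for player in (self.describer, self.guesser):
            self.scores[player] += self.points
        self.describer, self.guesser = self.guesser, self.describer
        self.clues = []
        self.current = self.queue.pop(0) if self.queue else None
        return True
```

For example, after `give_gene("CFTR")` and a correct guess of "cystic fibrosis", both players gain 10 points, the roles swap, and ("Cystic fibrosis", ["CFTR"]) is stored as a candidate gene-disease link.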

Expected results: A web-based game for players to learn about (and contribute knowledge about) gene-disease links.

Knowledge prerequisite: Python, JavaScript

Mentor(s): Salvatore, Ben, Andrew