After four productive years of funding from NIGMS, we have a bit over a year left on our current grant for BioGPS. In addition to releasing some new features for data set visualization and management (our current Specific Aim #1), we are planning to spend significant time developing the aims and preliminary data for our renewal application.

So what are we planning next for BioGPS? We want to build on our strengths, so let’s first summarize what we’ve accomplished. We think BioGPS has three outstanding features, the combination of which makes it unique in the gene annotation space:

  • A dedicated user community: Whether it’s because of the data we provide to users or the flexibility provided by our user-customizable gene report layout, BioGPS gets a fair bit of web traffic. We get around two million pageviews per year from thousands of registered and anonymous users.
  • Mechanisms to harness and share users’ contributions: Most obviously, the BioGPS plugin library is the result of explicit contributions from over a hundred BioGPS users who registered at least one plugin. BioGPS also uses implicit contributions from users in the form of plugin usage statistics, which then allows us to use popularity as a proxy for utility.
  • The largest collection of gene-centric deep links available: Unlike a simple resource index, each BioGPS plugin is defined by a “URL Template” that gives us the exact URL for each gene-specific web page. That means for every gene indexed in BioGPS, we have the address of several hundred web pages that talk specifically about that gene. Powerful.
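To make the “URL Template” idea concrete, here is a minimal sketch of how a template could be expanded into a gene-specific deep link. The `{{FieldName}}` placeholder syntax, the field names, and the example.org URLs are assumptions for illustration, not a description of the actual BioGPS implementation.

```python
import re

# Assumed placeholder syntax: {{FieldName}}. The real plugin syntax may differ.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand_url_template(template, gene):
    """Fill each {{FieldName}} placeholder with the matching gene identifier."""
    def substitute(match):
        key = match.group(1)
        if key not in gene:
            raise KeyError(f"gene record has no identifier named {key!r}")
        return str(gene[key])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical gene record and plugin templates (example.org is a stand-in).
gene = {"EntrezGene": 1017, "Symbol": "CDK2"}
templates = {
    "by_id": "https://example.org/gene/{{EntrezGene}}",
    "by_symbol": "https://example.org/search?q={{Symbol}}",
}

# One deep link per plugin for this gene.
links = {name: expand_url_template(t, gene) for name, t in templates.items()}
```

With several hundred plugins, the same loop would yield several hundred gene-specific addresses per gene, which is the collection of deep links described above.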

With these resources as our foundation, we envision many possible directions to explore in our renewal proposal. We’ve tried to organize them into these Specific Aims:

Aim #1: Continued growth of the user community. With community-based initiatives like BioGPS, you’re never done growing your community. We will continue adding features that focus on growing our user base, as well as each user’s network of connected entities (genes, plugins, layouts, and other users). While these features don’t inherently have a biological use case at their core, the payoff in terms of BioGPS utility is invaluable.

Aim #2: Gene list sharing and management. We want to extend the gene-centric BioGPS model to gene lists. What will that mean? You currently have the ability to save gene lists for easy retrieval, and over 3500 lists have already been saved by our users. We will build in the same sharing and popularity features that we currently offer for plugins. We will also build “gene list plugins” (think heat maps and networks), as well as built-in enrichment analyses that compare your lists against other community-contributed gene lists.
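As a rough illustration of what an enrichment analysis between two gene lists could look like, the sketch below computes a one-sided hypergeometric p-value for their overlap. We haven’t settled on a statistic; the hypergeometric test is simply a common choice for this kind of comparison, and the function name and inputs here are hypothetical.

```python
from math import comb


def overlap_enrichment_p(query, reference, universe):
    """P-value for seeing at least the observed overlap by chance.

    Assumption: genes in `query` are drawn without replacement from
    `universe`, and `reference` marks the "successes" (hypergeometric model).
    """
    universe = set(universe)
    query = set(query) & universe
    reference = set(reference) & universe

    n_total = len(universe)
    n_query = len(query)
    n_ref = len(reference)
    observed = len(query & reference)

    # Sum the upper tail P(X >= observed) using exact integer arithmetic,
    # dividing only once at the end.
    tail = sum(
        comb(n_ref, i) * comb(n_total - n_ref, n_query - i)
        for i in range(observed, min(n_query, n_ref) + 1)
    )
    return tail / comb(n_total, n_query)
```

A small p-value suggests that the saved list shares more genes with the community-contributed list than chance alone would explain. A real implementation would also need to correct for testing against many lists at once.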

Aim #3: Structuring unstructured plugin data. While the first two aims address the nuts and bolts of building a community-based website, the last two focus on scientific innovation. Right now, BioGPS receives and displays plugin content in completely unstructured form. Aside from knowing that a web page is about a gene, BioGPS doesn’t really know what the plugin is saying about that gene. If we were able to provide structure to all of that unstructured content, it would open a whole new arena of integrative querying and data mining. We will propose to build a crowdsourcing solution to this challenge.

Aim #4: Structured data dissemination. Once we can effectively mine data across all the resources in the BioGPS plugin library, we will propose several applications that take advantage of those new capabilities. For example, we can develop a gene notification service to alert users when gene annotations are added or changed. We can mine plugins in near real-time for novel gene annotations (including, but not limited to, Gene Ontology). And we can also bring the wealth of knowledge in these gene-centric resources to the Linked Data community.
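The gene notification idea boils down to comparing two snapshots of structured annotations. Here is a minimal sketch, assuming each snapshot maps a gene identifier to a set of annotation terms. The data layout and function name are hypothetical, and the annotation IDs are only illustrative.

```python
def diff_annotations(old, new):
    """Return, per gene, the annotations added or removed between snapshots."""
    changes = {}
    for gene in old.keys() | new.keys():
        before = old.get(gene, set())
        after = new.get(gene, set())
        added = after - before
        removed = before - after
        if added or removed:
            changes[gene] = {"added": added, "removed": removed}
    return changes


# Illustrative snapshots keyed by gene ID.
yesterday = {"1017": {"GO:0007049", "GO:0005524"}}
today = {"1017": {"GO:0007049", "GO:0016301"}, "1018": {"GO:0005524"}}

changes = diff_annotations(yesterday, today)
# Users watching a gene listed in `changes` would be the ones to alert.
```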

We’re still very early in the process of grant writing, so we certainly welcome feedback and comments!