In the last two months we’ve held small contests to increase the number of BioGPS plugins and improve their descriptions, because plugins are crucial for customizing gene reports to suit user needs. In an ongoing effort to introduce resources that could enhance the utility of BioGPS for our users, we will continue to feature plugins in our BioGPS Spotlight blog series. To kick off a string of Spotlight posts on InterMine-based resources, we start by featuring InterMine itself. InterMine is a powerful data warehouse system from the University of Cambridge, UK, and has been used to generate many valuable model animal-specific data mines. Although it is not a resource that can be used directly as a plugin in BioGPS, InterMine’s well-documented code (available on GitHub) has enabled many researchers to share their data as web-based resources. Since several of these resources are BioGPS plugins, and because we love how InterMine’s open source activities contribute to open data efforts, we are pleased to shine a spotlight on InterMine. Yo Yehudi (developer) and Rachel Lyne (biologist) were kind enough to answer our questions for this post.
- InterMine’s blog dates back to 2005 and suggests that it was started for the purpose of sharing fruit fly data (as FlyMine). Please tell us more about the early development of FlyMine and the transition from a data resource to a resource for developing data resources (InterMine).
Yes, FlyMine is the prototypical InterMine database and was developed specifically to integrate fruit fly and mosquito data. However, this was soon followed by modMine, which provides the gateway to the huge amount of data collected by the modENCODE project. InterMine’s extensible core data model and flexible query options were key to it being chosen for this project. This was a high-profile project and a huge collaborative effort, and it showed that InterMine could scale in both depth and breadth of data. The big transition then came with a collaboration between InterMine and several of the Model Organism Databases in 2009. With so many groups using the InterMine software, the focus shifted to developing a high-quality, reusable resource in which external groups could also participate.
- In addition to FlyMine, InterMine has helped with the creation and/or maintenance of many other data sources (HumanMine, RatMine, etc.). How many of the InterMine-based resources does the InterMine team maintain, and how many are maintained by collaborators?
Most mines are maintained by third parties. We currently list 28 different mines on our homepage, but we only run two of them ourselves. The rest are run by our fabulous community members. We keep in active contact with most of them via our mailing lists and GitHub repositories.
- Which mines are maintained by InterMine?
InterMine maintains HumanMine and FlyMine.
- Which mines are maintained by collaborators?
Pretty much everything else! Some organizations run multiple mines: LegumeFederation, for example, has four (Bean, Soy, Legume, and Peanut). Other organizations have one gigantic mine: PhytoMine has over 60 different plant genomes.
- What sorts of data collections are suitable for use with InterMine and what kind of support do you offer researchers who wish to use InterMine?
InterMine’s easily extendable data model means pretty much any kind of biological data can be added to the core set of sequence-based data. We tend to focus on large sets from other public repositories, such as all the protein interactions from IntAct and the large curated data sets from the Model Organism Databases. We have active help desks for the various InterMines and can help researchers build the searches they need. Often we can provide them with their search as a simplified form (which we call a template).
- What is InterMine’s greatest success story so far?
Being adopted as the advanced search interface for the major Model Organism Database groups was a huge boost for the InterMine team and resulted in many constructive discussions and advancements.
- What improvements are coming in the future?
We’re quite excited about a couple of things. While it’s still at a really early stage, we’ve been investigating different data tools such as graph databases and semantic web technologies. We’ve also spent a lot of time developing and testing InterMine 2.0, codenamed “BlueGenes”, which should hopefully soothe some of the usability pain points in the current version and also bring an up-to-date look and feel to InterMine.
- Who is the team behind InterMine?
Our office houses seven developers and a biologist. (That sounds a bit like the start of a joke, I know!) The team’s skillset is pretty well-rounded, with some of us more focused on presentational technologies – the bits of InterMine you see and interact with on the screen – while others tend to focus on wrangling the data. Rachel, our biologist, acts as a scientific advisor and helps to keep the devs on track.
- InterMine is open source and has detailed documentation to support it. Can you comment on the decision to be open source?
The greatest thing about open source is the community. They freely submit bug reports, feature requests, and general ideas. Developers from different organizations contribute their code, and often they’ll even answer support queries. Enthusiastic users volunteer for usability testing. Given the levels of engagement, I think it’s reasonable to suggest that the choice of an open license is actually a vital part of our continued success.
- InterMine is used in a lot of valuable open data repositories. Does InterMine steer or incentivize its users towards open data?
Not specifically. Because the InterMine databases tend to serve primarily academic users and tend to be developed by academic groups, they have naturally relied mostly on open data sources.
2016.06.07 edit – To learn more about InterMine, be sure to check out their paper:
Kalderimis A, Lyne R, Butano D, Contrino S, Lyne M, Heimbach J, Hu F, Smith R, Stěpán R, Sullivan J, Micklem G. InterMine: extensive web services for modern biology. Nucleic Acids Res. 2014 Jul;42(Web Server issue):W468-72. doi: 10.1093/nar/gku301. Epub 2014 Apr 21. PubMed PMID: 24753429; PubMed Central PMCID: PMC4086141.