BioGPS Spotlight on HumanMine

In our last spotlight we covered FlyMine, one of two mines maintained by the InterMine group at Cambridge University. This week’s spotlight is on HumanMine, the second of those two mines. Yo Yehudi (developer) and Rachel Lyne (biologist) were kind enough to answer our questions for this post.

  1. In one tweet or less, introduce us to HumanMine:
    HumanMine integrates genomic data from 15+ curated sources across the H. sapiens genome.

  2. Who is your target audience?
    HumanMine is basically FlyMine’s younger sibling, so its target audience is very similar to FlyMine’s, just with a focus on human data. The webapp makes it easy for most biologists to analyse HumanMine data, whilst the API and client libraries allow programmatic access for the more computer-inclined.

  3. HumanMine seems relatively new (it doesn’t even have its own blog posts yet, and many links point to FlyMine pages). How old is HumanMine? Why did you create HumanMine, and why make the leap from FlyMine to HumanMine?
    HumanMine is, as you guessed, the newer of the two mines we manage at InterMine. The first HumanMine release was in May 2014, although it was preceded by MetabolicMine. FlyMine had been around so long that its documentation was already well established, so there didn’t seem to be much point in duplicating it. These days we’ve moved to a single blog for all things InterMine-related, covering human, fly, and anything else that comes up.

    HumanMine was created to complement the other model-organism InterMines, which are part of a consortium of researchers from the Model Organism Database groups, so that model organism data can be related to human data and vice versa.

  4. Why is HumanMine unique and special?
    What perhaps sets HumanMine apart from other human genomic databases is the number of separate sources it draws from. I recently had a chat with someone who said that their favourite gene didn’t have any homologs in a certain organism; they’d checked by searching their favourite data source. But we looked it up in HumanMine and found that it did have homologs after all, based on information from a different dataset. A data source that combines multiple other datasets can save a lot of time and effort.

    One of the great advantages of HumanMine is its connection with the other InterMines for model organism data. The HumanMine database was built to complement these databases: it enables researchers to relate model organism data to human data through orthology, and likewise makes model organism data more accessible to medical researchers.

  5. What level of traffic does HumanMine typically see? How does this compare with FlyMine?
    HumanMine has averaged around 3,400 pageviews per month over the last year, roughly a third of FlyMine’s traffic. This may be because FlyMine is a more established resource, but it’s hard to say for certain.

  6. What is your greatest success story so far?
    Our greatest success story so far stems from our collaborations with many other groups worldwide, which have created a network of related InterMine databases. This means that anyone using HumanMine can seamlessly access equivalent data in many model organisms: mouse, rat, zebrafish, fly, worm and yeast.

  7. What improvements are coming in the future? What data sources does HumanMine hope to include?
    We have a few different ideas planned. There’s a great BioJS protein feature viewer we’d like to add to our protein report pages, and we’d also like to see JBrowse embedded in our reports. TargetMine has an auxiliary toolkit that we’d like to bring in. This is another reason open source is great: there’s always a gorgeous wealth of visualisations and other tools available to plug into InterMine.

    Another thing we’ve started thinking about recently is upgrading our search capabilities to use an advanced package such as Elasticsearch, to return more tailored and meaningful search results. We also have an InterMine R client that’s nearly ready to be submitted to Bioconductor (see InterMineR on GitHub), and we’re considering how to implement OAuth so that you can keep one account across many mines.

    We have a long list of potential datasets that we would like to add to HumanMine. These include regulatory data from the ENCODE project, drug target data from DrugBank and ChEMBL, and a number of new gene/protein expression datasets.

  8. Are there any data sources that HumanMine would have included if not for data licensing issues?
    Yes, we are no longer able to update the KEGG pathways data because of changes to its licensing. There are other datasets it would be nice to include, such as PharmGKB, but we have such a long list of data that we can include that we won’t worry about those for now.

  9. Who is the team behind HumanMine?
    It’s the same team as InterMine/FlyMine. We have seven programmers, a mix of back-end database specialists and folks who work solely on the front end, developing tools. We also have a part-time biologist who makes sure HumanMine’s data are correct and accessible to our end users. And we’d be terribly remiss if we didn’t mention all our great community contributors who help out with our source code, mailing list, and Google Group, too!

Thanks to Yo and Rachel for guiding us through this extremely useful and FREE tool. Be sure to check out their plugin in the plugin library. If you use HumanMine in your research, please cite their publication:

InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data.
Richard N. Smith, Jelena Aleksic, Daniela Butano, Adrian Carr, Sergio Contrino, Fengyuan Hu, Mike Lyne, Rachel Lyne, Alex Kalderimis, Kim Rutherford, Radek Stepan, Julie Sullivan, Matthew Wakeling, Xavier Watkins and Gos Micklem. Bioinformatics. 2012 Dec 1;28(23):3163-5. doi: 10.1093/bioinformatics/bts577. Epub 2012 Sep 27. PubMed PMID: 23023984; PubMed Central PMCID: PMC3516146.

It’s open access so you can actually read up on the nitty gritty details of this great resource at your leisure!
