taxonomic rank

In a typical query, you can query genes for one or multiple species by providing a “species” parameter (e.g. species=…,10090, where 10090 is the taxonomy id for mouse). You can now also query genes at levels above species: you can search for matching genes in any given genus, family, or even phylum of the taxonomy tree (effectively any node of the tree). For example, you can now query for “lytic enzyme” in any Firmicutes (gram-positive bacteria, taxonomy id: 1239) with q=lytic enzyme&species=1239&include_tax_tree=true  

or, in Python:

import mygene  
mg = mygene.MyGeneInfo()  
mg.query('lytic enzyme', species=1239, include_tax_tree=True)  

Note that the include_tax_tree=true parameter extends the query to all taxonomy ids under the 1239 node in the taxonomy tree (including 1239 itself). For comparison, the same query without this parameter (q=lytic enzyme&species=1239) will return no hits, as no genes are annotated at the level of Firmicutes itself.
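For readers calling the REST endpoint directly rather than through the Python client, here is a minimal sketch of how the two query strings compared above could be assembled with Python's standard library. The helper name build_query_string is hypothetical, and the base URL of the query endpoint is deliberately left out; only the parameter names q, species and include_tax_tree come from this post.

```python
from urllib.parse import urlencode


def build_query_string(q, species_ids, include_tax_tree=False):
    """Hypothetical helper: assemble the query-string part of a gene query."""
    params = {
        "q": q,
        # Multiple taxonomy ids are sent as one comma-separated value.
        "species": ",".join(str(taxid) for taxid in species_ids),
    }
    if include_tax_tree:
        # Lower-case "true", matching the parameter value used in the post.
        params["include_tax_tree"] = "true"
    return urlencode(params)


expanded = build_query_string("lytic enzyme", [1239], include_tax_tree=True)
exact = build_query_string("lytic enzyme", [1239])
print(expanded)  # q=lytic+enzyme&species=1239&include_tax_tree=true
print(exact)     # q=lytic+enzyme&species=1239
```

urlencode applies standard form encoding, so the space becomes + and the commas in a multi-species list become %2C; that the server decodes these back as usual is an assumption, not something stated here.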

We expect this new feature to be particularly useful for fields like evolutionary biology and microbiome research. In fact, it was requested by our users in those fields. So please give it a try and let us know your feedback.

The exact usage of this feature is summarized below:

  • the species parameter accepts one or multiple taxonomy ids (multiple ids separated by commas)

  • passing include_tax_tree=true expands the query to all sub-nodes of the taxids passed in the species parameter, including the taxids themselves.

  • Since you can pass any taxonomy id from the taxonomy tree, and a high-level node can have a very large number of descendants, we cap the expanded taxonomy id list at 10,000 ids so that it won’t overload our servers.
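The expansion rules above can be illustrated with a small sketch: parse the comma-separated species value, then walk the tree breadth-first from each passed taxid, keeping the taxids themselves and all their descendants, and stop at the 10,000-id cap. This is not the service's actual implementation; the function name expand_taxids and the children mapping (taxid to direct child taxids) are hypothetical stand-ins for the real taxonomy tree, and which ids survive the cap on the real server is not specified in this post (this sketch simply keeps the first ones reached).

```python
from collections import deque

MAX_EXPANDED_TAXIDS = 10_000  # cap stated in the post


def expand_taxids(species, children, cap=MAX_EXPANDED_TAXIDS):
    """Hypothetical sketch: expand taxids to themselves plus all descendants.

    `species` is a comma-separated string of taxonomy ids; `children` maps a
    taxid to the list of its direct child taxids (a stand-in for the real tree).
    """
    roots = [int(t) for t in species.split(",") if t.strip()]
    seen = set()
    expanded = []
    queue = deque(roots)
    # Breadth-first walk, stopping once the cap is reached.
    while queue and len(expanded) < cap:
        taxid = queue.popleft()
        if taxid in seen:  # a passed taxid may also be a descendant of another
            continue
        seen.add(taxid)
        expanded.append(taxid)  # the passed taxids themselves are included
        queue.extend(children.get(taxid, []))
    return expanded


# Toy tree with made-up ids, not real taxa: 1 -> {2, 3}, 2 -> {4, 5}, 3 -> {6}
toy_children = {1: [2, 3], 2: [4, 5], 3: [6]}
print(expand_taxids("2,3", toy_children))  # [2, 3, 4, 5, 6]
```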


This feature was made possible by a project led by Greg Stupp at the recent 2nd NoB Hackathon.