Outbreak.info resources: a tool to aid COVID-19 research
In the midst of the COVID-19 pandemic, one of the few bright spots for me personally has been seeing the scientific community rally together to address this challenging disease. Researchers from around the world have quickly shifted research to focus on understanding the virus, the disease, and its spread.
Beyond tackling these difficult questions, the community has embraced open data and open science, sharing results with the public in near real-time. Every day, new datasets, discoveries, and methodologies are shared publicly, allowing researchers to build on each other’s findings and potentially accelerate the pace of research.
Of course, this unprecedented openness also comes with its own set of challenges. Presenting hypotheses and conclusions-in-progress is a departure from the typical pace of biomedical research, and the community is still figuring out the best ways to share insights and data outside the confines of a traditional publication. More than anything else, though, it’s a challenge to keep track of the deluge of material that appears on COVID-19 on a daily basis. Since April 2020, we have tracked 1,000 to 2,000 new resources appearing every week.
In response, we created outbreak.info resources, a searchable collection of over 79,000 heterogeneous resources on COVID-19 and SARS-CoV-2. Unifying metadata from more than 12 different sources, the collection covers publications, clinical trials, datasets, protocols, and more.
The real power of this resource collection is its ability to bring disparate resources together in a single location. When the U.S. Food and Drug Administration authorized remdesivir as the first treatment for COVID-19 under an Emergency Use Authorization, I was curious to find out how much evidence there was for its efficacy. I started where I begin most searches: firing up google.com and seeing what comes out. In this case, though, the results were a bit non-specific, mixing publications, news articles, and press releases. Since I knew I was interested in clinical trial data, I searched ClinicalTrials.gov to find all clinical trials registered in the United States. But what about the rest of the world? To get those as well, I had to go to the World Health Organization’s International Clinical Trials Registry Platform, download a .csv file of all current COVID-19 trials, and then search for remdesivir to filter the results. Okay, that must be most of the clinical trials on remdesivir… but what about publications? LitCovid, a curated hub of COVID-19 publications in PubMed, is a good place to start, but researchers are also widely using pre-print servers like bioRxiv and medRxiv. That should cover publications pretty well, but what about datasets, protocols, and other resource types?
Nine tabs later (plus many other clicks to find the COVID-19 specific pages for each source), I was just getting started. In general, I find that the clarity of my mind is reflected in how many tabs I have open in Chrome: the more tabs I have open, the more cluttered I feel. Surely, there must be a better way to find information.
outbreak.info combines all these sources and more into a single interface, so that instead of polluting my browser with a million tabs, I can simultaneously search clinical trials, publications, datasets, and protocols to find resources on remdesivir. I can then filter these results further to explore only datasets or clinical trials, resources that have been published in the last week, or work that was funded by the National Institute of Allergy and Infectious Diseases.
If I select a particular record, like clinical trial NCT04280705, I can view its metadata, which describes all the information about the setup of the trial, including its design, outcomes, status and more. Additionally, I can follow the link to view the data from its original source.
Wherever possible, we have linked each resource to related resources; for instance, for this clinical trial, we also provide a link to the preliminary results from the trial, published in The New England Journal of Medicine.
The magic behind outbreak.info lies in the rigorous schema we’ve developed to categorize these metadata. Different sources and resource types describe their data in similar but slightly different ways. For instance, ClinicalTrials.gov records the minimum age for inclusion in a trial in a field called “MinimumAge”, which is part of an “EligibilityModule” variable, while the WHO uses the variable “Inclusion agemin”. It’s a small difference, but it means that if you were trying to search for trials that included children, you would have to search both “MinimumAge” and “Inclusion agemin” to find all the trials. Not only is this a problem for comparing resources of the same type, but it gets more complicated when you try to mix resources of different types, like clinical trials and datasets. For the resources in outbreak.info, we have standardized the way we describe these resources using reusable schemas built off of the work of schema.org. If you’re keen to learn more, my colleague Ginger Tsueng delves deep into the power of schemas and explains how they can be recycled and extended.
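To make the field-name mismatch concrete, here is a minimal sketch of the kind of mapping layer that standardization implies. It is not outbreak.info’s actual code: the source field names come from the example above, but the unified field name `minimum_age`, the source keys, the record shapes, and the sample values are all assumptions made up for illustration.

```python
from typing import Any, Optional

# Per-source path to the minimum-age value. The field names come from the
# text above; the nesting and the source keys are assumptions.
MIN_AGE_PATHS = {
    "clinicaltrials.gov": ("EligibilityModule", "MinimumAge"),
    "who_ictrp": ("Inclusion agemin",),
}


def get_path(record: dict, path: tuple) -> Optional[Any]:
    """Walk a sequence of keys into a nested dict; return None if any is missing."""
    current: Any = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def normalize(record: dict, source: str) -> dict:
    """Return a record with a single, source-independent `minimum_age` field."""
    return {
        "source": source,
        "minimum_age": get_path(record, MIN_AGE_PATHS[source]),
    }


# Made-up sample records, one per registry.
ct_record = {"EligibilityModule": {"MinimumAge": "12 Years"}}
who_record = {"Inclusion agemin": "18 Years"}

unified = [
    normalize(ct_record, "clinicaltrials.gov"),
    normalize(who_record, "who_ictrp"),
]

# A single query on `minimum_age` now covers both registries.
with_min_age = [r for r in unified if r["minimum_age"] is not None]
```

Once every source is mapped onto the same field, a search for trials that include children only has to look in one place, whichever registry the trial came from.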
Lastly, if you want to save the results for later or analyze them in another project, we have provided access to all the metadata we have standardized in easy-to-use formats. You can download either the full collection or the results of a particular search as .tsv or .json files directly from the website, or you can access the information programmatically through our API.
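As a rough illustration of what analyzing an export in another project might look like, the sketch below tallies records by resource type. The inline string stands in for a downloaded .json file, and its layout is an assumption: the `@type` and `name` keys follow schema.org conventions, but the real export may be structured differently, and the sample records are invented.

```python
import json
from collections import Counter

# Stand-in for the contents of a downloaded .json export.
# The keys and the records themselves are hypothetical.
exported = """
[
  {"@type": "ClinicalTrial", "name": "Example trial A"},
  {"@type": "Publication", "name": "Example paper B"},
  {"@type": "Publication", "name": "Example preprint C"},
  {"@type": "Dataset", "name": "Example dataset D"}
]
"""

records = json.loads(exported)

# Tally how many records of each resource type the export contains.
counts = Counter(r.get("@type", "unknown") for r in records)

# Pull out just the clinical trials for a closer look.
trials = [r["name"] for r in records if r.get("@type") == "ClinicalTrial"]

for resource_type, n in counts.most_common():
    print(f"{resource_type}: {n}")
```

With a real export, the inline string would be replaced by reading the downloaded file, for example with `json.load` on an open file handle.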
outbreak.info is a diverse, heterogeneous collection of resources that makes it easier to keep track of the thousands of resources that are generated each week on COVID-19 and SARS-CoV-2. The data is updated daily and combines publications, clinical trials, datasets, protocols, and more in a standardized, searchable interface. In addition to exploring the data on the website, you can export all the metadata for future analysis or access it through our high-performance API. The pandemic has taken an enormous toll on lives and livelihoods, but it has also propelled researchers to share their findings in an unprecedented manner. Although this level of openness is not without challenges, we hope our work helps to address some of them so that researchers can help to finally bring an end to this pandemic.
Do you have feedback on features you’d like us to include in outbreak.info, or resources you think we should add to our collection? Send us a message at help@outbreak.info!