What is a data commons?

On June 7 HealthcareITNews led with the headline ‘Joe Biden unveils precision medicine database at University of Chicago’. The report describes how the new Genomic Data Commons draws together more than two petabytes of cancer genomic datasets from US National Cancer Institute programs and makes them accessible to a wider research community. EMBL-ABR’s interest is in finding out how we can build links to such initiatives for Australian researchers and their data – starting with linking in to the agreed standards and protocols for such data. This is why life scientists need training in data curation, annotation and dissemination.

A similar but smaller project is the NIAID’s Microbiome Commons – Nephele* – a platform that co-locates data and computing in cloud environments, so that microbiome researchers can bring their own data to the platform and analyse it there.

NIH’s Big Data to Knowledge (BD2K) program takes this work even further. Its Data Commons project is tasked with approaching multiple NIH Institutes and centres to seek out the most highly used digital artefacts (i.e. any digital artefacts generated during NIH-funded research, including data, tools, workflows and documents). Following this ‘stocktake’, the artefacts are housed in repositories and libraries where they are more accessible and available to researchers: on the cloud, through APIs or through publicly available indexes. A researcher accessing a particular dataset through this data commons might, for example, also find tools that others have used to interrogate the same dataset, cutting out some of the duplication that currently occurs across research projects.

Dr Vivien Bonazzi, Senior Advisor for Data Science Technologies & Innovation in the Office of the Associate Director for Data Science, NIH, and a member of the EMBL-ABR International Science Advisory Group, leads the team developing this ‘Commons’ to support biomedical discovery by enabling the sharing of digital objects. Link here to view Vivien’s latest presentation explaining the Data Commons – and our thanks to her for sharing this and other insights from her work at NIH.

Of further interest to Australian medical researchers will be the Precision Medicine forum being held in London on 11-12 July. The forum is being chaired by Vivienne Parry OBE, science communicator and head of engagement at Genomics England, which is delivering the 100,000 Genomes Project. Their stated challenge resonates with our own, as bioinformatics and international collaboration are fundamental to progressing such projects:

If the ambitious goal of Precision Medicine is to be achieved, it requires the creative and energetic involvement of many; from biologists, physicians and technology developers to data scientists, patient groups, governments and more. Interest in the initiative’s goals has motivated and attracted visionary scientists from many disciplines but, for these efforts to be truly successful, they must cover all corners of the globe and will require significant collaboration.

EMBL-ABR is well positioned to act as the Australian touchpoint for such activities, and we are working hard to ensure that Australian life scientists can benefit from and contribute to them as much as possible.


*Further background on Nephele is in this Genome Web article, or see the original poster as presented in May this year at the 15th Annual Bio-IT World Conference and Expo: Nephele: A cloud-based scientific computing platform for improved efficiency, standardization, and collaboration in microbiome data analysis.

More reading: Why does the NIH want to change incentives for researchers?