EMBL-ABR: an interview with Michael Hoffman

Michael Hoffman, Principal Investigator, Princess Margaret Cancer Centre and Assistant Professor, Departments of Medical Biophysics and Computer Science, University of Toronto.

March 2017

Open Science: what is it for you and why does it matter?

“Open science” describes a wide variety of practices. For me, it’s most important that when science is published, the publication itself is available to all for free, and all the data and code that support the publication are available as well. Academic scientific research is usually funded by the public or by charitable donations, and scientists have a responsibility to disseminate the results of that research so that it has the broadest impact. This means making publications open access rather than hiding them behind expensive paywalls. It also means making data and code available so other scientists can ensure your work is reproducible and can build on it. Most of us want other people to build on our work. That is how science progresses.

Open Science and Bioinformatics: is there a link?

Bioinformatics has been in the vanguard of open science for many years. In particular, those developing bioinformatics methods often rely on freely available data, and those analysing data often rely on freely available methods. The speed of progress in bioinformatics has only been possible due to the availability of open data and methods.

What makes sense to resource as a national effort when it comes to Open Science versus local resourcing/support?

It can be very effective to have national-level efforts specifically for promoting free software and data resources. Individual labs often produce useful resources as part of conducting scientific research. So it’s good to have support for–and require–open source and open data from publicly funded researchers at all levels.

How would you recommend a novice biologist approach open science, and where can they find guidance, resources and tools for getting on board? Especially if not in Canada…

I think one of the most important things when it comes to posting your code and data is to just do it! People often postpone posting their code and data until they’ve found the perfect way to do it, and it’s just more important to do it, sooner. Often you can improve things later if you need. I recommend using Zenodo for posting data and Bitbucket for sharing code.

What are the top three actions/initiatives you would suggest biosciences domains prioritise to enable open science, and what kind of support do they need?

The most important thing is that open access publications should be required with accompanying posting of code and data. Journals should require it, peer-reviewers should insist on it, and funders should only credit papers where this is done. That will give everyone the maximum incentive for open access publication, code, and data, and there won’t be an advantage in cutting corners.

EMBL-ABR is a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia. As such, what role can you see for EMBL-ABR when it comes to Open Science for Australian Biosciences?

Organisations like EMBL-ABR can help scientists make their code and data more open to everyone. Providing an easy way for people to upload and distribute their scientific materials without worrying about the implementation details is essential.

Are there bioscience-specific limitations that need tailored solutions when it comes to Open Science? 

Yes, the needs of individual disciplines can vary hugely, and needs can even vary within biological sciences–consider genomics versus the imaging done in neuroscience.

The one thing you would like EMBL-ABR to do in the future when it comes to connections with existing international efforts is…

…bring strong advocacy for open publications, code, and data.

What is the best example you can think of when it comes to Open Science in the biosciences and did bioinformatics play any role?

I think the open data sharing that started with the Human Genome Project and continued with subsequent large-scale genomics efforts like ENCODE and 1000 Genomes has been transformative for biological research. Much of it was driven by pioneering bioinformatics researchers who understood how much more we can get done together when we share.

Biosketch: Michael Hoffman creates predictive computational models to understand interactions between genome, epigenome, and phenotype in human cancers. He implemented the genome annotation method Segway, which simplifies interpretation of large multivariate genomic datasets, and was a linchpin of the NIH ENCODE Project analysis. He is a principal investigator at the Princess Margaret Cancer Centre and Assistant Professor in the Departments of Medical Biophysics and Computer Science, University of Toronto. He was named a CIHR New Investigator and has received several awards for his academic work, including the NIH K99/R00 Pathway to Independence Award and the Ontario Early Researcher Award.