Open Science: an interview with Stephen Eglen

Stephen Eglen, Director, MPhil in Computational Biology, University of Cambridge

February 2017


Open Science: what is it for you and why does it matter?

“Open Science” is one of those terms that I hope will soon disappear, much like “big data”. Prefixing “Science” with “Open” indicates to me that the researchers are freely sharing the research artefacts underlying their findings, e.g. data, code and workflows. Science is often seen as a competition, so researchers are unwilling to give away such artefacts for fear that they will lose out, or for fear of their results being found incorrect. Hence, most science today is not open. However, there is much to gain by becoming more open. It allows others to readily verify, re-use and extend your findings, and so to build on your work. I think there is also a moral case to be made: much of science is funded by taxpayers, and so I believe the findings should be publicly available. So, hopefully in a generation, we’ll drop the term “Open” because it will be redundant; by default, science will be open.

There certainly is; my main research area is computational neuroscience, where open scientists are relatively rare, with notable exceptions such as Erin McKiernan and Nikolaus Kriegeskorte. By contrast, I naturally think of much of bioinformatics as relatively open. For example, when you read papers with a bioinformatics component, the authors have often made their code and data freely available. I think this is a natural consequence of open approaches to genomics; see below.

What makes sense to resource as a national effort when it comes to Open Science versus local resourcing/support?

See my comments on this below.

How would you recommend a novice biologist approach open science, and where can they find guidance, resources and tools for getting on board?

Go find a local group of open scientists – they have a habit of being both open and friendly! If nothing else, find a local R users group, and my guess is you’ll find a friendly biologist or two to offer guidance. The List of R user groups currently lists eight in Australia and two in New Zealand (the home of R).

What are the top three actions/initiatives you would suggest bioscience domains should prioritise to enable open science, and what kind of support do they need?

  1. Provide training in open science activities, e.g. through Data Carpentry.
  2. Give credit to open scientists wherever possible, e.g. recognise the publishing of preprints in applications, alongside traditional published papers.
  3. Work with funders to mandate that open access elements are included in grant applications.

EMBL-ABR is a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia. As such, what role do you see for EMBL-ABR when it comes to Open Science for Australian biosciences?

I am cautious about adding more infrastructure (hardware, databases, websites) to support open science activities at a national/institutional level. I would hope instead that EMBL-ABR adopts and supports existing international infrastructure wherever possible. It is tempting to provide new infrastructure, and it might even be possible to get funding to establish new resources. However, long-term maintenance of these resources is a concern, and so I think it is better to pool resources with existing efforts. If you feel something is missing, lobby EMBL/EBI to provide it. Repositories like Zenodo and Figshare can archive modest-sized artefacts free of charge and provide stable DOIs.

Are there discipline-specific limitations in the biosciences that require tailored Open Science solutions, or are there enough common denominators for shared resources, tools and solutions?

In many cases, once the data are stored digitally, the bits and bytes are all the same irrespective of the scientific domain. However, there are at least three particular issues in the biosciences. First, human data raise issues of privacy and anonymity. Second, though this is not unique to the biosciences, datasets are large and may require considerable storage. Third, many areas of bioscience research are relatively new, and their data are quite heterogeneous; standard data formats are lacking in many areas. All of these points may lead bioscientists to adopt tailored solutions.

The one thing you would like EMBL-ABR to do in the future when it comes to connections with existing international efforts is…

… consider ways to support them in the long-term, perhaps by prioritising these international efforts over creating local solutions.

What is the best example you can think of when it comes to Open Science in the biosciences and did bioinformatics play any role?

I think anyone new to open science should read about the human genome wars surrounding the publication of the first sequencing and analysis of the human genome. The episode is described concisely in a commentary by the editor who handled the manuscript for Science [1]. This dramatic episode demonstrates the need for scientists to work with funding agencies and journals to establish core principles in open science.


[1] Jasny BR (2013) Realities of data sharing using the genome wars as case study – an historical perspective and commentary. EPJ Data Sci 2:1.


Biosketch: Stephen has a joint honours BSc in Cognitive Science, Computer Science and Psychology from the University of Nottingham (1993) and a DPhil in Computer Science and AI from the University of Sussex (1997). He has held two fellowships from the Wellcome Trust, which took him to Edinburgh and to Washington University in St Louis to work with David Willshaw and Rachel Wong on developmental problems in neuroscience. Since 2004 he has been on the faculty in Applied Mathematics at the University of Cambridge, where he is currently the director of the MPhil in Computational Biology. He has a long-standing interest in open access and, more recently, open science.