Open Science: an interview with Jeff Christiansen

Jeff Christiansen, Health and Life Sciences Program Manager, QCIF and our interim Open Data Coordinator, EMBL-ABR

July 2017

_______________________________________________

Open Science: what is it for you and why does it matter?

For me, Open Science is carrying out research, in any discipline, with some degree of openness surrounding the inputs, the methods and tools, and the outputs. That degree of openness can range from relatively rare ‘extreme’ cases, where research is conducted in a completely open and online (almost public) fashion, through to the more common approach where, once research results are published, the inputs, methods, tools and outputs are made openly available for others to use (and build upon). Scientific reproducibility is of critical importance, and providing these components in an open manner ensures that all the building blocks are available to enable others to reproduce the results.

Bioinformatics can be conducted in an open fashion by making sure input data, tools and workflows are appropriately described, versioned, made available via a publicly accessible repository, and clearly licensed so that it is clear how they can be re-used. Having been exposed to a wide range of other research disciplines, I find that biology and bioinformatics research generally has a relatively open culture, which grew from the early days of gene sequencing, when it was realised that without open sequence databases it was impossible to undertake comparative analyses on new sequences. Without that initial co-operative and open approach to data and tool sharing, bioinformatics as a discipline would never have developed as quickly as it has.

What makes sense to resource as a national effort when it comes to Open Science versus local resourcing/support?

Thanks to rapid technological advances in many areas, research is increasingly undertaken using data that is “born digital”, and is then conducted in a connected world. The whole ecosystem of data, tools and compute used in research is very complex and spans not only national and local resources but also many international and commercial resources. I believe it makes sense to resource national efforts to increase and improve local infrastructure, and efforts to better connect Australian researchers into this global ecosystem. From a moral standpoint, one can also argue that any inputs, outputs and methods related to research that is funded by the public purse should be made openly available to the public. Funding bodies are increasingly encouraging researchers to share in this way, but dedicated funds for building systems that enable researchers to do this efficiently would be welcome.

How would you recommend a novice biologist approach open science, and where can they find guidance, resources and tools for getting on board?

I would recommend that they first think about developing systems and methodologies to describe experimental inputs, methods and outputs that enable them to share this information with their ‘future self’. It will make life much easier in the long run to know what worked and what didn’t. From there, the natural progression is that this will make sharing any of the above much easier with members of their lab, then with project collaborators outside the lab, and ultimately with the public following publication, incrementally increasing the openness of the information. In the current digital age, the biologist also probably has a large amount of data and will need to make it available in appropriate formats for analysis tools, so thinking about data standards that would allow it to be shared with machines, software or tools is another natural extension of this.

As far as guidance goes, there are many resources online, including blogs, which are a friendly way to start. Just search for ‘open science data’ or ‘open science tools’.

What are the top three actions/initiatives you would suggest biosciences domains should prioritise to enable open science, and what kind of support do they need?

  1. The carrot: To reward/recognise researchers for sharing data, methods and tools openly – recognising these as first-class research outputs in themselves and continuing to develop or endorse systems where data, methods and tools can be cited and used in citation metrics.
  2. One stick: Funders to require that data, methods and tools are made openly available at the point of publication, or at appropriate times during a research project to enable release of further funds.
  3. Another stick: Publishers and reviewers to require that data, methods and tools are made openly available in appropriate formats and persistent repositories at the point of publication.

EMBL-ABR is a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia. As such, what role do you see for EMBL-ABR when it comes to Open Science for Australian biosciences?

Acting as a conduit between international best practice efforts in Open Science and the Australian biosciences community, and educating that community about best practice and resources available to enable sharing/exchange of information in an open manner.

Are there bioscience discipline-specific limitations that require tailored solutions when it comes to Open Science, or are there enough common denominators for the sharing of resources, tools and solutions?

The broad issues of conducting open research are the same in any discipline (including non-science endeavours), so many broad generic methods and tools can be applied (e.g. openly available data/code repositories, licensing frameworks, and non-proprietary formats and protocols). The most obvious situation where discipline-specific solutions are required is in the health and medical research arena, where privacy protection of research participants is paramount. However, if de-identification is conducted according to robust, best-practice procedures, a degree of openness is still possible whilst protecting the privacy of each individual.

The one thing you would like EMBL-ABR to do in the future when it comes to connections with existing international efforts is…

… ensure that the Australian biosciences community has a seat at the international table, and that Australia can support these international efforts through appropriate contributions.

What is the best example you can think of when it comes to Open Science in the biosciences and did bioinformatics play any role?

The OpenSourceMalaria (OSM) Project (http://opensourcemalaria.org) is aimed at finding new medicines for malaria and is guided by open science principles: everything is open and anyone can contribute.

________________________________________________________________

Biosketch: Jeff is based at the EMBL-ABR QCIF Node, and is QCIF’s Health and Life Sciences Program Manager. He has a PhD in Biochemistry from the University of Queensland, and started his career at the bench conducting research in the fields of cancer, molecular genetics and embryo development in both Australia and the UK. He then moved into the management of large biological data assets (gene sequence, images, etc.) through the establishment of EMAGE, a UK-based international database of gene expression and anatomy. Having to deal with hundreds of thousands of database entries for several thousand genes and millions of images, and needing to exchange this information with collaborators, led him into the world of biocuration, where he became an inaugural member of the International Society for Biocuration (ISB). Since returning to Australia several years ago, he has held positions at both the Australian National Data Service (ANDS) and Intersect Australia, where he has worked on many data-centric projects in the biology and health and medical arenas.