David J. Lynn, European Molecular Biology Laboratory (EMBL) Australia Group Leader at the South Australian Health and Medical Research Institute (SAHMRI), Associate Professor at Flinders University School of Medicine.
Open Science: what is it for you and why does it matter?
I think Open Science means different things to different people. For some it means being completely open at all stages of research. For me, this interpretation is probably a bit idealistic and I feel it is most important to be open with the end products. On the bioinformatics side of my group we produce a range of different software and online resources. All the code, data and final publications associated with these efforts are open source and open access. For example, in the InnateDB project we have manually curated more than 30,000 molecular interactions of relevance to the innate immune system. This has taken nearly a decade of work and is a very valuable resource. All the data is freely available under a Design Science License. There are a number of reasons for supporting an open science approach in this area. Firstly, in most cases the software/resources we are developing at least partially build on the work of others; if that work hadn’t been provided freely, we wouldn’t have been able to build ours. So, this is a case of paying it forward. Secondly, providing open code and data means that our resources are more widely used, are improved by users reporting bugs, and can contribute to the development of new resources.
Open Science and Bioinformatics: is there a link?
Bioinformatics and bioinformaticians have been leaders in the open science movement in the life sciences. Right from the start, the field has been driven by open source principles and this has served the field very well. A more closed approach would have significantly impaired progress in bioinformatics, but also critically, would have impaired progress in the wide spectrum of research that is reliant on bioinformatics software and resources.
What makes sense to resource as a national effort when it comes to Open Science versus local resourcing/support?
I think it is really important that we do not reinvent the wheel in Australia by replicating international resources where they exist and serve the community well. Instead, I think we should focus on supporting Australian-based resources, tools and datasets that are of national and international significance. Additionally, we should ensure that we have the appropriate infrastructure to access internationally significant datasets and resources where there are challenges to doing this at a local level (e.g. due to the scale of the dataset).
How would you recommend a novice biologist approach Open Science and where can they find guidance, resources and tools for getting on board?
I would recommend that a novice biologist begin incrementally. As mentioned, the most important thing for me is that the final product is open and that the results are reproducible. If producing code, ensure you use an appropriate repository such as Bitbucket and make sure the code is well documented. Open Science is more mature in bioinformatics than, say, in my field of immunology. For a novice, I would recommend engaging in a manner appropriate to the field. I am a supporter of this movement but it is important not to be overly naïve either. It is important to protect intellectual property and to ensure that you, and not others, get credit for your work.
What are the top three actions/initiatives you would suggest biosciences domains to prioritise to enable Open Science, and what type of support do they need?
It is important that we create the appropriate incentives for open science. Right now, most assessment of scientific impact (e.g. for grants, promotion, etc.) is based on the quantity and quality of publications. Researchers frequently do not formally cite the bioinformatics software they use. We need to ensure that high-quality, well-supported code, software and data are appreciated and taken into account when assessing impact. We therefore need agreed metrics to do this, and we need to weight them similarly to traditional citations. If done properly this will further incentivise open, well-supported data and resources.
Funding bodies also have a major role to play by requiring open data, software, and reproducibility.
The Australian community also needs to engage in international data standards initiatives to ensure high quality, shareable data. I’m a member of the International Molecular Exchange Consortium and have worked with EBI and others on the development of data standards for molecular interaction data.
EMBL-ABR is a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia. As such, what role do you see for EMBL-ABR when it comes to Open Science for Australian biosciences?
I see a number of potential roles for EMBL-ABR in supporting Open Science for Australian biosciences. First, acting as an advocate for Open Science both to the scientific community and to funding agencies. Secondly, engaging in international efforts such as data standards development and ensuring Australia has a seat at these tables. Thirdly, supporting the development and dissemination of nationally and internationally important Australian resources, software and data.
Are there bioscience discipline-specific limitations that require tailored solutions when it comes to Open Science, rather than enough common denominators for shared resources/tools and solutions?
Yes, I think bespoke solutions will be needed for specific bioscience disciplines. Different disciplines are at very different stages when it comes to Open Science practices, so what is suitable for the bioinformatics research community may be a step too far for an immunologist, say.
The one thing you would like EMBL-ABR to do in the future when it comes to connections with existing international efforts is…
Support and encourage Australian researchers to engage with international efforts. EMBL-ABR will not have the capacity to participate directly in all the major international initiatives so it should identify key people in the Australian community with an interest in being involved and ensure these people can get involved.
What is the best example you can think of when it comes to Open Science in the biosciences and did bioinformatics play any role?
Without a doubt it must be the Human Genome Project. Bioinformatics had a major role in the analysis, interpretation and dissemination of this critical project, which has been the foundation for modern genomics.
Biosketch: Since March 2014, David has been an EMBL Australia Group Leader at the South Australian Health and Medical Research Institute. He is also an Associate Professor at Flinders University School of Medicine. David’s group is a multi-disciplinary group that is equally divided between computational and experimental systems biology. On the wet-lab side, his group employs in vitro and in vivo experimental and clinical models coupled with systems biology approaches to investigate the interplay between the microbiome, vaccines and the immune system. On the bioinformatics side, his group leads the development of a range of open source software and resources including InnateDB, an internationally recognised systems biology platform for innate immunity networks (10,000 users worldwide; >450 formal citations). David also leads the computational biology aspects of a €12 million European Commission-funded project called PRIMES, which is investigating how to model and subsequently therapeutically target protein interaction networks in cancer. Both InnateDB and PRIMESDB are members of the International Molecular Exchange Consortium and David has worked as part of the PSI-MI group to develop data standards in this area.