Saravanan Dayalan, Lead Scientist (Bioinformatics, Biostatistics), Metabolomics Australia, The University of Melbourne, Acting Head, EMBL-ABR MA Node, Standards Key Area Coordinator, EMBL-ABR
Open Science: what is it for you and why does it matter?
All aspects that make up Open Science, such as data, results, protocols, software, pre-publication manuscripts and post-publication manuscripts, are equally important. For me, coming from a background of computer science and statistics, openness in two specific areas is of additional significance. First is the openness of the software algorithms that are used to generate results. This highlights the importance of using freely available open source software instead of commercial solutions. Insight into the workings of algorithms is imperative in any attempt to reproduce the results generated by those algorithms.
Second is the openness of the data that was used to generate results. This includes all forms of data – from the instrument-generated raw data to post-processed data including a record of all the parameter values and settings used by the software to process the original raw data. With the vast amount of high throughput data generated in life sciences, every stage of the data transformation needs to be recorded in order to successfully replicate the analyses.
Why does all this matter? Reproducibility in science!
Open Science and Bioinformatics: is there a link?
The quantity of data generated, both in general and in the life sciences, is growing exponentially. A 2016 report from IBM Big Data says that 90% of the data in the world today was generated in the previous two years alone. Software solutions are the only means by which this mountain of data can be analysed and transformed into knowledge. Bioinformatics, with its novel methods and solutions, is the only way to analyse today’s life sciences data, which makes it an integral part of Open Science.
What makes sense to resource as a national effort when it comes to Open Science versus local resourcing/support?
Thinking about this in a top-down manner, it makes sense to have a platform for information and data exchange at a national level. Researchers would use such a centralised platform or platforms to upload their methods and protocols, software code, data obtained at all levels and pre-publication manuscripts. Some examples of such platforms are figshare, arXiv and domain-specific repositories such as PRIDE and MetaboLights. Such a centralised setup naturally facilitates a more widespread exchange of information between researchers.
Where local resourcing is more advantageous is at the level of expertise. Life science researchers would certainly benefit from the close proximity of bioinformaticians and biostatisticians. Having said that, what EMBL-ABR has done in bringing together different local expertise under one umbrella is an ideal way to disseminate that local expertise to a broader audience.
How would you recommend a novice biologist approach Open Science and where can they find guidance, resources and tools for getting on board?
Before talking about resources, I would like to discuss attitudes towards Open Science. One of the most common roadblocks in attaining openness in science is around freely sharing data. Researchers are reluctant to share their data, not just before publication of their work but even after. I would recommend that both novice and young biologists try to overcome this reluctance by investigating how existing methods, such as DOIs for data, guarantee ownership to the author, and then seek advice from other researchers who have been down this path. Indeed, publishing data in repositories such as figshare right after the experiment is done can in many ways help to speed up the manuscript writing. It is, after all, not uncommon for young researchers to have their work and draft manuscripts left in limbo while waiting for feedback for months if not years, thus increasing the chance of their work becoming obsolete.
For an individual researcher, the best place to seek guidance would be the local community (within universities or an external body such as EMBL-ABR). As a field, we in the biosciences need to look up to physicists, who have been doing open science for decades now.
What are the top three actions/initiatives you would suggest biosciences domains prioritise to enable Open Science, and what type of support do they need?
1. Publish data along with the findings. This is a crucial first step in open science. Publish results in open access journals, such that your findings are freely available to everyone.
2. Use open source software to assist with your findings. Most commercial software is a black box whose inner workings cannot be inspected.
3. Openness in science is every researcher’s responsibility. Take an active role in educating your peers (both junior and senior researchers) in the advantages of Open Science.
EMBL-ABR is a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia. As such, what role do you see for EMBL-ABR when it comes to Open Science for Australian biosciences?
I see EMBL-ABR as the forum that could be used to propagate Open Science in Australia. EMBL-ABR is well positioned for this task with its collection of wide-ranging expertise from across the country. With the different key areas of Data, Tools, Training, Platforms, Compute and Standards, work around Open Science would fit naturally within the remit of EMBL-ABR.
Are there bioscience discipline-specific limitations that require tailored solutions when it comes to Open Science, or are there enough common denominators for shared resources/tools and solutions?
One aspect that would be specific to individual domains within biosciences is the work around Standards. It is to be expected that different domains would define data and metadata differently and hence have varied requirements for Standards. As long as any domain-specific work links back to the overall efforts of Open Science, this separate work would not be a disadvantage for openness in science.
The one thing you would like EMBL-ABR to do in the future when it comes to connections with existing international efforts is…
EMBL-ABR certainly needs to be part of international efforts and actively participate through contributions. The ongoing international relationship is important as Australia has the disadvantage of distance and time separation from Europe and the US, where the majority of large-scale efforts are initiated. Where possible, Australia and EMBL-ABR should aspire to initiate projects and efforts in which they play a leadership role on the world stage.
What is the best example you can think of when it comes to Open Science in the biosciences and did bioinformatics play any role?
An example close to me is the Protein Data Bank and the tens of thousands of protein 3D structures deposited in it. My entire PhD was possible because of open access to those structures. Scientific software and bioinformatics methods certainly play a role, not just in putting together the 3D structures of proteins but also in tackling the age-old problem of predicting the 3D structure of a protein from its amino acid sequence.
Biosketch: Saravanan Dayalan is the Lead Scientist (Bioinformatics, Biostatistics) of Metabolomics Australia, The University of Melbourne. His interest is Big Data in Biosciences – from managing this mountain of data through large-scale software solutions to using Computer Science and Biostatistical approaches to extract knowledge from data. He is a strong believer in “Progress through Collaboration” and is part of several national and international efforts. On the national front, Saravanan is the Acting Head of the EMBL-ABR MA Node and the Standards Key Area Coordinator for EMBL-ABR. On the international front, he is a member of the Data Standards Task Group of the Metabolomics Society, which is tasked with establishing standards for the field of Metabolomics. He is also a member of the international mzTab working group and of the Pool of Experts of Standards for the Global Organisation for Bioinformatics, Learning, Education and Training (GOBLET), and he leads the international MASTR-MS consortium, a group of 8 national and international universities with an interest in large-scale scientific data management solutions.