Open Science: an interview with Graham King

Prof Graham King is the Director of the Institute for Innovative Agriculture at Southern Cross University in Lismore, NSW and is Head of our twelfth node, the EMBL-ABR: SCU Node.


September 2017

Open Science: what is it for you and why does it matter?

Ideally, Open Science should represent the default position for researchers and be integral to the scientific method, which works most effectively when peers have the opportunity to reproduce, refine or challenge results generated by others. For me, Open Science is analogous to Open Source, with all the collateral benefits of low entry cost, resilience, flexibility, adaptability, scope to add value, alongside accountability and persistent attribution. In the 21st century, the information universe is a scalable and noisy environment. Open Science can contribute to ensuring validated and transparent processes and resources are available to increase the signal-to-noise ratio.

Open Science and Bioinformatics: is there a link?

Definitely. Firstly, for navigating the data universe in ‘natural history’ mode, the tools developed within the bioinformatics sphere are key to compiling and enabling findable, accessible, interoperable and re-usable (FAIR) data. This can empower researchers to explore interconnections and develop hypotheses. More sophisticated bioinformatics tools may then be used to generate and use algorithms for hypothesis testing, visualisation and interpretation.

What makes sense to resource as a national effort when it comes to Open Science versus local resourcing/support?

Ensuring there is the capability to combine disparate data sources where the whole represents more than the sum of the parts, and the ability to ensure that valuable and nationally unique datasets are curated and able to persist beyond project duration. With limited resources, there is a need to identify attributes that distinguish a national approach from a contribution to international efforts. This leads to identifying Australian-specific data or scientific endeavours, which may be driven by factors such as species distribution, agricultural production practices, or specialised beacons of research excellence.

How would you recommend a novice biologist approach Open Science and where can they find guidance, resources and tools for getting on board?

I would encourage them to seek out leading or unique reference datasets that may be relevant to their topic area, and not to be constrained by how ‘flashy’ the presentation is, but to develop an understanding of the quality – determined by information content, meta-data provenance and attribution to primary sources, uniqueness, completeness and relevance. Moreover, there are always contributions that can be made by individual scientists in enhancing the quality or value of existing datasets – this may be by additional annotation leading back to peer-reviewed literature, or compiling and linking data from disparate sources. First steps in genomics and some related ‘omics would be to drill down in resources such as EBI, NCBI and CyVerse.

What are the top three actions/initiatives you would suggest bioscience domains prioritise to enable Open Science, and what type of support do they need?

One of the key requirements for adding value to existing data and information is the generation and adoption of data standards. This is particularly relevant for the large number of ‘minor’ species-specific research communities outside of human and model-species systems and for a wide range of entities, not just bio-samples and sequences. Deep biological insights come from comparative approaches, but this can be challenging if one does not understand the equivalence between nomenclature (or lack thereof) attributed to entities for different species. This requires road-tested templates, understanding of knowledge representation systems such as ontologies, and training.

Related to this is the requirement for the generation and registration of globally unique entity identifiers that can be used and understood by every researcher and institution worldwide. This requires establishment of automated registry services that provide unique ‘tokens’ upon request, which are then promulgated or made available for lookup throughout the research community.

Raising awareness amongst research managers/funders that Open Science is cost-effective and profitable and leads to tangible economic benefits. At present there is an imbalance in the support and acknowledgement of the role bioinformatics and Open Science can play outside of medical research and model species R&D. Improved connectivity and exchange between discipline areas is more likely to occur with recognition of commonalities of approach, with investment in specific complementary areas of expertise.

EMBL-ABR is a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia. As such, what role do you see for EMBL-ABR when it comes to Open Science for Australian biosciences?

EMBL-ABR is in a strong position to show leadership and provide a go-to point of access, and would best serve the Australian biosciences by focusing on areas where Australia does or can excel, and by aligning with national strategic research priorities. A key role is in training and raising awareness of what is available to researchers at all levels both within and outside Australia. In addition, there is scope to develop a number of flagship areas to showcase the benefits of implementing Open Science pipelines. As an example, in the agricultural sector such benefits may be in economies of scale – bioinformatics pipelines or networks developed for major crops being applied to high-value Australian crops that can address a growing export market. The same infrastructure can then attract overseas researchers for collaboration and training in Australia.

Are there bioscience discipline-specific limitations that require tailored solutions when it comes to Open Science rather than enough common denominators for shared resources/tools and solutions?

Yes. This can occur for several reasons, one of which is scale of investment, where specific attributes of a biological system or bio-industry may not be met by an off-the-shelf solution and there may be insufficient funding to re-invent the wheel. This can be tackled by identifying or allowing consortia to emerge, pooling expertise and solutions. A second area is where a specific biological discipline interfaces with another discipline – such as human nutrition and food composition, plants and soils, or any species and unique ecological niches. There is scope for wider adoption of Open Science in opening up awareness of issues such as tracing and attributing sources of phenotypic variation; understanding the contribution of soil status via crops and livestock to human nutrition; and data-mining historical and incomplete datasets.

The one thing you would like EMBL-ABR to do in the future when it comes to connections with existing international efforts is...

Sweat our AU$, fly the flag and showcase Australian science! Work closely with CyVerse and ELIXIR. Please don’t re-invent the wheel or duplicate investment elsewhere. Help identify areas where Australian researchers can lead in setting and showcasing the benefits of Open Science, including establishment of international standards for a wide range of biological entities.

Added to that, I would recommend disseminating information to Australian researchers, and assisting in access to international resources and communities where appropriate.

What is the best example you can think of when it comes to Open Science in the biosciences and did bioinformatics play any role?

The best examples are still in our dreams. However, the first that comes to mind is the human genome sequencing project, where bioinformatics played a key role and set the benchmark for many other efforts, as well as demonstrating to the world the value of openness.


Biosketch: With over 30 years’ post-doctoral research experience in crop plant genetics, genomics, conservation and pre-breeding, Prof Graham King has led national and international research consortia involving data management and downstream informatics, and continues to work closely with a range of end-users. As part of a wider effort to facilitate integration and navigation of data between phenotype and genome, his group developed the open-source CropStoreDB framework, and they work closely with international groups to extend functionality and interoperability for a range of crop species. He leads SCU efforts within the international DivSeek consortium, including membership of working groups on semantics for harmonising trait data, interoperable tools and translational approaches for minor crops. Graham is currently chair of the Multinational Brassica Genome Project steering group, and holds honorary positions at the University of Nottingham (UK), Huazhong Agricultural University (China) and Crops for the Future (Malaysia).