Nathan Watson-Haigh PhD: Research Fellow in Bioinformatics at the Australian Centre for Plant Functional Genomics (ACPFG) P/L. ACPFG is an Australian company that manages a large research program in cereals to deliver benefits to Australian breeders and growers.
How did you come to be working in bioinformatics?
I have been in the field of bioinformatics for 15 years, having started my PhD in 2001 focussing on the phylogenetics (evolutionary history) of all eukaryotes. This was where I was first exposed to programming and the Linux operating system. Since then, I have contributed to many open source bioinformatics projects, for example BioPerl, and I continue to submit bug reports and fixes to several GitHub projects. Over the years I have gained a breadth of experience in the handling and analysis of ‘omics data, from the older microarrays to metabolomics data. Most recently, my work has revolved around high-throughput sequencing data generated by Illumina instruments. I’ve worked on projects centred around de novo genome assembly, transcriptome assembly and quantification, variant discovery and analysis, and the discovery of novel microRNAs.
Although I’m from a biological undergraduate background, over the years I have gained more and more expertise with Linux system administration. For example, I am now experienced with the application of cloud computing technology for addressing certain types of bioinformatics problems. I have also recently set up a Slurm cluster and use Ansible for automating the configuration of Linux machines hosted at ACPFG. Currently my skills focus on the informatics side of bioinformatics, but I still have sufficient knowledge and understanding to talk effectively with different domain experts.
What are the challenges you see for life scientists in the data-driven science era?
I see two major challenges for life scientists: 1) Good science starts with a sound hypothesis and then employs the correct technology to generate the data to test that hypothesis. It is now relatively easy to generate large datasets in a short space of time without careful consideration of how the data can be analysed to answer a specific question. 2) The size of datasets is ever increasing, and analysing them alone is often beyond the abilities of many life scientists. Life scientists are increasingly required to have the skills of a data scientist.
Would you say this is different for actual bioinformaticians? Do they face different challenges?
Bioinformaticians face similar challenges: they are expected both to be domain experts and to have a high level of understanding and skill in these areas. In a multi-disciplinary field such as bioinformatics, it is not possible to be an expert in everything. We bioinformaticians cannot expect life scientists to leave the lab behind completely to gain expertise in bioinformatics. Similarly, bioinformaticians cannot be expert biologists, computer scientists, software developers etc. It takes a team of bioinformaticians, each with expertise in at least one of these domains, to adequately address many of today’s scientific endeavours.
What is open data, and what does it mean to you?
Open data means data which is freely available for everyone to use without restriction. It is not a new idea, but with the rise of the internet, open data is now more accessible than ever. Without access to open data there would be no opportunity to replicate or check another person’s analysis, or to extract additional value from an existing dataset through analyses which differ from what the producers of that data originally intended.
What is currently missing in the field of bioinformatics AND life sciences?
Where do life scientists go when they have a mass of data to analyse and don’t know where to start? In Australia there is limited opportunity for people to start to learn the basics of bioinformatics and they often have to go overseas for immersive training courses. Such courses would, at a minimum, help attendees converse and ask the right sorts of questions when they consult with expert bioinformaticians. For some, this might be an opportunity they have been looking for to begin a transition from life scientist to bioinformatician.
It is early days yet, but what would you like to see EMBL-ABR become, achieve?
EMBL-ABR could help provide training opportunities to life scientists as well as provide a portal for showcasing Australian bioinformatic activities to the world. Ultimately, I’d like to see EMBL-ABR lead the way in a collaborative research project to demonstrate bioinformatics best-practice to the community.
Biosketch: Dr Nathan Watson-Haigh obtained his undergraduate degree in pharmacology from the University of Bath, England. He then completed his PhD in bioinformatics at the University of York, England, where he studied the deep phylogeny of the Eukaryotic tree of life. He has held postdoc positions at the University of Sheffield, England, and then at CSIRO in Queensland. It was in these postdoc positions that Nathan first started to grapple with ‘omics datasets in the form of microarrays, analysing them within a systems biology framework. He then moved on to work for the Australian Wine Research Institute (AWRI), where he got his first real taste of next generation sequencing data. While at the AWRI, Nathan was heavily involved in the development and delivery of hands-on bioinformatics workshops, using the Australian Nectar Research Cloud infrastructure to train almost 1000 researchers over a 2.5-year period. Since 2012 Nathan has been working in the field of wheat bioinformatics at the Australian Centre for Plant Functional Genomics (ACPFG), focussed on making improvements to agronomical traits in cereal crops.