EMBL-ABR Network: an interview with Mark Ragan

Mark Ragan: Professor and Co-head, Genomics of Disease and Development, Institute for Molecular Bioscience, and Professor (Adjunct), School of ITEE, The University of Queensland

April 2016

___________________________________________________________

What does bioinformatics mean for you?

Bioinformatics is a tool kit which lets me do biology. I’m one of those people who distinguishes between bioinformatics and computational biology. Computational biology is addressing applied or theoretical biological questions using computational, mathematical, statistical, computer science or informatics techniques, whereas bioinformatics is the development and implementation of algorithms, programs and websites that let you do that in a civilised way.

What are the challenges you see for life scientists in the data driven science era?

Bioscience is a data science now; we’ve been saying it soon would be, but it already is and has been for a couple of years. I think the biggest issue is training: training and skills, and putting all these required techniques together for individual biologists, or forming teams of biologists to do things using informatics, computational and data skills. If they are well trained, biologists can often draw on their background to deal with problems that involve chemistry, physics and other hard sciences. It’s a bigger stretch, as it turns out, for biologists to reach back into mathematics or computer science, because they’ve never taken those courses as undergraduates or been exposed to them. I think that’s a big part of the challenge of computational biology.

Would you say this is different for actual bioinformaticians? Do they face different challenges?

Yes, possibly! I see bioinformatics as the interface between biosciences, especially the molecular biosciences, on one side and maths, stats, computer science and information technology on the other side. There are different parts to this interface, with different challenges.

What is open data, and what does it mean to you?

Data, both primary and processed research data, need to be well annotated with the proper metadata, be accessible online (perhaps with certain kinds of controls), and be discoverable, shareable and reusable. Data that tick all of those boxes are open, and they go hand in hand with data and computational infrastructure and some kinds of analytical tools, all of which need to be open and accessible as well.

What is currently missing in the field of bioinformatics and life sciences?

We’re getting there. Some areas where there are concerns about privacy or security progress more slowly than others, which is understandable. Other areas are more mature.

It is early days yet, but what would you like to see EMBL-ABR become, achieve?

I’ll give you two wishes. My personal wish is that my group of researchers can match up data sets that we’ve generated ourselves, or that we pull down from the web or from colleagues, with computational tools or workflows that are in, or easily accessible through, EMBL-ABR. We may need to take the computation to the data, or the data to the computation, whichever is easier. As it is now more often computation to the data, we need to be able to do this at scale.

More broadly, I think my biology colleagues want to be able to access at least the most important, common and critical tools, up to some level, without having to worry about what computer they are logged into. They want to log in somewhere online, or on their own desktop, enter their Australian Access Federation ID, and there it is: they have compute, they have data, it all works for them and they don’t need to mess around.

So is there indeed a specific lack of accessibility just for Australian researchers, or is it generic?

No, I think it is generic. Actually, Australia is arguably further along than other countries in some aspects of accessibility. On the other hand, there are different models. This is IT, and IT is engineering that changes every few years anyway. Whether you want to distribute the data and computing, make virtual machines or whatever, the technology is going to be different in two or three years.

Do you mean that you need as flexible an environment as possible?

Yes, do the best thing at any particular time.

And you think EMBL-ABR could be instrumental in this?

Yes, in helping to define the best current technical solutions, so biologists can solve their research problems. I think there’s enough goodwill and money and eResearch people to make it happen.

_____________________________________________________________________

Biosketch: Mark Ragan is Professor and founding Head of the Division of Genomics & Computational Biology at the Institute for Molecular Bioscience, and Professor (Affiliate) in the School of Information Technology & Electrical Engineering, both at the University of Queensland in Brisbane, Australia. He was Director of the Australian Research Council (ARC) Centre of Excellence in Bioinformatics (2003-2016) and co-founder of QFAB Bioinformatics. Mark is a graduate of the University of Chicago (Biochemistry) and Dalhousie University (Biology). His 200 peer-reviewed research publications in biochemistry, molecular biology, evolutionary biology, genomics, algorithmics, bioinformatics and computational biology have attracted more than 10,000 citations (Google Scholar). Core technologies in his research group (integration of large bioscience data, scalable algorithms on trees and networks, bioinformatic workflows, machine learning, high-performance and data-centric computing) are applied to problems of lateral genetic exchange among bacteria, annotation of marine genomes, and inference of biomolecular networks in cancer. Mark is also involved in national and international infrastructure initiatives in genomics, computing, data and bioinformatics services.