What does bioinformatics mean to you, and why is it important?
Bioinformatics for me is essentially any activity in which electronic and/or numerical representations of biological data are central. It matters for many reasons, but two really stand out. The most obvious is that it takes biology to new levels, increasing both the quantity and quality of data that can be brought to bear on a given scientific question. As a result, we can ask questions that were impossible a few years ago, and continue to gain new insights from old data. The less obvious benefit is that bioinformatics encourages precision, clarity of thought and accurate record keeping, which are the foundation of reproducible science.
What challenges do you see for life scientists and medical researchers in the era of data-driven science?
The current generation of researchers faces clear challenges in data handling and integration. I am confident that these will be resolved through changes in practice and improved data management protocols. I think a bigger challenge is knowing what to do with all the data once it can be handled and integrated. The size of modern datasets and the frequent involvement of “black boxes” often increase the detachment from, or abstraction of, the biological system being analysed. When combined with the rate at which new methods are developed – and the challenge of keeping documentation up to date – this could lead to confusion, the application of inappropriate methods and/or unjustified conclusions.
Would you say this is different for actual bioinformaticians? Do they face different challenges?
Bioinformaticians definitely face different challenges. One unfortunate challenge is the continual struggle to get people to treat bioinformatics with the same rigour as the “wet” disciplines. Because bioinformatics analysis is relatively easy, safe and cheap to attempt, people often charge in without the degree of consideration and planning they would apply in the laboratory. Bioinformatics is just like bench science in many ways, even though our “samples” are data and our “equipment” is software. The pace of development also creates different challenges for those developing methods. Software can become redundant much faster than data, so committing the time and effort required to publish it carries real risk. I’ve been involved in several collaborations where non-trivial amounts of bespoke bioinformatics – essential at the time – became obsolete or redundant during the life of the project. As a result, I think a lot of the work of bioinformaticians goes unnoticed and unrewarded. Ironically, the better you are, the quicker you can often do things, and the easier it looks to an outsider!
What is open data, and what does it mean to you?
Open data is the data equivalent of open source software, where data is freely accessible for anyone to use. As a researcher, this means that more data is available for analysis. As an educator, it means that students have easy access to real data. As a bioinformatician, it means that more data is explicitly made available with sharing in mind, which (hopefully!) increases the quality and quantity of documentation and metadata attached to the data.
What is currently missing in the fields of bioinformatics and the life sciences?
I don’t know if anything is completely missing, but I think we currently fall short in terms of: (1) community standards for both data and software; (2) adequate metadata for capturing nuances of data quality; and (3) adequate benchmarking of algorithms and tools. I do worry that quality is often sacrificed under the pressure to publish so quickly and so often. The Herculean tasks of curation, testing and benchmarking are not given the attention and credit they deserve.
What do you see as the priorities when it comes to bioinformatics for researchers working in Australia?
Sharing expertise and planning for the technologies and data of the future. The rates of technological/methodological development and growth of the scientific literature make it extremely challenging to stay on top of things. This is exacerbated by the growth of interdisciplinary science in which bioinformaticians are often expected to understand data and methods from many diverse fields.
You have developed SLiMSuite, which you continue to maintain and offer as a free service to the bioscience community. How can EMBL-ABR help with such efforts and support further activities and tool development in a scalable, long-term manner?
People who write bioinformatics software often find themselves in a catch-22. It is generally impossible to secure either funding or institutional support for the development and maintenance of a tool unless you can demonstrate a strong user base, but it is generally impossible to establish a strong user base for a tool without dedicated support for its development. This is especially true when you lack formal computer science training and are trying to learn software engineering on the side whilst holding down a day job as an academic, which is not renowned for its light workload. EMBL-ABR might be able to help by forming or supporting a community of friendly bioinformaticians who are passionate about having high-quality, well-documented, well-benchmarked, open access bioinformatics software. This partly comes back to standards again. Even having explicit criteria for “good software” to aim at, plus some mechanism for recognising and rewarding compliance, would probably be helpful. This seems to have worked well for Bioconductor, which displays a set of badges for each package that inform users of various attributes. This helps users when assessing different tools, but it is also good motivation for developers to keep their tools maintained. Bioconductor also provides a consistent framework for documentation, clear guidelines (with minimum requirements) for submitted packages, and mailing list support for developers who lack experience or knowledge. I am not aware of an equivalent for standalone bioinformatics software, but I think it would be a good idea. Anything that reduces the effort and increases the perceived reward, however small, is likely to aid motivation.
How do you measure impact when it comes to services such as SLiMSuite?
For me, the impact is whether people are able to do their science better because of the tool. Publication citations are probably still the main currency in this respect, but they are clearly less useful for unpublished tools. I’d love to track downloads, usage statistics and so on, but I have a bunch of updates and documentation fixes to make before I can even consider working out how to do that!
It is early days yet, but what would you like to see EMBL-ABR become and achieve?
I think it is exciting to have another voice championing the need for a strong bioinformatics infrastructure that benefits bioinformatics developers, scientists and users across Australia. Bioinformatics thrives on open source code and open data. As with most public goods, these need philanthropic or government support to survive. With its strong international links, EMBL-ABR can also play a lead role in sharing and advocating best practice for a range of bioinformatics activities.
Biosketch: Rich Edwards has always been fascinated by evolution and its implications for understanding how life works. During his PhD in genetics, Rich supplemented his lab work with another childhood hobby – computer programming – to model evolving populations and analyse biological sequence data. He became a full-time bioinformatician in 2001 and has collaborated on projects involving numerous organisms and technologies. Whilst never setting out to develop bioinformatics algorithms, Rich has created a number of tools by necessity over the years, including the first ancestral sequence prediction tool that handled indels (GASP) and the first protein motif prediction tool that statistically adjusted for the evolutionary relationships of the proteins (SLiMFinder), coining the term SLiM (short linear motif) along the way. Rich is a big believer in open source software and, over time, these tools grew into the SLiMSuite package of sequence analysis tools in Python. After a decade or so focusing on protein sequence analysis, Rich has returned to his first love of genetics following the rise of long-read sequencing technologies, with a particular focus on PacBio yeast genomics.