How did you get into bioinformatics?
I kind of rolled into it. I started my PhD in the wet lab. When there was a presentation in our lab on small RNAs, my supervisor suggested porting this to plants, so we started a sequencing project. This generated a lot of data which had to be analysed. So I started developing data analysis scripts, and before long I was only doing data analysis. As a postdoc I focused on NGS analysis, transcriptomics more specifically, and decided not to do wet lab experiments anymore and just go for bioinformatics.
What are the challenges you see for life scientists in the data driven era?
I think the current generation of life scientists is often not able to grasp what it means to do data analysis, because they have not been trained to do that. My position in the lab is to enable researchers to do data analysis, and I notice that there is a lack of feeling for what data analysis entails. They are familiar with, and have been trained in, the proper way of performing wet lab experiments. For data analysis, this knowledge is not yet always up to the same high standards. In the end you want to assess “is this result something I can trust, and is this valuable for me?”, but this can be very difficult to assess in the life sciences. So I think conveying how important that is, and providing the training and tools so researchers can actually do it, is crucial for the future.
Would you say this is different for bioinformaticians – do they face different challenges?
For bioinformaticians I expect that they know about data analysis. Their main challenge is in making the link to the biology: actually trying to tackle biological problems. I think this is the main thing – bioinformatics can stay too much on a technical level and we need to inject more biology. The only way to do that is to have a dialogue between the two disciplines. That is still proving to be very difficult. It will require effort from both sides, biologists and bioinformaticians, but progress is being made.
What is open data, and what does it mean to you?
I think open data is about making it easy for people to reuse your data for validation purposes or for completely different purposes that nobody has yet thought of. And I think the most important thing for open data is having the annotation. The focus now is on getting data into a repository – which is great, because you also need that – but you need to be able to tell what the data is, in a very structured way, through the metadata. I think this is the most valuable aspect of open data and we should put more emphasis on that. It’s a very hard problem to tackle because nobody’s used to spending time on actually making sure data is annotated well and annotated consistently, so that you don’t have to go through it manually to make sure that all the definitions are accurate and useful.
What is currently missing in the field of bioinformatics and life sciences?
As I’m involved in ELIXIR, I subscribe to the approach of trying to have all these different initiatives talk to each other: not only the people, but also the computers. The aim is to make them interoperable so that you can efficiently re-use things. It’s often said that bioinformaticians spend 80% of their time converting data formats, which has an element of truth to it: if we can reduce that time, then we can spend it doing interesting biological things rather than building up scripts for ‘housekeeping’. That would lead to meaningful advances in the field, producing novel added value for projects. So yes, I strongly believe this is something we need to advance.
As data is being generated in massive amounts we have some catching up to do, but it’s still feasible; in a couple of years it will become very difficult, so we need to have some infrastructure in place by then. We are tackling the same challenges in our Institute. Basically we haven’t yet caught up with the advances in technology and the amount of data that we have. We’re not used to handling these massive amounts of data in diverse and complex datasets. So we need to learn how to manage that, which takes time and effort. This is also very difficult to fund from a research point of view. So any initiatives where we can learn from each other and copy solutions to this problem are very valuable, because they make the lives of the biologists and the bioinformaticians easier.
It is early days, but what would you like to see EMBL-ABR become/achieve?
I think it would be nice for the rest of the world to know more about what you are doing in Australia. I know about the GVL, for example, but I think it would be very beneficial for us to talk more, to be aware of what people are doing, to learn from each other, to adopt solutions and suggest improvements so that we can try to avoid duplicating effort. We need to duplicate some things in terms of infrastructure and hardware, but there are other things where we can just say, “Well, you did a great job, let’s try to implement that over here”. Then maybe we can link them up and achieve added value. So reaching out is one of the things I’d like to see, to learn more about what you do.
Biosketch: As a bioscience engineer, I started a PhD studying leaf development in Arabidopsis thaliana. During my PhD I gradually shifted from the wet lab to bioinformatics. I mostly worked with expression data: from qPCR, through microarrays, to sequencing. As a post-doc, I continued in bioinformatics and set up an RNA-seq workflow in our department. In my current role as Project Leader, Applied Bioinformatics and Biostatistics, Department of Plant Systems Biology, Ghent, Belgium, I coordinate the different projects and provide consultancy for bioinformatics. More recently, I have also been managing IT at our department, aiming to align informatics and bioinformatics even better. We are setting up a Belgian ELIXIR node, in which I’m Technical Coordinator.