Allegra Via, physicist and scientific researcher at the Institute of Biomembranes and Bioenergetics, National Research Council, Italy, and ELIXIR Italy Training Coordinator.
How did you get into bioinformatics?
After graduating in Theoretical Physics, I felt the need for something more “concrete”. I was attracted by biophysics and, wandering around, had the chance to read a few papers on protein folding, structure prediction and analysis, and I literally fell in love with the structure, function and evolution of proteins. While trying to understand how I could contribute to the field as a physicist, I was introduced to one of the very few bioinformaticians in Rome at the time, who later became my PhD supervisor. That was about 20 years ago, and it took me a while to realise that I was actually getting into bioinformatics.
What are the challenges you see for life scientists in the data driven science era?
In the data driven science era, life scientists need to acquire a much wider range of competencies than their predecessors. Together with new experimental techniques to produce and process data, they need to become knowledgeable about in silico data management and analysis. For example, they cannot afford to ignore how statistics is used to extract meaningful information from data. And even when they are not going to be the ones who do the computational work, they must be aware of the power and the limits of bioinformatics approaches and resources. Moreover, they need to acquire enough knowledge to formulate appropriate requests to computational biologists and data scientists. This implies not only that they have to acquire a new set of skills, but also that they need to develop a shared language and a shared way of thinking with both the data analyst and the software developer.
These are the reasons why we have more and more bench biologists in our bioinformatics and even programming training courses: participants do not intend to become bioinformaticians; rather, they want to learn what kind of information current technologies can extract from their experiments, what data is already available, and how they can access and analyse it. They also want to know what they can do by themselves and which analyses require an expert (e.g. a statistician).
Would you say this is different for actual bioinformaticians? Do they face different challenges?
Bioinformaticians too need to acquire new skills and abilities. They have to learn how to make use of resources such as high-end computing (historically mostly exploited by high energy physicists), the Cloud, and e-Infrastructures in general; they need to become proficient in statistics, and acquire good practices in software development and versioning in order to ensure reproducibility.
Bioinformaticians, since the advent of the data-driven science era, have to adopt data-analytical thinking, understand principles of causal analysis and be able to develop effective methods for data analysis, visualisation, representation, and communication.
Furthermore, they face critical issues related to the extraction of actual knowledge from data, such as big data storage and management, information privacy and security, data standards, data integration and sharing, data annotation, data mining, etc.
This effort to acquire computational science and statistics skills should NOT be made at the expense of their knowledge of the life sciences: in the data driven science era more than ever, the bioinformatician needs to have a robust knowledge of biology and be aware of the most urgent questions in biomedical research.
What is open data, and what does it mean to you?
Open data is the ideal condition in an ideal world. It goes together with open science, open source, open access and, ultimately, open minds. I like to think that it represents the only possible future, especially if we don’t want to shrivel up in the long term. Although it is hard to see how it is to be funded and sustained, I observe an increasing tendency and interest in open data and do advocate for it. I believe that “openness” in science (and beyond) is a theme that should be introduced and discussed in high schools and undergraduate courses.
What is currently missing in the field of bioinformatics AND life sciences?
Data driven science requires new skills and demands new multidisciplinary professions. The US is expecting to experience a shortage of 190,000 skilled data scientists in the near future. In Europe, there are estimates suggesting that 500,000 new data scientists will be needed in the coming five years. The data scientist can be defined as “an expert who is capable both to extract meaningful value from the data collected and also manage the whole lifecycle of Data, including supporting Scientific Data e-Infrastructures”. Based on this definition from the EDISON project and despite similar initiatives such as CODATA and RDA, appropriate data science curricula are missing both in life sciences and bioinformatics undergraduate courses.
It is early days yet, but what would you like to see EMBL-ABR become, achieve?
EMBL-ABR may have a relevant role in guiding the changes needed in the data driven science era and in addressing the challenges posed by Big Data, including, among others, the issues of data quality, data integration, data understanding, and visualisation. For example, since only a small percentage of the available data is actually analysed, EMBL-ABR could support projects that make use of available raw information rather than generating more data, and endorse more collaborative initiatives between life scientists and bioinformaticians.
Another important aspect, which I believe EMBL-ABR should be responsible for, is training the new generation of data scientists by filling the gaps left open by university systems, which will need a longer time to adapt to the data revolution.
Even more importantly, EMBL-ABR may have a fundamental role in developing a roadmap to achieve a high degree of competitiveness for Australian education and training on bioinformatics and data science.
Biosketch: Allegra Via is a physicist and scientific researcher at the Institute of Biomembranes and Bioenergetics (IBBE) of the National Research Council (CNR, Bari, IT). Allegra obtained her PhD in 2003 at the University of Rome “Tor Vergata”, where she also worked for six years as a postdoc. In 2009 she moved to the Sapienza University as a researcher, and in January 2016 she commenced full time in her current role as the ELIXIR Italy Training Coordinator. She is involved in the design, organisation and delivery of bioinformatics training courses and in Train the Trainer activities, and collaborates with other ELIXIR nodes on many training-related initiatives. She has a long track record of academic teaching (Macromolecular Structures, Python programming, Bioinformatics, Biochemistry, Protein interactions). Her main research interests include protein structural bioinformatics, protein function prediction and analysis, and protein interactions. She is a member of the Global Organisation of Bioinformatics Learning, Education and Training (GOBLET) and a Software/Data Carpentry Instructor.