Sandra Orchard, Molecular Interactions Team Leader, European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
What is bioinformatics for you and why does it matter?
Bioinformatics is a very broad term, and an equally broad definition would be that it is the skill sets, algorithms and underlying data that enable us to make sense of biological data, in particular the increasingly large datasets generated by the ‘Omics communities. It matters because, without the ability to understand these datasets, the time, effort and large amounts of money spent generating them would largely be wasted.
What are the challenges you see for life scientists / medical researchers in the data driven science era?
I think the biggest challenge is always meeting the expectations generated at the start of any big project, particularly if the groundwork, in the shape of underlying reference data resources, is not already in place, or if, as is often the case, the data analysis part of the project has not been given the same level of planning and forethought as the data generation and collection phases. Factoring in the bioinformatics right from the initial stages of any workflow is increasingly important but is still often overlooked.
Would you say this is different for actual bioinformaticians? Do they face different challenges?
Bioinformaticians working on non-model organisms certainly face different challenges from those in the biomedical area, in that they largely have no reference resources with which to work. Any data they use as a comparator, or as the basis for an analysis, often needs to be inferred from the most closely related organism for which extensive data does exist, and this adds a whole new level of difficulty to any analysis.
What is open data, and what does it mean to you?
It means data that is freely available for the community to access and work with, but also data that is useful, i.e. data accompanied by adequate levels of well-annotated metadata, which adheres to community standards and which has either been deposited in a community-recognised data repository or held locally but made accessible through a recognised consortium such as IMEx or ProteomeXchange. Placing raw data onto GitHub or FigShare, or even onto a poorly advertised local website, may appear to tick the Open Data box but in reality falls well short of community needs.
What is currently missing in the field of bioinformatics AND life sciences?
I guess full global collaboration – there are still many groups doing valuable work which is being lost because the rest of the world just doesn’t know it exists. Reaching out to global communities, or making use of global resources, is just so important in our field and it is money wasted if this does not happen.
It is early days yet, but what would you like to see EMBL-ABR become or achieve?
I would like to see EMBL-ABR become an actively contributing part of this much wider community, working with us to maximise the use of the resources being produced. Australia is geographically very distant from us, and you have your own very distinct flora and fauna from which we can learn a lot, so we need your data, and the conclusions you draw from it, to be readily available and out there for us to access.
Biosketch: Sandra studied Biochemistry at the University of Liverpool and then worked as an enzymologist for Roche Products Ltd in the UK, leading a small team establishing drug screens and activity assays. She joined the European Bioinformatics Institute (EBI) in 2002, and now leads the Molecular Interactions team there with responsibility for both the IntAct database (www.ebi.ac.uk/intact) and the Complex Portal (www.ebi.ac.uk/intact/complex).
She has been actively involved with the work of the Molecular Interactions work group of the HUPO Proteomics Standards Initiative since the inaugural meeting in 2002 and has played a central role both in developing data exchange standards and in their subsequent implementation in the IntAct environment. She played a key role in the initial establishment of the IMEx Consortium of protein interaction databases and has continued to support it as the curation and data maintenance programs of the partner databases become increasingly aligned. She has also been very active in the field of user training, through both hands-on workshops and online training.