Suzanna Lewis, Scientist and Principal Investigator at the Berkeley Bioinformatics Open-source Project based at Lawrence Berkeley National Laboratory.
How did you get into bioinformatics?
Well, it didn’t exist. I was doing my graduate work in microbiology with lambda phage and took a course at the University of Michigan in systems biology. This involved looking at metabolic pathways and their stability, and whether or not they would go into oscillation. It was really cool. It also involved some programming, so I signed up for a programming class. At that point microcomputers were just becoming mainstream, so I quickly got my Masters in microbiology and then switched to computer science. I picked up a degree in real-time systems engineering because I was still interested in systems. For many years I worked in robotics and industrial control systems like cranes and rolling mills. Then in the late ’80s the genome projects were funded, and there were very few people like myself who both knew biology and could write code, and that’s how I got back into it.
What are the challenges you see for life scientists in the data-driven era?
One question is the data itself. A lot of it is produced, but finding it is a challenge. It’s so trivial for people to publish: you just put up an FTP site or drop it into an archive and it’s there, but making it discoverable is hard. And making it comparable is even harder. It’s difficult to know if two people are even talking about the same entities. Furthermore, much of the metadata is typically missing: what’s been used to classify and describe the data, all those characteristics that are needed to put the data into context, which is essential for being able to reuse it. So there is finding it and putting it into context, and then I think the last thing is that as a scientist you collect all this data, you use it, and then you’re going to publish your results. First you form your own hypothesis, for example through a large-scale comparative analysis, and maybe you discover something revealing, such as a causative factor no one has ever thought about. So you want to publish, and you should get credit for what you publish. Right now so much of publishing revolves around text. The analogy I always use is the food processor: if you take bananas and strawberries and other good things and put them in the food processor, you get a smoothie and it’s lovely. Similarly, we take all our experimental research results and conclusions, put them in a word processor and get a lovely PDF. But getting knowledge in a computable form out of that PDF is just as hard as getting the original banana out of the smoothie. We need new means of publishing where you get credit for the fact that someone can easily find and re-incorporate or integrate your information computationally.
So do you think bioinformaticians are facing different challenges than life scientists?
No, it’s very hard to tell the difference anymore. I suppose there is some theoretical methodological work that is not strictly life science, but I think anyone who is doing bioinformatics needs to be a life scientist as well and actually understand the fundamental questions being asked. At least we don’t need to know how to pour gels anymore. Well, at least I don’t! But it’s the same research. People used to have to isolate their own restriction enzymes and no longer need to do this themselves; in the same way, bioinformatics is now an essential tool for life scientists.
What is open data and what does it mean to you?
I guess it comes back to the fact that it’s findable, it’s accessible, it’s interoperable and reusable. FAIR! Interoperable to me means you can place that data in the context of any other life sciences data that’s out there, which means that you really have to have standards when describing it.
What do you think is currently missing in the field of bioinformatics/life science?
It’s always a moving front where things keep filling in. You think it’s a hole but as soon as you think it’s a hole someone’s there to fill the gap. It’s always chaotic. I would like things to be nice and tidy but it’s inherently chaotic as people figure out the best way to do things and then resolve it.
Perhaps the most difficult thing is the whole publishing paradigm. I think it needs to shift from the model that started back in 1665 (Henry Oldenburg) to publishing in a computable form. The biodiversity community seems to be closing in on something workable, where publishing is incorporated at each point in the research cycle, rather than waiting to write up your paper at the end of the project and then maybe adding some metadata. Publishing should be in your mind when you create your experimental design. If this is done in a structured way, you are making your work publishable even as you design your experiments. I do think the medium for exchanging information includes, and always will include, natural language, because it is nuanced and highly expressive, but it should not be the only means of conveying information. More and more, our published results have to be readable by computers. People complain about the bottleneck of curation and the need for bio-curation, but if we shift the structuring of information further upstream, to earlier stages in the research cycle, then the jobs of the curators will be easier. Their job will then be one of integrating multiple sources of information, rather than disentangling results out of an individual PDF file. Curators would then act more as reviewers and synthesisers of information.
What would you like to see EMBL-ABR become or achieve?
I would love to see EMBL-ABR engage the broader community in science, maybe by interacting with other groups in Australia who are running extra high school biology courses, and through community engagement as well.
Suzanna Lewis has Master of Science degrees in Biology and Computer Science. She is a scientist and Principal Investigator at the Berkeley Bioinformatics Open-source Project (BBOP) based at Lawrence Berkeley National Laboratory. Suzanna leads the development of open standards and software for genome annotation and ontologies.
She led the team responsible for the systematic annotation of the Drosophila melanogaster genome, which included development of the Gadfly annotation pipeline and database framework, and the annotation curation tool Apollo. Suzanna’s work in genome annotation also includes playing instrumental roles in the GASP community assessment exercises to evaluate the state of the art in genome annotation, development of the GBrowse genome browser, and the data coordination center for the modENCODE project. In addition to her work in genome annotation, Suzanna has been a leader in the development of the Open Biomedical Ontologies and in the National Center for Biomedical Ontology. She, along with Michael Ashburner, founded the Gene Ontology, and instigated and continues to contribute to the Sequence Ontology, the Uberon anatomy ontology, and other ontologies, as well as developing open software for editing and navigating ontologies such as AmiGO, OBO-Edit and Phenote.
In 2005 Suzanna was elected a fellow of the American Association for the Advancement of Science in recognition of her contributions to science in the fields of Information, Computing, and Communication.