Mr Jason Williams: Assistant Director, External Collaborations, at Cold Spring Harbor Laboratory’s DNA Learning Center and Education, Outreach, and Training lead for CyVerse – the U.S. National Cyberinfrastructure for Life Science.
What is your involvement in bioinformatics?
Relative to other techniques (e.g. molecular biology) or disciplines (e.g. genetics), I’ve heard it said that bioinformatics is a young field. More specifically, I’d say it’s adolescent – rapidly growing, slightly awkward, but open to a whole world of possibilities.
My role in bioinformatics is unusual because it involves education in a non-traditional sense. Most of the biological domain knowledge that researchers and educators use in practice was acquired through formal education. But many who need to learn about and use bioinformatics (researchers, students, educators) may have had no formal education in it, or may lack access to resources that would extend their informal, self-taught learning.
I’m able to spend a lot of time with people who are at the stage of ‘first exposure’ to the concepts and tools of bioinformatics. There is a large and distinct niche of bioinformatics novices, because so many biologists at all career stages find themselves just starting out with data analysis. Until biology education catches up with the idea of biology as data science, I can help people gain enough confidence in the tools and technical practices of bioinformatics to look beyond them to the biological meaning of their data.
What are the challenges you see for life scientists in the data-driven science era?
Life science is data science. But I think we may have to be open to a broader definition of data science. Any particular data scientist might be paid to focus on a very specific objective (e.g. mining search-engine results, election statistics, etc.). Biology, however, is very diverse: a doctor, a field ecologist and a geneticist may each have a different need for data science. Their work may require them to tackle data science problems of searchability, metadata, visualisation, and so on. How could we prepare every type of biologist to be competent in all of these things in addition to understanding the biology?
Since no one can be an expert in everything, I think a major task is to make the challenges clear. Life scientists (in general) need clear standards and uniform processes, so that when they put on their data scientist’s shoes they have ‘guidebooks’ that show which types of analyses are appropriate, how to evaluate the quality of data and analyses, and where the pitfalls are.
Would you say this is different for actual bioinformaticians? Do they face different challenges?
Bioinformaticians do have particular challenges, but probably not unique ones. Rather than portray bioinformatics as some strange science (instead of a special case of many sciences before it), it may be more practical to highlight just two areas that are significant impediments to progress. First, bioinformaticians need to ‘borrow’ more from non-biological sciences, much as they already borrow from computer science, ecologists borrow statistical techniques, and anatomists need to understand engineering. Bioinformatics makes the most progress when it adopts the good computational practices that other disciplines have long developed (algorithms, software development practices, etc.). Getting this interdisciplinary borrowing to work is related to my second area: the ‘people problems’ of bioinformatics. To the extent that a bench biologist becomes a bioinformatician when she first starts analysing her own data, bioinformatics as a field needs practices that support new practitioners. These new practitioners may be novices at computation, but they are experts in their biological systems, and ultimately they are best positioned to generate insights from bioinformatics results. Another ‘people problem’ is the need for clear career paths for bioinformaticians, especially for those practising bioinformatics as a science in itself.
What is open data, and what does it mean to you?
Open data is me working on your laptop (or cloud, or HPC system). Data is not open just because it is sitting somewhere in NCBI, ENA, or Amazon Glacier. For data to be open, I need to be able to understand its significance, provenance and metadata, to analyse it, and to collaborate on it. Openness might be easy for a single 500 MB file, but it gets exponentially harder to achieve for very large datasets. The good news is that we do have the technology to make data open, and when I feel like I am working on your laptop, the data is open.
What is currently missing in the field of bioinformatics AND life science?
This is difficult to answer without getting philosophical! It’s also difficult because it is, and should be, hard to say where bioinformatics ends and life science begins. If I could pick something that is missing, it would be an approach to life science that includes bioinformatics (and its best practices) as part of one unified protocol. Ultimately, there needs to be a strong connection and mutual intelligibility between lab/field protocols and computational ones. Bioinformatics is life science and vice versa.
It is early days yet, but what would you like to see EMBL-ABR become and achieve?
I probably still need to learn more about all the parts involved, but I think some preliminary goals would include:
- satisfy bioinformatics needs that enable Australian scientists to do research that is competitive at the global level
- function as, and maintain a reputation within Australia as, a national-scale resource for bioinformatics
- develop and promote Australia’s scientific ‘natural resources’ (datasets, technologies, people) in a way that leads to global recognition and leadership
- identify resources, technologies, and collaborations that offset incidental barriers (e.g. geographical distances) and capitalise on national priorities (e.g. Government funding opportunities).
Biosketch: Jason is Assistant Director, External Collaborations, at Cold Spring Harbor Laboratory’s DNA Learning Center, where he works to spread hands-on biology education internationally. As Education, Outreach, and Training lead for CyVerse – the U.S. National Cyberinfrastructure for Life Science – Jason provides training and support to scientists and educators, helping them leverage the most advanced tools and best methods for research and education in data-intensive biology. Jason organises, instructs, and speaks at more than a dozen bioinformatics workshops and conferences annually. Additionally, he serves on several committees and boards for projects that advance science and science education, including the Steering Committee of the Software Carpentry Foundation (Chair, 2016), and he is an instructor for Software Carpentry and Data Carpentry, organisations centred on scientists teaching scientists computational best practice.