Community Survey Report – 2013
In early 2013, a survey of Australian life scientists was undertaken to identify areas in which this resource, EMBL-ABR, could support researchers to make optimal use of bioinformatics capabilities. More than 200 responses were received from across Australia, representing a self-reported 750 researchers from all areas of biology.
Overall the picture was one of ubiquitous use of bioinformatics tools and data, with a clear indication that bioinformatics is no longer solely in the hands of specialist bioinformaticians but is widely used by laboratory scientists.
Lack of expertise was identified as the single biggest difficulty facing researchers in their bioinformatics activities, and training as the most valuable thing that EMBL-ABR could do to support those activities. Dry-lab researchers also highlighted a need for better bioinformatics community networks.
Key conclusions of the survey were:
- Bioinformatics is a key activity in Australian research as evidenced not only by the content of responses but also simply by the number of responses
- The areas of interest reflect the “central dogma” of molecular biology
- Not only bioinformaticians but also laboratory scientists see bioinformatics as core to their work
- Geographic location imposes significant but not crippling limitations on exploitation of bioinformatics
- Users are more likely to report satisfactory service (hardware, software and support) if it is provided within their own group
- There is a very marked concern about lack of expertise and access to expertise in bioinformatics
- Training and community building are the most sought-after services
- There is a significant demand for training of a more general nature, in computer programming and statistics
The survey ran throughout February 2013, during which time it was advertised as widely as possible through mailing lists, professional networks, social media, conferences, seminars and websites. Responses were collated and analysed at the end of the month, although the survey has remained active since then to allow a continued opportunity for members of the community to provide input. The survey consisted of a mixture of multiple choice and free text responses, all of which were optional.
210 responses were received by the initial analysis date, representing the views of a self-reported 750 people. Responses came from all Australian states and the ACT (none were received from the Northern Territory), with the majority coming from Victoria, Queensland and NSW (together just over 75% of responses).
Figure 1 – Distribution of survey respondents by state and organisation type
Respondents were evenly split between wet-lab and dry-lab researchers, with a small number in scientific support, and were largely from universities and other academic research institutes. Just under half described themselves as ‘researchers’ and about a quarter as ‘students’, with ‘principal investigators’ as the third largest group. Survey respondents were active across practically all fields of biology, with the most common areas being bioinformatics research, genomics, molecular biology, cell biology and genetics.
The broad demographic distribution gives us some confidence that the sampled population is reasonably representative of the Australian life sciences research community as a whole. However, because the survey was advertised through open channels, the number of people aware of it is unknown, and those who completed it were a self-selected population.
The number of responses alone is also indicative of the value Australian scientists place on bioinformatics as a research tool. A similar survey in Europe garnered a little over four times as many responses in total from a community perhaps fifteen times larger than that in Australia, implying a per-capita response rate in Australia roughly three and a half times higher.
Use of bioinformatics resources
The vast majority of survey respondents were existing users of bioinformatics tools, with 85% of wet-lab scientists, and essentially all dry-lab scientists, reporting using bioinformatics tools at least occasionally. Over 90% of researchers using bioinformatics tools (both wet- and dry-lab) also made use of remote databases.
Figure 2 – Frequency of bioinformatics usage for wet- and dry-lab researchers
Database usage reflected the research area distribution of respondents, with gene, genome and expression data being the most popular, and protein resources less so. Similarly, GenBank was the most popular bioinformatics resource, with Ensembl and the UCSC browser also frequently used. Pathway and interaction databases were popular resources, highlighting the importance of systems biology approaches to many researchers.
Figure 3 – Usefulness of database types, normalised to percentage of total responses
In addition to their use of public databases, respondents also reported generating significant quantities of their own data: 53% produced between 1 GB and 1 TB of data each month, with 25% producing more and 22% less. These figures were similar for both wet- and dry-lab researchers.
Access to infrastructure resources
Respondents were generally satisfied with their access to software, databases and high-performance computing resources, with over 75% describing their level of support in these areas as adequate. In contrast, fewer than 50% described bioinformatics support staff levels as adequate. A substantial minority of respondents also wanted bioinformatics support but felt they had no access to it from in-house, collaborative or even external sources.
Figure 4 – Level of satisfaction with bioinformatics infrastructure resource types, by location of those resources. Blue indicates adequate; red indicates inadequate
Disadvantaged by geography
The survey explored whether researchers perceived a disadvantage caused by their physical isolation from the major bioinformatics resources available to North American and European scientists, in particular database access. The results supported this premise, with one third of respondents reporting that their access to data was disadvantaged by geography. Access to IT resources and to bioinformatics expertise was considered more affected (40% and 54% respectively reporting some level of disadvantage).
Views were comparable across most states other than Western Australia, where location was felt to be much more of a disadvantage for bioinformatics access. This may reflect the comparative isolation of WA even within Australia, or it may be an artefact of the smaller number of responses from that state.
Figure 5 – Level of research disadvantage experienced as a result of geographical isolation from biological databases, IT resources, and bioinformatics expertise
Training needs
The most emphatic outcome of the survey was the overwhelming demand for bioinformatics training and concern about the lack of bioinformatics expertise within the Australian life science community. Only four respondents (<2%) indicated that training would not be useful to them at all, and 75% of the remainder identified at least one area of training that would be ‘very useful’. Statistical analysis training was the most requested, with 95% of respondents rating it as somewhat or very useful. Next-generation sequencing and network/pathway analysis were other popular areas for training.
Figure 6 – Level of interest for training in different bioinformatics areas, normalised to percentage of total responses
Tools and databases
Respondents were asked to list up to five of their most important bioinformatics tools or databases, and on average they named about two each. In total 165 different tools were identified, although only about one third (57) were mentioned more than once. Despite being first developed more than twenty years ago, BLAST remains one of the most popular bioinformatics tools, second only to the UCSC Genome Browser. The statistical program R was also frequently mentioned, reinforcing the view that bioinformatics, and indeed life science generally, should rest on a strong statistical foundation.
Figure 7 – The most popular bioinformatics tools and resources – only those listed by four or more respondents are included
Most important issues
Finally, respondents were asked what their biggest single difficulty was in bioinformatics, and what would be the most useful thing that EMBL-ABR could offer them. Analysis of these questions relied on a semi-subjective interpretation of free-text responses, but overall the main problem described was a lack of expertise in bioinformatics (40% of all responses), while the most useful offering was, by a long way, training (50% of all responses).
Figure 8 – What is your biggest bioinformatics difficulty (blue chart on left) and what is the most useful thing that EMBL-ABR could do for you (red chart on right). Semi-subjective categorisation of free-text responses normalised to percentage of total responses
A breakdown by wet- and dry-lab researchers identified a third area of concern for dry-lab researchers (who are most likely to be full-time bioinformaticians): a need for a better bioinformatics community and network.
Figure 9 – The three main areas of bioinformatics needs in Australia separated by researcher type
While training was clearly considered the most beneficial activity, we recognise that this demand may in part be a symptom of the current lack of expertise and access to support. Other activities, such as providing easier access to bioinformatics support capabilities or developing simpler bioinformatics tools and pre-prepared analysis pipelines, may offer an alternative solution without every biologist needing to train in bioinformatics.