Rochelle E. Tractenberg, Director, Collaborative for Research on Outcomes and -Metrics and Associate Professor, Neurology with secondary appointments in Biostatistics, Bioinformatics & Biomathematics and Rehabilitation Medicine, Georgetown University Medical Center, Washington, D.C.
Open science: what is it for you and why does it matter?
To me, science that is ‘open’ has three key features:
- it is accessible to every interested party: no privilege (e.g., subscription) needed to find and obtain excellent work
- it is accessible to different disciplines: rather than limiting exposure and accessibility of the concepts and designs to just those in our field who may need the work, it should be conceptually accessible to – and contextualised with – work from others in and outside a specific discipline
- it is fully transparent, rigorous, and reproducible.
These properties of science are essential for the integrity of each discipline, the integrity of each scientist, and the integrity of decisions that are made about or based on rigorous and reproducible work.
I have long been an active proponent of research integrity, which of course, most scientists are! However, most scientists – at least in the US – are only able to comply with requirements imposed by federal funding bodies and public research/educational institutions. That is, when many, if not most, scientists in the US think of ‘research integrity’, they tend to think in terms of whether they or their students/staff have completed the necessary ‘training’ in ‘responsible conduct of research’. In the US, this training can be from three hours to a semester; it typically focuses on information that is about laws, and not ethical behaviours; and, there may be some discussion component to the training, but this is never assessed. Further, the same training must be repeated periodically – every single participant does the exact same thing, performing and participating at the same level, whether they are Deans or undergraduates. One difference is that, at some ill-defined point (typically once a person has their own grant funding), instead of attending training to comply with regulations, a scientist can be considered compliant if they provide this training. No-one is ever prepared to develop or deliver this training. The upshot of this training paradigm is that it does not go very far, it does not equip scientists very well, and it has absolutely no potential for growth of a crucial skill-set, nor for evidence-informed improvement of training that might actually grow that crucial skill-set. I feel so strongly about this that over the past five years I have been a very active proponent of research integrity, arguing this case in several talks and publications, and obtaining grants to further my studies on this topic.
Although some might classify this work as ‘just ethics’, it is highly relevant for the construct of ‘open science’: one definition of science that is ‘open’ is that it is simply ‘accessible’ – one need not be in a privileged position to access that science. In my view, publishing in one’s own field without consideration of similar, or relevant, work in other domains leads to a lot of duplicated effort and fails to take advantage of longstanding lines of reasoning and evidence that could propel science forward, rather than requiring the entirety of that empirical work to be reproduced. Doing multi-disciplinary science, bringing research methods from education and psychometrics into studies and clinical trials for aging and Alzheimer’s disease, I have direct experience of how this broader definition of openness strengthens science.
Finally, I feel that what makes science ‘open’ is not just its accessibility, but also its transparency: being reproducible and rigorous to then benefit every scientist and science itself. The more trans- and inter-disciplinary the work that a modern scientist engages in, the more essential it is to be transparent – open – and widely available. Policies and resource allocation decisions are often made based on ‘the best available science’; when this is unavailable, or too discipline-specific, or is simply not the best because it is not reproducible and/or rigorous, both science and society suffer.
Open science and bioinformatics: is there a link?
Bioinformatics and ‘open science’ are absolutely linked. As a statistician, my perception of bioinformatics is that of an ‘outsider’; that said, my perception of how bioinformatics as a field sees ‘open science’ is specifically to make resources available. This interpretation only addresses one of the three features of what makes science ‘open’ in my opinion; so, more needs to be done to make the products of bioinformatics work (not just the resources to apply it) relevant and intelligible to other scientists. Much more needs to be done to make sure that all scientists who use bioinformatics tools, resources, or constructs, do so with as much rigour, reproducibility, and transparency as possible.
Over the past 20 years as a statistician, I have worked mainly in clinical trials (for Alzheimer’s disease), designing/analysing them or developing methods or outcomes that can improve study design and/or interpretability. During my first 10 years as a statistician, I worked to develop methods to enable anyone who seeks to study the aging brain, or interventions that might limit the effects of aging on the brain, which I published in not-open journals (because there weren’t any real options at that time). I was only able to demonstrate the utility of these methods because I was working at the data coordinating center; it was difficult to obtain data from ‘outside’ this system and any investigator who did not feel they had exhausted the work to be published from ‘their’ project’s data could limit access to, or publications involving, their data. In 2011 I was the editor at the Public Library of Science Journal PLOS ONE handling a paper that analysed the likelihood of compliance with the signed agreement with the publisher to share data as a function of strength of statistical analysis in those papers. This was the second article by this group – and they found (both times) that sharing data was associated with ‘good statistics’ and refusals to acknowledge requests or share data were associated with errors, particularly those that would challenge the results and conclusions. These authors concluded that mandatory archives of data would address this kind of abrogation of a signed agreement to share. In one of my editorials on that paper I suggested that changing the culture of science (towards rigour and reproducibility), rather than mandatory archives (towards openness), would be more helpful. I also pointed out in a second editorial on the same paper that:
The responsible conduct of research includes conscientious protection of human subjects. In the book, Scientific Integrity, Macrina also notes, ‘(s)haring research materials published in the peer reviewed literature has been a traditional practice that follows from the expectation that scientific research must be amenable to replication’ (p. 81) … As such, these should be just as conscientiously executed as other aspects of the research enterprise. Refusals to share data harms science and constitutes violations of our obligations to be responsible conductors of research.
The 2011 paper that prompted these editorials related to articles in the field of psychology; results analysed in this 2011 paper came from over 1,000 articles published in two flagship journals over one year. If 73% of these results (as was argued) were not reproducible or rigorous, there might be important delays or misdirection in both clinical and scientific progress.
I mentioned that I have somewhat of an outsider’s perspective on bioinformatics, but I recognise important parallels between bioinformatics and statistics. In addition to being a statistician, I have also taught Introductory Biostatistics and provided scientific as well as statistical reviews for grants and journals. I see the challenges that statistics as a discipline has faced, including misuse or misreporting (or both) of methods that result in 73% irreproducibility as in the 2011 example. Another example of this challenge is the limitation to one 3-hour statistics lecture in a two-semester sequence that is intended to support medical students’ ability to use appropriate evidence to practice ‘evidence-based medicine’. From my perspective, bioinformatics could also face challenges like these, which is one reason for my confidence about what more needs to be done to ensure that those who are not bioinformaticians, but who use these resources, tools, and concepts, do so correctly and transparently. This has not universally been the case for statistics, and I have seen many grant proposals funded where the statistical plan is non-existent or exists but cannot possibly support the research goals. I have also reviewed papers where the results had inappropriate and/or incorrectly interpreted statistics, rendering the publication (over my objection) entirely irreproducible (and technically, neither rigorous nor transparent). I have objected to the restriction of teaching and learning around statistics to ‘just the course’ for reasons stated above. If bioinformatics as a discipline addresses these known issues and promotes the most open science about bioinformatics or employing tools and techniques from bioinformatics, then the discipline will benefit.
In short, there is a link between bioinformatics and open science – it should be nurtured, reinforced, and committed to, and never assumed.
What makes sense to resource as a national effort when it comes to open science versus local resourcing/support?
This question relates to (and strengthens) the impression that none of the three features of ‘open’ that I articulated in the first answer is really part of how the bioinformatics discipline defines ‘open’. If bioinformatics wants to be a discipline that truly embraces, or embodies, ‘open science’, then all necessary support for fully transparent work that is rigorous and reproducible can come from national as well as local resourcing – and should come from both directions. However, just making data, tools, or knowledge available nationally or locally as a resource will neither further nor strengthen science or scientists if the entire enterprise is not also specifically open in terms of transparency, rigour and reproducibility of the work that arises out of those nationally/locally available resources. It would be very sensible to make it a national priority to ensure that all work using ‘bioinformatics resources’ is itself rigorous and reproducible, and to ensure that life scientists do not simply view (or learn to view) the bioinformatician and other resources as ‘tools’, but rather treat bioinformaticians as partners in the scientific enterprise and respect the contributions that these partners bring to that enterprise.
Rigour and reproducibility should take precedence over questionable innovation, insistence on ‘positive results’, and ‘grant success’, and this could be a national as well as a local resource issue. Funding, publication, and dissemination of science that is not reproducible, not rigorous, not transparent, or some combination of these is also harming science as well as the relationship between science and the public. Thus, these should be priorities as both national and local efforts.
How would you recommend a novice biologist approach open science and where would they find guidance, resources and tools for getting on board, particularly if they are not in the US…?
Given my definition of open science as being accessible and transparent, the novice biologist can approach open science by ensuring that every method and result they produce is as rigorous and reproducible as possible. Publishing in open science journals can cost more money than new scientists have, so making sure that their home institution offers the work in an archive or repository can offset the barriers to access that publishing at no cost to the author (i.e., not open, behind firewalls or paywalls) can create.
Given what I interpret the definition of ‘open science’ to be for the bioinformatics community, i.e., that data, tools, and concepts are stored in accessible repositories, my advice to the novice would be to resist the notion that ‘a bit of training and I’ll be on my way!’. Instead, I would suggest that partnering with a bioinformatician (who is not a tool, but a scientific collaborator!) will yield far richer experiences, better science, and longer-lasting engagement with ‘open science’. In the US, when applying for federal grants, there is a new option for leadership (principal investigators or PIs) on grants: ‘multiple PI’ grants. As a statistician, my experience has been that a scientist may conceptualise the general structure of some project or program, but without the statistician the experimental designs, data collection and management, analysis and interpretation – ultimately, every indicator of success of that project or program – are impossible to articulate. Early in one’s career as a biostatistician, one may be included as an unnamed (i.e., totally exchangeable with any other statistician with the same software experience, basically) staff statistician; then one can become ‘key personnel’, and ultimately, one can become the ‘director of the biostatistics core’ – a co-investigator (i.e., very important to the success of the grant). However, what most clinical research requires is a biostatistician as co-PI, someone who contributes – or is seen and acknowledged to contribute – as much to the quality of the science as the person who devised the original question. This is not the current paradigm because it hasn’t had to be, but since about mid-2016, the US National Institutes of Health have begun requiring ‘reproducibility and rigour’ in funded work. Without statisticians on both the proposing and the reviewing sides of funding, this is going to be slow to implement.
I believe that biology/bioinformatics can avert that implementation slowness by adopting a commitment to reproducibility and rigour from the very start. Novice biologists can create and speed up adoption of the new paradigm by embracing an equal partner whose input will strengthen their own science, their chances of success in science and funding, and thereby, in science more broadly. If this partner is part of the nationally-supported open science infrastructure, so much the better. If there’s a third partner who is an expert statistician, then science benefits even more and long term success is more likely.
What are the top three actions/initiatives you would suggest biosciences domains prioritise to enable open science, and what type of support do they need?
From my perspective, I would suggest prioritising the following three initiatives:
- adopt a new paradigm of equal partner-co-PIs that encourages the basic scientists and the bioinformaticians alike to reconceptualise their roles in their collaborative science (as outlined above)
- ensure that only reproducible and rigorous science is funded and published
- reconsider how deeply and how often the notion that ‘a bit of training and I’ll be on my way!’ is reinforced – just as it takes a PhD’s worth of training to learn how to conceptualise important new experiments in any field, it takes some (if not a whole PhD’s worth) deep commitment and sustained attention to build up a skill-set that integrates a complex field like bioinformatics into the extant ‘biology’ knowledge.
Each of these entails policies, commitments, and embodiment. To support each of them, one institution or department (or open resource) could create a pilot program that is evaluable (which would require resources such as funding and access to specific expertise in relevant experimental design, data collection, data analysis, and interpretation), to explore what does work and doesn’t work, and what might need more elaboration before it can be tested. This kind of qualitative and mixed-methods research is not the first thought for quantitative sciences, but actionable evidence supports justifiable decision-making, and qualitative methods are much more useful than generally believed for policy and resource allocation decisions. Each of these initiatives is important because of a ‘bottom line’ mentality somewhere in the science pipeline, so featuring purposeful design and expert analysis in a program to challenge the dominant paradigm is essential – and must be supported even though it is unlikely to result in high impact publications. The fact that science, the public trust in science, and science education will be strengthened by this approach should be enough to ensure their support.
EMBL-ABR is a distributed national research infrastructure providing bioinformatics support to life science researchers in Australia. As such, what role do you see for EMBL-ABR when it comes to open science for Australian biosciences?
A key feature of the distributed nature of the infrastructure is that it can have important, positive influences around the entire country. If the country values science, and seeks to strengthen the public trust in science and science education, then EMBL-ABR is perfectly situated to support the realisation of improvements. If Australia took a lead role in modelling how to observably value and support excellence, transparency, rigour, and reproducibility, other countries would notice and follow Australia’s example. A recent publication in the Journal of the Knowledge Economy (Birch K, Levidow L & Papaioannou T. (2014) Self-Fulfilling Prophecies of the European Knowledge-Based Bio-Economy: The Discursive Shaping of Institutional and Policy Frameworks in the Bio-Pharmaceuticals Sector. J Knowl Econ 5: 1-18, DOI 10.1007/s13132-012-0117-4) offers a fascinating conversation about how institutional and policy objectives end up influencing the ultimate functionality and foci of how research and innovation are structured and prioritised. In cognitive psychology (my other research domain), this might be referred to as a combination of bottom-up (basic science informing policy) and top-down (policy informing science) influences. I see an important role for EMBL-ABR in actively soliciting and utilising the bottom-up influence, and not simply leveraging its ‘top-down’ influence. The top-down influence might be, ‘we have resources, please use them!’ while the bottom-up influence might be, ‘we value reproducible and rigorous science – irrespective of whether specific national resources are utilised.’
Are there bioscience discipline-specific limitations that require tailored solutions when it comes to open science rather than enough common denominators for shared resources/tools and solutions?
I do not believe the limitations are discipline-specific; as I’ve mentioned, a lot of what a consulting statistician does in an academic setting is serve as a shared resource – causing them to be perceived to be ‘a tool’ for getting science done. That is, bioinformatics can avoid generating or supporting the perception that bioinformaticians, like the shared bioinformatics resources and data, are just tools. This view limits the contributions bioinformaticians can make, just like it has been doing to statisticians in the life sciences in the US. Also, as I mentioned at the outset, focusing on just one’s discipline without consideration of similar challenges articulated and/or faced in other disciplines will simply delay progress.
That being said, consideration of solutions that must be tailored, rather than generic ones, can help to formulate more general solutions (later on) that are not simply the lowest common denominator. The compliance problem for training in the ‘responsible conduct of research’, as I have argued across my publications challenging that dominant paradigm, represents a lowest common denominator – and thereby, generic – solution to unethical scientific behaviour that is not a real solution. It allows everyone to say/show that they comply without actually promoting the desired responsibility: in the 30 or so years since the US National Institutes of Health established a requirement for training in the responsible conduct of research (essentially the same training for every participant in science, no matter what stage of career or how many times they’ve sat through it before, up until 2009 when the requirement was specifically described as requiring change over time), fraud and misconduct (globally, not just in the US) have been widely observed to have increased 7- to 10-fold.
The one thing you would like EMBL-ABR to do in the future when it comes to connections with existing international efforts is…
Take the lead for Australia in demonstrating a commitment to integrating bioinformatics into biological sciences, supporting engagement, partnerships, and deeper training and not only ‘point-of-need’ training. An example of the need for deeper training for both biologists and bioinformaticians comes from my experience as a statistician: I once reviewed a manuscript where a team of statisticians described a statistical model for Alzheimer’s disease progression. While the model was statistically sound, it was essentially devoid of any understanding of human brains, aging, and how human brains age in the presence of Alzheimer’s pathology. In my rejection letter to the editor, I articulated how damaging the model might be if published – promoting analysis of data that, while methodologically rigorous, was inconsistent with, and thereby potentially damaging to, a better understanding of the disease and clinical trials to treat it. Each side of the science partnership needs to understand the other; that is what makes it a true partnership.
There are international efforts in bioinformatics and in open science (and in open bioinformatics science), and describing, committing to, and sharing an Australian model that prioritises rigour, reproducibility, engagement and transparency could exert a positive influence on these international conversations.
Biosketch: Rochelle E. Tractenberg is currently a tenured associate professor in the Department of Neurology, with secondary appointments in Biostatistics, Bioinformatics & Biomathematics, and Rehabilitation Medicine, in the Georgetown University Medical Center. As founder and Director of CRΘM (the Collaborative for Research on Outcomes and -Metrics), Rochelle leads an organisation which is regarded as unique in the US for its emphasis on measurement – and not simply statistical modelling or psychometrics – in biomedical research. She is an accredited professional statistician and elected Fellow of the American Statistical Association, as well as a cognitive scientist, research methodologist, and PhD-level instructor interested in outcomes, measurement and -metrics, statistical literacy and research ethics for practicing scientists, and curriculum development and evaluation for higher (graduate/post-graduate) education. She is the current chair of the Committee on Professional Ethics of the American Statistical Association (2017-2019) and also served as its vice-Chair (2014-2016).