Data and Research
The research lifecycle spans multiple stages:
• Research study concept and planning (including applying for funding and organising ethics approvals, collaboration agreements, etc.);
• Commencing the research;
• Data collection or aggregation (from pre-existing data sources) or data capture (from devices, surveys or instruments) and its subsequent processing;
• Data analysis, storage and management;
• Dissemination/publication of the research results and provision of access to relevant materials and data to enable the research findings to be verified.
Good data management is required across most of these stages and includes finding, collecting/capturing, integrating, processing, analysing, visualising, publishing, storing, sharing and reusing data and metadata.
EMBL-ABR strongly supports data sharing and data reuse and believes that data generated as part of publicly-funded research should be as publicly available as possible. Ideally, to obtain the most value from such data, it should be Findable, Accessible, Interoperable and Reusable – i.e. adhere to the FAIR principles.
See our Opinion Article published in F1000Research entitled “Best practice data life cycle approaches for the life sciences” and our poster from the 2016 ABACBS Conference entitled “Navigating the Research Data Lifecycle”.
Looking for biomolecular data of a particular type but not sure where to look?
• The FAIRsharing database catalogue lists over 1,000 biomolecular and other databases – partly compiled with the support of Oxford University Press (NAR Database Issue and DATABASE Journal). EMBL-ABR is actively ensuring Australian databases are listed in FAIRsharing to increase their discoverability. If you know of other Australian resources that should be included, contact us.
• re3data.org lists more than 2,000 Data Repositories and is Science Europe’s Framework for Discipline-specific Research Data Management. You can search it or use their cool interactive subject data browser.
• ELIXIR also maintain a list of Core Data Resources (European data resources of fundamental importance to the wider life-science community and the long-term preservation of biological data) as well as Deposition Databases (resources that ELIXIR recommend for the deposition of experimental data).
• NIH BD2K DataMed is a prototype biomedical data search engine. Its goal is to enable discovery of data sets across data repositories or data aggregators.
• Omics Discovery Index (OmicsDI) provides dataset discovery across a heterogeneous, distributed group of genomics, proteomics and metabolomics data resources, including both open and controlled access data resources.
• Research Data Australia is maintained by ANDS (the Australian National Data Service) and is a searchable registry of research data collections that includes many biomolecular datasets.
EMBL-ABR supports best-practice experimental design and analysis in bioinformatics to encourage the cost-effective collection of high-quality, informative and reusable data. Capturing data directly from an instrument, together with appropriate metadata, can help. For information about open-source data capture solutions deployed at institutions around Australia, see the summary of ANDS’ Data Capture program. The program funded 69 projects at over 30 institutions to simplify the routine capture of data and metadata as close as possible to the point of creation, and the deposit of these data and metadata into well-managed institutional data stores.
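One simple way to keep metadata with data from the moment of capture is to write a small "sidecar" file next to each new data file. The sketch below is a hypothetical illustration, not part of any ANDS-funded system: the function name `write_sidecar`, the `.meta.json` suffix and the metadata fields (`instrument`, `operator`, `captured_at`, `sha256`) are all assumptions, and in practice the fields should follow whatever metadata standard your target repository expects.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_file: str, instrument: str, operator: str) -> Path:
    """Write a JSON metadata sidecar next to a freshly captured data file.

    Hypothetical sketch: field names and the '.meta.json' suffix are
    illustrative assumptions, not a repository or ANDS standard.
    """
    path = Path(data_file)

    # Checksum the file in 1 MiB chunks so large instrument outputs
    # do not have to fit in memory; the digest lets later copies be verified.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)

    metadata = {
        "file": path.name,
        "size_bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),
        "instrument": instrument,
        "operator": operator,
        # Record the capture time in UTC, as close to creation as possible.
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }

    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar
```

Calling `write_sidecar("run42.fastq", instrument="sequencer-01", operator="jdoe")` (hypothetical names) would create `run42.fastq.meta.json` alongside the data, so the two can be moved and deposited together.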
Data Management Planning
Before starting a research project, it makes sense to plan how you will manage the data you will use. For example: What types of data will you produce or collect? How much data will you produce or collect? Where will you store it? Will that storage be appropriately secure? How will you analyse it? On what types of computers? How will you connect the stored data to these computers? Where will you store the analysed data? Do you want to share the data with collaborators? Where are your collaborators based, or where will they be? Will you need to publish the data in a discipline-specific repository? How long will the primary data need to be retained?
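The planning questions above can be tracked as a simple checklist so that gaps are easy to spot before a project starts. The sketch below is a hypothetical illustration, not a feature of DMPonline, DMPTool or any institutional tool: the class `DataManagementPlan`, the question keys and the condensed wording of the questions are assumptions made for the example.

```python
from dataclasses import dataclass, field

# Planning questions condensed from the list above; the keys are hypothetical.
QUESTIONS = {
    "data_types": "What types of data will you produce or collect?",
    "volume": "How much data will you produce or collect?",
    "storage": "Where will you store it, and is that storage secure?",
    "analysis": "How, and on what computers, will you analyse it?",
    "transfer": "How will the stored data be connected to those computers?",
    "results_storage": "Where will you store the analysed data?",
    "sharing": "Will you share data with collaborators, and where are they based?",
    "publication": "Will you publish the data in a discipline-specific repository?",
    "retention": "How long must the primary data be retained?",
}


@dataclass
class DataManagementPlan:
    """Hypothetical checklist of answers keyed by question identifier."""

    answers: dict[str, str] = field(default_factory=dict)

    def answer(self, key: str, text: str) -> None:
        # Reject keys that do not correspond to a known planning question.
        if key not in QUESTIONS:
            raise KeyError(f"unknown planning question: {key}")
        self.answers[key] = text

    def unanswered(self) -> list[str]:
        # Blank or whitespace-only answers count as unanswered;
        # questions are returned in the order they are listed.
        return [
            question
            for key, question in QUESTIONS.items()
            if not self.answers.get(key, "").strip()
        ]
```

For example, after `plan = DataManagementPlan()` and `plan.answer("retention", "As set by institutional policy")`, `plan.unanswered()` lists the eight remaining questions.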
Chances are that your institution will have research data management planning resources (online tools or simple paper-based questionnaires) for you to use. If not, there are also online options such as DMPonline (from the UK Digital Curation Centre, DCC) or DMPTool (from the University of California Curation Center, UC3).
Data Management Policies
The Australian Code for the Responsible Conduct of Research states that researchers must manage research data and primary materials in accordance with the policy of their institution. It also states that an institution must have a policy on:
• the retention of materials and research data;
• the ownership of research materials and data during and following the research project; and
• the ownership of, and access to, databases and archives, consistent with confidentiality requirements, legislation, privacy rules and other guidelines.
Check your institution’s policy library for your local data management policy, which will outline your responsibilities for managing data and other research materials.
Data Storage and Management Systems
Your institution will likely also be able to recommend a mature, networked and well-managed (e.g. maintained and backed up) data storage system that ideally allows metadata to be attached to your data items. Check with your institution.
EMBL-ABR nodes have been actively involved in a number of projects that have developed sophisticated data and metadata management components, e.g. the RDS-funded OMICs platform (whose data management capabilities are based on Mediaflux). If you are interested in more information, contact us.
Some EMBL-ABR nodes are also in discussions with the US National Science Foundation-funded CyVerse project about the features of its Discovery Environment, another sophisticated data and metadata management system (based on iRODS). Again, if you are interested in more information, contact us.
General Data Sharing guidelines
Looking for general guidance on data standards, journal data policies, or which public database to choose for your dataset for data publishing purposes? FAIRsharing.org is a good place to start.
Sharing data in a collaborative project
The Australian Code for the Responsible Conduct of Research states that organisations must establish an agreement for each collaboration before collaborative work begins. The agreement should be in writing and must cover the management of research data. The Code also states that each collaborating party should identify a person to be involved in managing research data and other items to be retained at the end of the project.
Not sure which repository is best for depositing your data? ELIXIR maintain a list of Deposition Databases (i.e. stable and well-managed resources that ELIXIR recommend for the deposition of various types of experimental data). The list will be reviewed regularly, with resources added over time.
Because biomolecular data are often large and complex, and carry a wealth of important but complex contextual metadata, depositing them in an appropriate repository can be time-consuming and laborious. To help Australian researchers submit data (and the required associated metadata) to international databases, primarily the European Nucleotide Archive (ENA), the EMBL-ABR QCIF Node currently provides a “Data Submission to EBI Service”.
If you have genomics data to submit to ENA and would like some help or advice on the best way to do this yourself, please contact the Data Submission service.
The team is also working towards supporting other data types, so feel free to contact them for advice on those as well.
For more information about the EMBL-ABR Key Area of Data, please contact Jeff Christiansen, Data Coordinator.