Data

Data and Research 

The research lifecycle spans multiple stages:

• Research study concept and planning (including applying for funding and organising ethics approvals and collaboration agreements);
• Commencing the research;
• Data collection or aggregation (from pre-existing data sources) or data capture (from devices, surveys or instruments) and its subsequent processing;
• Data analysis, storage and management;
• Dissemination/publication of the research results and provision of access to relevant materials and data to enable the research findings to be verified.

Good data management is required across most of these stages and includes finding, collecting/capturing, integrating, processing, visualising, analysing, publishing, sharing and reusing data and metadata.

EMBL-ABR strongly supports data sharing and data reuse and believes that data generated as part of publicly-funded research should be as publicly available as possible. Ideally, to obtain the most value from such data, it should be Findable, Accessible, Interoperable and Reusable – i.e. adhere to the FAIR principles.

See our Opinion Article published in F1000 entitled “Best practice data life cycle approaches for the life sciences”, and our poster from the 2016 ABACBS Conference entitled “Navigating the Research Data Lifecycle”.


Finding Data

Looking for biomolecular data of a particular type but not sure where to look?

• The FAIRsharing database catalogue lists over 1,000 biomolecular and other databases – partly compiled with the support of Oxford University Press (NAR Database Issue and DATABASE Journal). EMBL-ABR is actively ensuring Australian databases are listed in FAIRsharing to increase their discoverability. If you know of other Australian resources that should be included, contact us.
• NIH BD2K DataMed is a prototype biomedical data search engine. Its goal is to enable discovery of data sets across data repositories or data aggregators.
• Omics Discovery Index (OmicsDI) provides dataset discovery across a heterogeneous, distributed group of genomics, proteomics and metabolomics data resources, including both open and controlled access data resources.
• Research Data Australia is maintained by ANDS (the Australian National Data Service) and is a searchable registry of research data collections that includes many biomolecular datasets.


Capturing Data

EMBL-ABR supports best-practice experimental design and analysis in bioinformatics to encourage the cost-effective collection of high-quality, informative and reusable data. Capturing data directly from an instrument, together with appropriate metadata, can help. For information about open source data capture solutions that have been deployed at various institutions around Australia, see the summary of ANDS’ Data Capture program. The program funded 69 projects at over 30 institutions to simplify the routine capture of data and metadata as close as possible to the point of creation, and the deposit of these data and metadata into well-managed institutional data stores.
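As a rough illustration of capturing metadata close to the point of creation, the sketch below writes a small JSON "sidecar" file next to a data file as soon as it is produced. It is not taken from any ANDS Data Capture project; the field names, the `.meta.json` suffix and the function names are assumptions made for this example, and a real deployment would follow the metadata standard of the target repository.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large instrument outputs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_sidecar(data_path: Path, instrument: str, operator: str) -> dict:
    """Return a metadata record for one captured file (field names are hypothetical)."""
    return {
        "file_name": data_path.name,
        "size_bytes": data_path.stat().st_size,
        "sha256": sha256_of(data_path),
        "instrument": instrument,
        "operator": operator,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def write_sidecar(data_path: Path, record: dict) -> Path:
    """Write the record beside the data file, e.g. run1.fastq -> run1.fastq.meta.json."""
    sidecar_path = data_path.with_name(data_path.name + ".meta.json")
    sidecar_path.write_text(json.dumps(record, indent=2))
    return sidecar_path
```

Recording a checksum at capture time makes it possible to confirm later that the file deposited in an institutional data store is the same file the instrument produced.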


Managing Data

Data Management Planning

Prior to starting a research project, it makes sense to plan for how you will manage the data you will use. For example:

• What types of data will you produce or collect?
• How much data will you produce or collect?
• Where will you store it?
• Will that storage be appropriately secure?
• How will you analyse it?
• On what types of computers?
• How will you connect the stored data to these computers?
• Where will you store the analysed data?
• Do you want to share the data with collaborators?
• Where are/will your collaborators based?
• Will you need to publish the data in a discipline-specific repository?
• How long will the primary data need to be retained for?
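Some of these planning questions, in particular how much data you will produce and how much storage you will need, can be answered with simple arithmetic. The sketch below is one hypothetical way to record a few answers and derive a storage estimate. The class name, fields and the default of two copies (primary plus one backup) are assumptions made for illustration, not part of any institutional planning tool, and the retention period must come from your institution's policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataPlan:
    """A few data management planning answers (all fields are illustrative)."""
    data_types: List[str]
    files_per_year: int
    mean_file_size_gb: float
    project_years: int
    retention_years: int  # taken from institutional policy, not guessed
    copies: int = 2       # assumed: primary copy plus one backup

    def raw_storage_gb(self) -> float:
        """Total volume of primary data produced over the project."""
        return self.files_per_year * self.mean_file_size_gb * self.project_years

    def provisioned_storage_gb(self) -> float:
        """Storage to request once every copy is counted."""
        return self.raw_storage_gb() * self.copies


# Example with made-up numbers: 200 files a year at 5 GB each for 3 years.
plan = DataPlan(
    data_types=["FASTQ", "BAM"],
    files_per_year=200,
    mean_file_size_gb=5.0,
    project_years=3,
    retention_years=5,
)
print(plan.raw_storage_gb())          # 3000.0
print(plan.provisioned_storage_gb())  # 6000.0
```

An estimate like this will not be exact, but it gives you a concrete figure to discuss with your institution's storage provider before the project starts.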

Chances are that your institution will have research data management planning resources (online tools or simple paper-based questionnaires) for you to use. If not, there are also online options such as DMPonline (from the UK Digital Curation Centre, DCC) or DMPTool (from the University of California Curation Center, UC3).

Data Management Policies

The Australian Code for the Responsible Conduct of Research states that researchers must manage research data and primary materials in accordance with the policy of their institution, and that an institution must have:

• a policy on the retention of materials and research data;
• a policy on the ownership of research materials and data during and following the research project; and
• a policy on the ownership of, and access to, databases and archives that is consistent with confidentiality requirements, legislation, privacy rules and other guidelines.

Check your institution’s policy library for your local data management policy, which will outline your responsibilities with respect to the management of data and other research materials.

Data Storage and Management Systems

It is likely that your institution will also be able to recommend a mature, networked and well-managed (e.g. maintained and backed up) data storage system that ideally allows metadata to be attached to your data items. Check with your institution.

EMBL-ABR nodes have been actively involved in a number of projects that have developed sophisticated data (+metadata) management components – e.g. the RDS-funded OMICs platform (whose data management capabilities are based on Mediaflux). If you are interested in more information, contact us.

Some EMBL-ABR nodes are also in discussions with the National Science Foundation (US)-funded CyVerse project about the features of its data Discovery Environment, which is another sophisticated data (+metadata) management system (based on iRODS). Again, if you are interested in more information, contact us.


Sharing Data

General Data Sharing guidelines

Looking for general guidance on data standards, journal data policies, or which public database to choose for your dataset for data publishing purposes? FAIRsharing.org is a good place to start.

Sharing data in a collaborative project

The Australian Code for the Responsible Conduct of Research states that organisations must establish an agreement for each collaboration before collaborative work begins. The agreement should be in writing and must cover the management of research data. The Code also states that collaborating parties should each identify a person to be involved in the management of research data and other items to be retained at the end of the project.


Publishing Data

Biomolecular data can often be large and complex, with a wealth of important but equally complex contextual metadata associated with it. To help Australian researchers submit data (and the required associated metadata) to international databases (primarily the European Nucleotide Archive, ENA), the EMBL-ABR: QCIF node has been developing best practice in sharing data and is currently championing EMBL-ABR activities for “data chaperoning”.

If you have genomics data to submit to ENA, and you’d like some help, please contact the EMBL-ABR Data Chaperoning service.

The Data Chaperoning service is also working towards supporting other data types, so feel free to contact us for advice on those as well.


Contact  

For more information about the EMBL-ABR Key Area of Data, please contact Jeff Christiansen, Data Coordinator.