The Research Data Life Cycle
Where does your research data go once you’ve published your paper? Can you do better? Good data management spans all stages of the data life cycle: finding, collecting, integrating, processing, visualising, analysing, publishing, sharing and reusing data and metadata. EMBL-ABR strongly supports data sharing and data reuse and believes that data generated as part of publicly funded research should be as publicly available as possible. Ideally, data should be Findable, Accessible, Interoperable and Reusable, following the FAIR principles.
See here for the full EMBL-ABR poster summarising the Research Data Life Cycle for biological/bioinformatics data. This poster was presented at the AB3ACBS 2016 Conference in Brisbane.
Do you need training in using public databases like the European Nucleotide Archive, Ensembl, ChEBI or DGVa? Let us know what type of training you are looking for by emailing Christina Hall, Training Coordinator.
Explore indexes that help users find and access shared data here.
Are you a medical genomics researcher? The EMBL-ABR:QCIF Node is working with the Global Alliance for Genomics and Health (GA4GH) BEACON project. This project aims to set up a simple, open web service that aids data sharing in medical genomics by answering yes/no questions such as: “Does this dataset include any individuals with variant A at position 3900542?”. This allows researchers to identify useful datasets without exposing personal information about the individuals in them.
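To make the yes/no idea concrete, here is a minimal in-memory sketch, assuming a hypothetical data layout in which each individual maps to a list of (chromosome, position, alternate allele) tuples. The class name `ToyBeacon`, its `query` method and the chromosome field are illustrative inventions, not the GA4GH Beacon API. The point it shows is that the service can index variants while discarding who carries them, so a query reveals only presence or absence.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, alternate allele).
VariantKey = Tuple[str, int, str]


class ToyBeacon:
    """A minimal illustration of a Beacon-style yes/no service (not the real API)."""

    def __init__(self, genotypes: Dict[str, Iterable[VariantKey]]) -> None:
        # Index the variants only; individual IDs are deliberately dropped,
        # so no query can ever reveal who carries a variant.
        self._index: Set[VariantKey] = set()
        for variants in genotypes.values():
            self._index.update(variants)

    def query(self, chromosome: str, position: int, alt: str) -> bool:
        # The caller learns only whether the allele is present in the dataset.
        return (chromosome, position, alt) in self._index


if __name__ == "__main__":
    beacon = ToyBeacon({
        "individual_1": [("1", 3900542, "A"), ("2", 1200, "T")],
        "individual_2": [("X", 500, "G")],
    })
    print(beacon.query("1", 3900542, "A"))  # True
    print(beacon.query("1", 3900542, "C"))  # False
```

A real Beacon deployment adds details this sketch ignores, such as the reference assembly and reference bases, but the privacy-preserving shape of the answer is the same.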
EMBL-ABR supports best-practice experimental design and analysis in bioinformatics, to encourage the cost-effective collection of high-quality, informative and reusable data. If you are interested in contributing to international standards and best-practice protocol development for your area of expertise, please contact us.
EMBL-ABR aims to offer best-practice guidance, networking and training to support genomics and computational biology, including data analysis.
Do you need guidance on the benefits of sharing and reusing life science or medical data, on data standards, on journal data policies, or on which public database to choose for your dataset? Biosharing.org is a great place to start. EMBL-ABR is actively curating Australian databases to increase their discoverability via the EMBL-ABR Biosharing Collection. This collection covers Australian databases on topics ranging from spider toxins to chickpea genomics, as well as bioinformatics standards and policies relevant to Australian life scientists and medical researchers. If you know of other databases that we should include, please get in touch with Vicky. See here for more information on Biosharing.org.
A recent paper by McKiernan et al. in eLife lays out the benefits to researchers of data sharing and of Open Science more broadly, including open-access publishing:
- more citations
- more media coverage
- transparent peer review
- preprints and open archiving
- retention of author rights and control of reuse
- compliance with funder requirements
- documentation and reproducibility benefits for yourself and others
- ease of finding new projects and collaborators
Bioinformatics datasets are often large and complex, with a wealth of important metadata. The EMBL-ABR:QCIF Node has been developing best practice for data sharing and is currently championing EMBL-ABR activities for data chaperoning.
The new EBI databases BioStudies and BioSamples are designed to organise, link and make discoverable multiple datasets. Submitting a sample record to BioSamples means you only have to enter the sample information once; you can then link to your BioSamples record when submitting the experimental datasets to ENA, ArrayExpress, MetaboLights or any other relevant database. BioStudies is much broader in scope. It aims to be a collation point for all data related to a study, including data stored in non-EBI repositories and data that has no suitable repository, such as supplementary information associated with publications. More information.
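The "enter once, link many times" pattern behind BioSamples can be sketched as follows. This is a hypothetical model, not the BioSamples submission interface: `SampleRegistry`, `Experiment` and the `SAMPLE000001` accession format are invented for illustration. Each experiment stores only the sample's accession, so every repository resolves to the same single sample record instead of holding its own copy.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Experiment:
    repository: str        # e.g. "ENA", "ArrayExpress", "MetaboLights"
    description: str
    sample_accession: str  # a reference to the sample record, not a copy of it


class SampleRegistry:
    """Hypothetical stand-in for a sample database such as BioSamples."""

    def __init__(self) -> None:
        self._samples: Dict[str, Dict[str, str]] = {}
        self._ids = itertools.count(1)

    def register(self, attributes: Dict[str, str]) -> str:
        # Sample metadata is entered exactly once and given an accession.
        # The "SAMPLE000001" format is invented for this sketch.
        accession = f"SAMPLE{next(self._ids):06d}"
        self._samples[accession] = dict(attributes)
        return accession

    def resolve(self, experiment: Experiment) -> Dict[str, str]:
        # Any experiment, in any repository, looks the sample up by accession.
        return self._samples[experiment.sample_accession]


if __name__ == "__main__":
    registry = SampleRegistry()
    acc = registry.register({"organism": "Cicer arietinum", "tissue": "leaf"})
    experiments: List[Experiment] = [
        Experiment("ENA", "RNA-seq reads", acc),
        Experiment("MetaboLights", "LC-MS metabolite profile", acc),
    ]
    for exp in experiments:
        print(exp.repository, registry.resolve(exp)["organism"])
```

Because both experiments point at one accession, correcting the sample metadata in one place updates what every linked dataset sees.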
For specific guidance on how to describe experimental metadata, this October 2016 feature article gives a full introduction to the ISA framework and ISAtools.
EMBL-ABR can assist researchers with handling and collating metadata and submitting array and NGS data to public repositories, including batch submission of large datasets. See more information here or email firstname.lastname@example.org to request assistance.
EMBL-ABR Data Activities
EMBL-ABR is involved in many aspects of the large, collaborative multi-omics project on bacterial pathogens that can cause sepsis. See here for more information.
For more information about the EMBL-ABR Key Area of Data, please contact Pip Griffin, Open Data Coordinator.