EMBL-ABR network: an interview with Jyoti Khadake

Dr Jyoti Khadake is the Senior Data Scientist at the NIHR BioResource, where she drives the design, implementation, curation and enrichment of data resources. NIHR BioResource is an England-wide federated organisation with 8 different local BioResources and multiple CRFs that help in volunteer-based medical research, with clinical, ‘omics, and phenotypic data underpinning the recall. She also helps the National NIHR BioResource and the collaborating NIHR BioResources to implement effective vocabularies, data frameworks and policies. The cohorts have varied backgrounds, e.g. hospitals, blood services and universities.

December 2016

___________________________________________________
What is bioinformatics for you and why does it matter?

Analysis of the data on biological systems constitutes bioinformatics. Quite often this involves integrating data from multiple experiments or sources. Advances in computational approaches to handling data, together with the growth of data science, have led to massive advances in bioinformatics.

What are the challenges you see for life scientists / medical researchers in the data driven science era?

A major hurdle for medical research is the set of limitations posed by person-identifiable and sensitive data, and the implications of using this data to draw conclusions. There is still much work to do in this area.

The clinical sciences also offer opportunities to conduct analysis of multi-variant data, so creating the right environment for collaboration across the different disciplines is necessary to make the most of this data.

Would you say this is different for actual bioinformaticians? Do they face different challenges?

For most bioinformaticians not involved in human research, the social and medical aspects may be significant but are not imminent.

What is open data, and what does it mean to you?

Open data is data that can be accessed, shared or even republished without restrictions but with accreditation. It could be sourced from researchers, companies or governments. Currently this is restricted to the release of non-personal aggregated data – social or medical – and it may help to develop crucial economic, social and medical improvements for society.

What is currently missing in the field of bioinformatics AND life sciences?

We need an amalgamation of the policies and data handling principles across the board so the data can be exchanged and used across continents. We also need good statistical analysis of extended data so this can produce more meaningful results.

It is early days yet, but what would you like to see EMBL-ABR become, achieve?

EMBL-ABR could be the one-stop shop for Australian research, where researchers could have access to the bioinformatics of different kinds of data and to expertise in data analysis.

_________________________

THE DATA LIFE CYCLE

What is the data life cycle?

Data and meta-data are available on all queries or experiments. They need to be collected in a regulated manner in accordance with the core principles, and the generation of the data is governed by data standards. Storage and exchange are also controlled by the data standards, as well as by various licensing laws. Minimal data formats, data regulations and controlled vocabularies also help this process. The standardised data is then subject to analysis, which contextually converts the data and meta-data to information. This information can then inform further research.

Why does it matter now?

The world is opening up with the development of open data standards and policies. The availability of software and standards for exchange makes it possible to push this further now. OECD, funding and publishing policies also make it necessary to make published data available.

Who should care about it?

Information compliance officers, data initiatives, researchers and data generators should all care about this.

How is it relevant to bioinformatics in Australia?

One main reason I can see is that Australia has very distinctive flora and fauna. Agriculture and animal husbandry are prime industries in Australia and any advances in data generated and information accumulated in research projects in these areas can potentially help Australia advance in these industries.

_________________________________________________________________________________________

Biosketch: Prior to working at NIHR, Jyoti worked as Data Manager and Bioinformatician at the South London and Maudsley Hospital for mental health patients, where she set up a data collection platform for recruitment of volunteers as well as the pipeline for genotype data. At EBI she integrated and analysed large-scale data from varied studies, working with controlled vocabularies and data types such as ChEBI, PRIDE, GOA, UniProt and Reactome. She moved into bioinformatics from medical research, having worked as a scientist and postdoc investigating genetic disorders such as the inherited Prader-Willi and Angelman syndromes and the Klippel-Feil and Moebius syndromes. This followed a PhD on the modulation of higher-order chromatin structure and function by linker histones and its relation to gene expression.