Written by Ivo Gut (CNAG-CRG), Miranda Stobbe (CNAG-CRG), Jordi Rambla (CRG)
Once upon a time, a group of hungry travellers arrived at a village carrying just an empty cooking pot. They asked for food from the villagers, but none of them wanted to share their data (aka food). The travellers filled the cooking pot with water and a stone. A curious villager asked what they were making. “Stone soup”, answered a traveller, “it is going to be delicious, but it could use some garnish to improve the flavour…” The villager left and came back with some carrots in the hope he could also taste this ‘stone soup’. The carrots were added to the soup, and more and more villagers became curious and started to bring more garnishes. Finally, the stone was removed and the resulting soup was shared among travellers and villagers alike. Without being aware of it, the villagers were tricked into sharing their food and they all profited from doing so.
The EUCANCan project organized an online workshop on 24 May 2023 to discuss what would be needed to bridge the gap between clinical practice and bioinformatics research, so that both sides can join forces and make this ‘stone soup’. In difficult-to-solve patient cases, especially in rare diseases, but also in cancer, clinicians across the country or even the world are already being consulted and patient information is being shared. What can we learn from how data is currently shared in clinical practice? In bioinformatics, tools are being developed and new insights into cancer and other diseases are gained based on patient data. However, how do we ensure that the results get used in clinical practice and do not linger for eternity in the theoretical world? How do we compete with well-established methods from, for example, the field of pathology? When patient data is shared with the world of research, how do we ensure both sides benefit from it?
Eight speakers from the EUCANCan consortium and related data sharing projects presented during the workshop, with a particular focus on clinical information, sharing of data for research (with the objective of improving patient outcomes), and federation of content between platforms, jurisdictions, and across borders. Presentations covered an overview of the achievements of the EUCANCan project, efforts in the harmonization and standardization of ontologies used in clinical care, approaches to make electronic health records (EHR) more useful for research, tools for the conversion between ontologies used in clinics and research, methodology for making data FAIR and linkable, clinical decision support systems that benefit from the wealth of research knowledge, and the integration of such technologies into cancer care.
The workshop was rounded off by a lively panel discussion that aimed to identify the remaining gaps that require the attention of the community. In particular, advancing the field requires access to data. Too many studies are being done with synthetic data, which does not reflect a real-world scenario. Access to real-life data is essential. There is a need for stronger standards, quality criteria and the definition of minimal datasets. Handling of longitudinal records and patient histories is also important. It was highlighted that the ownership of clinical data is unclear: does the data belong to the hospital where a patient is treated, the healthcare system, the treating clinician or the patient? A key question that arose was: what is the justification for not sharing data? A further point raised was that journals request that (genomic) data be deposited in repositories such as the EGA. Compliance with this is not complete, and scientists resort to depositing less useful versions of the data, such as aggregated data, which is worth significantly less than the complete data. In many instances the data is deposited in the EGA, but researchers who later apply for access are denied. Finally, the pheno-clinical data is often not included in deposited data. Data without pheno-clinical information has only marginal value. It is clear that the attitude towards “real and serious” data sharing needs to change.
The workshop identified five areas where action is needed:
- Access to real-life data
- Standards and quality criteria
- Definition of minimal sets of meta-information
- Longitudinal patient record information
- Deposition of pheno-clinical data in public repositories alongside genomic data
Several measures could help remedy this. Funders could mandate, from the onset of a project, precisely what needs to be deposited in public repositories. Scientific journals could be encouraged to require more than just the sequence information to be deposited in public repositories. Also, statistics on successful access to studies in the EGA (how often datasets are downloaded and, presumably, then also used) could be collected to assess the impact of a study.
The workshop was a great success, with 100 people registered to participate and 75 people who connected to the Zoom call.