Researchers and medical professionals often wish to share or explore data held in different physical locations, and to do so safely and securely.
Responding to this situation, one aim of EUCANCan is to implement an integrated cultural, technological and legal framework across Europe and Canada that enables and facilitates the efficient sharing of cancer genomic and clinical data. To this end, the consortium has developed the EUCANCan Portal, a federated network of interoperable nodes, which allows users to view and search for data files in different locations. To learn more about the portal, the EUCANCan communications team recently caught up with Brandon Chan of the Ontario Institute for Cancer Research (OICR) in Toronto, Canada, who explained the key steps in the development process.
Development of the portal has been spearheaded by EUCANCan’s Work Package 4 (A federation of data portals for interoperability across EUCANCan nodes and responsible sharing of patient genomic data), led by the OICR.
The EUCANCan portal was launched in late 2022
The first version of the portal was launched earlier this year, and data from the three nodes connected to the portal in the first phase has been loaded. These nodes are OICR in Toronto, Canada; Charité in Berlin, Germany; and the Barcelona Supercomputing Center (BSC) in Spain.
A prerequisite for interoperability between nodes is that the language and format they use to organize their data are harmonized. Laying the foundation of the portal, the team at Charité developed a EUCANCan dictionary, which outlines how the portal data should be organized. This dictionary builds on an existing dictionary from the ICGC ARGO platform.
In the next step, the software for loading data into the portal was bundled into a single package to make installation easier and quicker. Before data can be loaded, the submitter must ensure that it is organized according to the data dictionary, a key step in ensuring that data submitted by different groups are compatible. Submitters do get some help from the system, which validates the data and notifies them of any potential issues. Previously, only the team at OICR knew how to install the software required for this process; the bundled package now makes it easy for the teams at Charité and BSC to install it with remote support from the OICR team.
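For readers curious about what dictionary-based validation can look like in practice, here is a minimal sketch in Python. It is an illustration only and not the portal's actual code: the field names, the allowed values and the `validate_record` helper are all hypothetical and are not taken from the EUCANCan dictionary.

```python
# Hypothetical dictionary: each field states whether it is required and,
# optionally, which values are allowed. Illustrative only; these are NOT
# the real EUCANCan dictionary fields.
DICTIONARY = {
    "donor_id": {"required": True, "allowed": None},
    "gender": {"required": True, "allowed": {"Female", "Male", "Other"}},
    "vital_status": {"required": False, "allowed": {"Alive", "Deceased"}},
}


def validate_record(record: dict, dictionary: dict = DICTIONARY) -> list[str]:
    """Return a list of human-readable issues; an empty list means valid."""
    issues = []
    # Check every field the dictionary defines.
    for field, rule in dictionary.items():
        value = record.get(field)
        if value is None or value == "":
            if rule["required"]:
                issues.append(f"missing required field: {field}")
            continue
        if rule["allowed"] is not None and value not in rule["allowed"]:
            issues.append(f"invalid value for {field}: {value!r}")
    # Flag fields the dictionary does not know about.
    for field in record:
        if field not in dictionary:
            issues.append(f"unknown field: {field}")
    return issues


if __name__ == "__main__":
    for issue in validate_record({"gender": "unknown", "extra": 1}):
        print(issue)
```

In this sketch the submitter receives every issue at once rather than stopping at the first one, which fits the idea of being notified of "any potential issues" before the data is loaded.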
Once all three pilot nodes were connected to the portal, 20 sample whole-genome datasets from each node were loaded to test the process. The sample data comes from a subset of 60 donors included in the ICGC 25K project, a previous project by the OICR team, and had been curated, i.e. translated to conform to the EUCANCan dictionary. All nodes are now exploring the possibility of loading more clinical data into the portal, and the group is also discussing the possibility of enhancing the portal with a federated search function.
When asked what makes the EUCANCan portal unique, Brandon Chan responded:
“The network of nodes is new, we have never done that before and will be using lessons learned from this project to create a federated model in other projects,” adding that:
“In this type of project, the people are the most important. It has been a great experience to be part of this project and work with people from different countries to arrive at a shared solution and then put it into production.”