Is improving your cloud data management skills, or learning how to clean data effectively in a data processing pipeline, your New Year’s resolution for 2021? If so, consider registering for EUCANCan and ESPACE’s one-day training workshop “How to use cloud computing efficiently in biomedical research” on January 12, 2021, between noon and 6pm.
As all aspects of the biomedical research lifecycle become ever more digitalized and most health institutions collect massive amounts of data daily, sustainable data handling and processing play an increasingly important role in health and science.
EUCANCan and ESPACE’s workshop on January 12, 2021, is tailored for bioinformaticians, computer scientists, and all researchers in life science. The event aims to provide training in how to set up and manage efficient, portable workflows in the cloud and to teach how analytics can be scaled to handle growing input data successfully. The six-hour event will consist of both theoretical presentations and hands-on activities led by experienced researchers and cloud developers.
The main objective of the one-day workshop is to provide all participants with the tools they need to make better use of modern cloud infrastructures in order to exploit data more effectively.
The training workshop will cover:
- How to set up a compute-cloud environment that facilitates the analysis of large cancer genome data projects, including the ICGC, PCAWG, de.NBI Cloud, and Cancer Genome Collaboratory
- How to use prominent workflow managers such as Snakemake, Nextflow, and CWL in cloud environments
- Tips for how to sharpen your data visualization and data orchestration skills
- How deployment of machine learning workflows can be made portable and simple
- A review of pipeline frameworks for computing in the cloud
- A tour of the GA4GH-compliant training platform Dockstore, which includes all required services (including the Workflow Execution, Task Execution, and Tool Registry services)
- The basics of GA4GH WES, TES, TRS, and DRS
- An overview of available datasets from the International Cancer Genome Consortium (ICGC) and the PanCancer Analysis of Whole Genomes (PCAWG)
The training infrastructure is funded by de.NBI, which will provide virtual machines and a training environment to all participants. The Cancer Genome Collaboratory will provide configurable virtual machines (VMs) that can compute on its data, and Dockstore will contribute container packages of common genome analysis tools and workflows.
Sign up here to secure your spot at this free-of-charge event.
12:00 Welcome and Introduction to EUCANCan and ESPACE (HCA)
12:15 Best practice of software engineering for Nextflow pipelines
Philippe Hupé (Institut Curie, Paris)
12:45 Kubernetes and the cloud roll-out concept
Marius Dieckmann (University Gießen)
13:15 Using GA4GH standards for workflow execution in the cloud with Snakemake
Sven Twardziok, Ben Wulf (Charité, Berlin), Philip Kensche (DKFZ, Heidelberg)
- Basic workflow development guidelines
- Introduction to GA4GH cloud standards
- Cloud data management
- Running Snakemake workflows in the de.NBI cloud
15:15 Coffee break
16:00 Running CWL workflows in the cloud with CWLab
Pavlo Lutsik and Kersten Breuer (DKFZ, Heidelberg)
17:00 GA4GH-compliant pipeline development with Nextflow
Christina Yung (OICR, Toronto)
17:30 Big cancer dataset resources: the ICGC Portal and the Cancer Genome Collaboratory Cloud
Christina Yung (OICR, Toronto)
- Large-scale research projects that generate datasets used in the Cloud
- Searching for ICGC data stored on Collaboratory
- How to access data in the Cloud and how to access non-protected data
18:00 Closing remarks
Date and time: January 12, 2021, 12pm (noon) to 6pm CET.
Registration: Sign up here
Cost: Free of charge
EUCANCan supports and enhances modern oncology by creating a culturally, technologically, and legally integrated framework across Europe and Canada that enables and facilitates the efficient analysis, management, and sharing of cancer genome data.
The project is funded by the European Commission through the Horizon 2020 programme on the European side and by the Canadian Institutes of Health Research (CIHR) on the Canadian side. EUCANCan, coordinated by Prof David Torrents from Barcelona Supercomputing Center, started on January 1, 2019, and will run for four years.
Expression and Spatial analysis Pancreas Atlas Consortium Europe (ESPACE) is the result of three earlier Human Cell Atlas (HCA) pilot studies of the pancreas. The project develops methods and standards for sample acquisition, single-cell profiling, spatial proteomics, and computational pipelines, and aims to build a first version of the HCA of the pancreas.
The two-year project started on January 1, 2020, and is funded by the European Commission through the Horizon 2020 programme. The project is coordinated by Prof Roland Eils from Charité – Universitätsmedizin in Berlin.