Geniac: documentation for streamlined prototyping and management of bioinformatics pipelines

After two years of intense work, Philippe Hupé of Institut Curie in Paris, France, and his team published the geniac documentation this summer. Geniac’s public webpage provides a comprehensive set of guidance documents and tools that help bioinformaticians and statisticians to harmonize how they prototype, develop and manage pipelines to analyze high-throughput data. 

Philippe Hupé leads EUCANCan’s Work Package 2 (WP2), “Genome analysis pipelines to support the therapeutic decision”, which benchmarks different pipelines in support of EUCANCan’s goal of harmonizing data sharing between institutions and research centres in Europe and Canada. Philippe Hupé and his team developed geniac as part of WP2.

Geniac consists of three main components:

  1. A public webpage that provides guidelines for prototyping and implementing pipelines using the workflow manager Nextflow, which allows the adaptation of pipelines written in the most common scripting languages.
  2. A software tool that checks your code and notifies you if it does not follow the geniac guidelines, helping developers to detect potential errors easily.
  3. A toolbox which automates the construction of ‘plug-and-play’ containers in order to have portable and reproducible pipelines.

Today, many bioinformaticians, statisticians and others involved in research projects that include large quantities of genomics data use diverse pipelines. Geniac’s suggestion for a shared standard for structuring the code in bioinformatics pipelines is an important step towards improved harmonization between different pipelines and developers. 

Beyond harmonization, a key aim of geniac is to enable bioinformaticians and statisticians to automate much of their work related to developing, managing, deploying and operating pipelines.

Figure: General principle of the geniac guidelines and toolbox. (1) A new tool is added to the pipeline according to the guidelines in the geniac documentation, depending on where the tool is available. (2) The toolbox parses the structure of the source repository and the content of the conf/geniac.config file in order to automatically generate all the configuration files that define the Nextflow profiles. (3) During the parsing, it also generates the container recipes (here the Singularity Definition Files), which are used to (4) build the containers (here the Singularity images).
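To make the generation step more concrete, the following is a minimal Python sketch of the idea, not geniac's actual implementation. It assumes a hypothetical table of tools (the TOOLS dictionary, standing in for what geniac reads from conf/geniac.config), an assumed base image (continuumio/miniconda3) and an assumed output layout (containers/<label>.sif). From that table it produces one Singularity Definition File per tool and a Nextflow profile that points each process label at its image.

```python
# Illustrative sketch only: names, base image and file layout are assumptions,
# not geniac's real code or configuration format.

# Hypothetical tool table: process label -> conda package specification.
TOOLS = {
    "fastqc": "bioconda::fastqc=0.11.9",
    "multiqc": "bioconda::multiqc=1.9",
}


def singularity_recipe(conda_spec: str) -> str:
    """Return the text of a Singularity Definition File installing one conda package."""
    return (
        "Bootstrap: docker\n"
        "From: continuumio/miniconda3\n"  # assumed base image
        "\n"
        "%post\n"
        f"    conda install -y {conda_spec}\n"
    )


def nextflow_profile(tools: dict, image_dir: str = "containers") -> str:
    """Return a Nextflow 'singularity' profile mapping each label to its image."""
    selectors = "\n".join(
        f'      withLabel: {label} {{ container = "{image_dir}/{label}.sif" }}'
        for label in sorted(tools)
    )
    return (
        "profiles {\n"
        "  singularity {\n"
        "    singularity.enabled = true\n"
        "    process {\n"
        f"{selectors}\n"
        "    }\n"
        "  }\n"
        "}\n"
    )


def generate(tools: dict):
    """Build all recipes (keyed by file name) and the matching profile text."""
    recipes = {f"{label}.def": singularity_recipe(spec) for label, spec in tools.items()}
    return recipes, nextflow_profile(tools)


if __name__ == "__main__":
    recipes, profile = generate(TOOLS)
    for name, text in recipes.items():
        print(f"--- {name} ---\n{text}")
    print(profile)
```

Each generated .def file could then be built into an image with the Singularity command line, which corresponds to step (4) of the figure. The point of the sketch is that a single declaration of the tools drives both the container recipes and the Nextflow configuration, so the two cannot drift apart.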

Best practices for prototyping and managing a bioinformatics pipeline

Geniac is developed for bioinformaticians, statisticians and others who are involved in the implementation of pipelines that serve to analyse high-throughput data.

One way the geniac webpage supports these user groups is by sharing best practices for prototyping and managing pipelines throughout the entire development lifecycle. These best practices are motivated by the need for portable and reproducible pipelines.

Next steps

Philippe Hupé and his team are currently applying the geniac best practices to both research and diagnostics pipelines used for analysing whole genomes at the Institut Curie and as part of the EUCANCan framework.

The lessons learned from developing geniac have been published for peer review in an article titled “Geniac: Automatic Configuration GENerator and Installer for nextflow pipelines” on the European Commission’s Open Research Europe platform. Philippe Hupé and his co-authors are actively working on a second release of geniac, in particular to take the reviewers’ feedback into account.