MACSE

MACSE Pipeline quickstart

The MACSE v2 toolkit provides the building blocks to construct powerful alignment pipelines. However, the number of subprograms and options available in this version can be daunting for occasional users. We hence included a Graphical User Interface (GUI) in MACSE v2 and provide some of our own MACSE v2 pipelines. These pipelines are mostly bash scripts encapsulated within singularity containers and sometimes combined in nextflow workflows. This page provides basic information regarding singularity and nextflow, and explains how to run our pipelines using these tools.

Singularity

singularity: overview

A singularity container (Kurtzer, 2017) contains everything that is needed to execute a specific task. The person building the container handles dependencies and environment configuration so that the end user does not need to bother. The file specifying the construction of the container is a simple text file called a recipe (we provide the recipes of our containers as well as the containers themselves). As our scripts/pipelines often rely on several other scripts and external tools (e.g. MAFFT), a singularity container is very handy: the end user just needs to install singularity and download the container, without having to install dependencies or set environment variables.

singularity: installation

Singularity is likely available on your HPC center. If not, you can ask your HPC administrator to install it; the singularity team provides a template message for this request.
If you need to install it on your own computer, the singularity team provides a step by step guide to install singularity on Linux and to install it on Windows or Mac.

singularity: basic usage

If singularity is installed on your cluster/computer, you can launch a container that executes a complex task relying on several libraries or programs by simply downloading the corresponding container and asking singularity to execute it.

Here we provide singularity containers compatible with singularity v3 (or above). Launching a container named container.sif on a computer where singularity is installed is as simple as downloading it, adding the executable permission to it using

  • chmod +x container.sif

and then typing the command

  • ./container.sif

or, if you prefer

  • singularity run ./container.sif

To display the container help, use:

  • singularity run-help ./container.sif

singularity: getting containers

There are three ways to get a container.

  • You can simply download the container image (.sif file) like any other file (some are available in our download section).
  • Containers available on the singularity container library can be downloaded using the singularity pull command, e.g. singularity pull --arch amd64 library://vranwez/default/omm_macse:v10.02
  • You can rebuild the container (but this requires administrator rights) using its recipe (a small text file), e.g. sudo singularity build OMM_MACSE_v10.01.sif OMM_MACSE_v10.01_sing_3.3.def
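
When a script needs a container, it can be convenient to download it only if the .sif file is not already present. The following is a minimal sketch under stated assumptions: the function name fetch_container and the DRY_RUN variable are hypothetical helpers, not part of MACSE or singularity; only the singularity pull command and the library URI come from the list above. The output file name is given explicitly, before the URI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MACSE): download a container only when
# the requested .sif file is missing.
fetch_container() {
    local sif=$1 uri=$2
    if [ -f "$sif" ]; then
        echo "using existing $sif"
    else
        # With DRY_RUN set to any non-empty value, the command is only printed.
        ${DRY_RUN:+echo} singularity pull --arch amd64 "$sif" "$uri"
    fi
}
```

For instance: fetch_container omm_macse_v10.02.sif library://vranwez/default/omm_macse:v10.02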

Singularity: running the OMM MACSE pipeline

When you need to handle numerous large datasets, the computation time required by the alignSequences subprogram of MACSE may become a problem.
We faced this problem during the V10 update of the OrthoMaM database and developed the OMM_MACSE pipeline, which relies extensively on MACSE subprograms, to be able to deal with such large datasets.
You can use the OMM_MACSE pipeline developed to obtain OrthoMaM alignments either via the MBB server or using our singularity containers.
To run the OMM_MACSE pipeline on the file LOC_48720_NT.fasta using our singularity container (with default options), you need to specify the input file, the output directory and the prefix of the output files:

  • ./OMM_MACSE_v10.01.sif --in_seq_file LOC_48720_NT.fasta --out_dir RES_LOC_48720 --out_file_prefix LOC_48720
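
Since the pipeline is meant for numerous datasets, the call above is typically repeated once per input file. The sketch below illustrates one way to do so. It assumes that all inputs sit in one directory and are named <locus>.fasta, and that results should go to RES_<locus>; the function name omm_macse_batch, this directory layout and the DRY_RUN variable are hypothetical, while the three options are those of the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the OMM_MACSE container once per FASTA file.
# Assumptions (not from the MACSE documentation): inputs are named
# <locus>.fasta and results are written to <out_root>/RES_<locus>.
omm_macse_batch() {
    local container=$1 in_dir=$2 out_root=$3
    local fasta locus
    for fasta in "$in_dir"/*.fasta; do
        [ -e "$fasta" ] || continue      # no match: skip the literal pattern
        locus=$(basename "$fasta" .fasta)
        # With DRY_RUN set to any non-empty value, commands are only printed.
        ${DRY_RUN:+echo} "$container" \
            --in_seq_file "$fasta" \
            --out_dir "$out_root/RES_$locus" \
            --out_file_prefix "$locus"
    done
}
```

Running DRY_RUN=1 omm_macse_batch ./OMM_MACSE_v10.01.sif my_fasta_dir my_results first lets you check the generated commands before launching the real computation.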

Nextflow

Nextflow (Di Tommaso, 2017) enables scalable and reproducible scientific workflows using software containers, allowing the adaptation of pipelines written in the most common scripting languages. Nextflow separates the workflow itself from the directives regarding the correct way to execute it in a given environment. One key advantage of Nextflow is that, by slightly changing the “nextflow.config” file, the same workflow can be parallelized and launched to exploit the full resources of a high performance computing (HPC) cluster.

Nextflow: installation

Nextflow is really easy to install, and it can be installed as a regular user (no need to be root) using one of the following commands:

  • curl -s https://get.nextflow.io | bash
  • or: wget -qO- https://get.nextflow.io | bash

Nextflow: running the MACSE barcoding pipelines

To execute this pipeline, you need to get two of our singularity containers using:

  • singularity pull --arch amd64 library://vranwez/default/representative_seqs:v01
  • singularity pull --arch amd64 library://vranwez/default/omm_macse:v10.02

You also need to download the corresponding nextflow workflow (the P_macse_barcode.nf file used in the command below), and to adapt one of the nextflow configuration files that we provide. For instance, if you are working on an HPC cluster using SGE as task manager, you can use this configuration file and modify it to specify the name of the job queue to be used. Once you have everything, aligning hundreds of thousands of barcoding sequences (here some mammal COI sequences) is as simple as:

  • ./nextflow P_macse_barcode.nf --refSeq Homo_sapiens_NC_012920_COI_ref.fasta --seqToAlign Mammalia_BOLD_121180seq_COI.fasta --geneticCode 2 --outPrefix Mammalia_COI
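
To give an idea of what adapting a configuration file for SGE involves, the sketch below writes a minimal nextflow.config in the current directory. It is not one of the configuration files distributed with the pipelines, which may contain additional settings; it only uses the standard Nextflow process.executor and process.queue settings, and the queue name my_queue.q is a placeholder to be replaced by the name of your own job queue.

```shell
# Write a minimal, hypothetical nextflow.config for an SGE cluster.
# 'my_queue.q' is a placeholder: replace it with the name of your job queue.
cat > nextflow.config <<'EOF'
process {
    executor = 'sge'
    queue = 'my_queue.q'
}
EOF
```

Nextflow reads the nextflow.config file found in the directory from which the workflow is launched, so the command above would then submit its tasks to the chosen queue.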

Possible problems

When running the singularity image you may get a message indicating:

  • ERROR : Unknown image format/type: OMM_MACSE_V10.01.sif ABORT : Retval = 255

You are probably trying to use an old singularity release to launch an image generated with a more recent release. Type singularity --version to know which release you are using; if your release is older than 3 (but above 2.5), please update it.
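
This check can be scripted. The sketch below is only an illustration: the function names singularity_major and is_sif_compatible are hypothetical, and the parsing assumes that singularity --version prints a text containing a release number such as "singularity version 3.8.0" or "2.5.2-dist". Installations that report a different numbering scheme would need a different test.

```shell
#!/usr/bin/env bash
# Extract the major release number from the text printed by
# `singularity --version`, e.g. "singularity version 3.8.0" or "2.5.2-dist".
singularity_major() {
    # Keep the first "digits.digits" group, then drop everything after the dot.
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+' | head -n 1 | cut -d. -f1
}

# Succeed when the given version text denotes release 3 or above.
is_sif_compatible() {
    local major
    major=$(singularity_major "$1")
    [ -n "$major" ] && [ "$major" -ge 3 ]
}

# Query the local installation only if singularity is on the PATH.
if command -v singularity >/dev/null 2>&1; then
    if is_sif_compatible "$(singularity --version)"; then
        echo "singularity release is 3 or above"
    else
        echo "singularity release is older than 3: please update it"
    fi
fi
```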

In rare cases you may get an error message when running the container saying that:

  • ERROR : Base home directory does not exist within the container…

In this case, you can use the -H option as a tedious workaround: if your cluster home directory is /homedir/your_login instead of the more usual /home/your_login expected by the singularity container, you can specify this when launching singularity. In my case, this leads to:

  • singularity run -H /homedir/ranwez:/home/ranwez OMM_MACSE_V10.01.sif

and even so, I can only run the container from my home directory afterward. A better solution is to ask your HPC cluster administrator to fix the singularity binding configuration on your cluster so that everything works smoothly, or to help you regenerate the singularity image. To do so, download the recipe file we provide and add instructions to create the missing directories in the %post section. For instance, on one cluster I got the following error message:

    #WARNING: Non existent bind point (directory) in container: /work
    #WARNING: Non existent bind point (directory) in container: /homedir
    #WARNING: Non existent bind point (directory) in container: /usr1/compte_mess
    #WARNING: Non existent bind point (directory) in container: /projects
    #ERROR  : Base home directory does not exist within the container: /homedir
    #ABORT  : Retval = 255
	

To fix the problem I added the following commands to the %post section of the OMM_MACSE 2.5 recipe file

    mkdir /work
    mkdir /homedir
    mkdir /projects
    mkdir -p /usr1/compte_mess
	

and regenerated the singularity container (you need administrative privileges to do so)

  • sudo singularity build OMM_MACSE_v10.01.sif OMM_MACSE_v10.01_sing_2.5.def
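
The mkdir lines to add to the recipe can be derived directly from the warnings. The sketch below is a hypothetical helper (the name missing_bind_points_to_mkdir is not part of MACSE or singularity) that assumes the messages have the format shown above, with the directory given at the end of each "Non existent bind point" line. It uses mkdir -p for every directory so that nested paths are handled too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read singularity messages on standard input and print
# one "mkdir -p" line per missing bind point, ready to paste into %post.
missing_bind_points_to_mkdir() {
    sed -n 's/^.*Non existent bind point (directory) in container:[[:space:]]*\([^[:space:]]*\).*$/mkdir -p \1/p' \
        | sort -u    # each directory is listed once
}
```

For instance, saving the messages in a file named singularity_messages.txt (a hypothetical name) and running missing_bind_points_to_mkdir < singularity_messages.txt prints the lines to copy into the recipe.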

Further readings