Aligning or merging two alignments

A key step in the MACSE optimization procedure is aligning two multiple sequence alignments (also called profiles), each containing a subset of the input sequences. This procedure is the same as the 2-cut strategy used in MUSCLE and in many other multiple sequence alignment programs.

If you already have two nucleotide alignments/profiles (p1.fasta and p2.fasta) containing homologous sequences, you can align them with the alignTwoProfiles subprogram of MACSE to produce a single alignment p1_p2.fasta containing all the sequences. Note that alignTwoProfiles cannot modify the input profiles other than by inserting gaps (or frameshifts) into them. In other words, any pair of nucleotides that was in the same site/column in p1.fasta (or in p2.fasta) will still be in the same site/column in p1_p2.fasta.
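The property above suggests a simple sanity check, sketched here in POSIX shell. It only tests a necessary condition: each sequence of an input profile should reappear unchanged in the merged alignment once gap and frameshift symbols are stripped. It does not verify that columns are preserved. The function names ungapped and check_preserved are hypothetical, and the sketch assumes that "-" marks gaps, that "!" marks frameshifts, and that sequence names and letter case are left unchanged in the output.

```shell
#!/bin/sh
# Sanity check (sketch): every sequence of an input profile should reappear in
# the merged alignment once gap and frameshift symbols are removed.
# Assumptions: "-" marks gaps, "!" marks frameshifts, names are unchanged.

# ungapped FILE: print one "name<TAB>residues" line per FASTA record, sorted.
ungapped() {
    awk '
        /^>/ { if (name != "") print name "\t" seq
               name = substr($0, 2); seq = ""; next }
             { gsub(/[-!]/, ""); seq = seq $0 }
        END  { if (name != "") print name "\t" seq }
    ' "$1" | LC_ALL=C sort
}

# check_preserved PROFILE MERGED: succeeds if no record of PROFILE is missing
# from MERGED after stripping; otherwise lists the missing records on stderr.
check_preserved() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    ungapped "$1" > "$tmp_a"
    ungapped "$2" > "$tmp_b"
    # comm -23 keeps the lines that occur only in the first (sorted) file.
    missing=$(LC_ALL=C comm -23 "$tmp_a" "$tmp_b")
    rm -f "$tmp_a" "$tmp_b"
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing" >&2
        return 1
    fi
    return 0
}
```

For instance, after sourcing these functions, running check_preserved p1.fasta p1_p2.fasta and then check_preserved p2.fasta p1_p2.fasta should both succeed if the assumptions hold.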

1. Basic usage

This program can be seen as one step of the 2-cut optimisation done by alignSequences or refineAlignment. It may therefore be helpful to look at the alignSequences help pages before using it.

Folder: samples/alignTwoProfiles/

The same options as those used for alignSequences allow you to (1) control the balance between optimization and speed (-optim, -max_refine_iter, -local_realign_init, -local_realign_dec), (2) specify a subset of less reliable sequences with different costs for their frameshifts and stop codons (-seq_lr), (3) choose the names of the output files (-out_NT, -out_AA) and (4) select your own elementary alignment costs. Example commands:

  • java -jar macse.jar -prog alignTwoProfiles -p1 p1.fasta -p2 p2.fasta
  • java -jar macse.jar -prog alignTwoProfiles -p1 p1.fasta -p2 p2.fasta -out_NT output_NT.fasta -out_AA output_AA.fasta
  • java -jar macse.jar -prog alignTwoProfiles -p1 p1.fasta -p2 p2.fasta -seq sequences.fasta -seq_lr sequences_lr.fasta

2. Possible strategy for aligning a very large number of sequences

This program can be useful for building an alignment with a very large number of sequences. You can start by clustering the sequences (based on either homology or taxonomy), then align each cluster independently, and finally merge the resulting alignments by running alignTwoProfiles as many times as required. A basic alignment refinement (e.g. -optim -1) of the global alignment can then be used to try to improve the final result.
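The repeated merging step can be sketched as a small POSIX shell helper. This is only an illustration under stated assumptions: macse.jar is in the current directory, the cluster alignments already exist, and file names contain no spaces. The helper name merge_profiles, the merged_N_NT.fasta / merged_N_AA.fasta file names and the DRY_RUN switch are hypothetical; only the -prog, -p1, -p2, -out_NT and -out_AA options shown above are used. The clusters are merged in the order given, which is one possible choice and not necessarily the best one.

```shell
#!/bin/sh
# Progressive merge of cluster alignments with alignTwoProfiles (sketch).
# Assumptions: macse.jar is in the current directory; the helper name, the
# merged_N_* file names and the DRY_RUN switch are hypothetical.

MACSE="${MACSE:-java -jar macse.jar}"

# merge_profiles ALN1 ALN2 [ALN3 ...]
# Merges the alignments from left to right and prints the name of the final
# nucleotide alignment on stdout. With DRY_RUN=1 the commands are only
# printed (on stderr) instead of being executed.
merge_profiles() {
    current="$1"
    shift
    step=0
    for next in "$@"; do
        step=$((step + 1))
        out_nt="merged_${step}_NT.fasta"
        out_aa="merged_${step}_AA.fasta"
        cmd="$MACSE -prog alignTwoProfiles -p1 $current -p2 $next -out_NT $out_nt -out_AA $out_aa"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd" >&2
        else
            # Word splitting of $cmd is intentional: it holds a full command line.
            $cmd || return 1
        fi
        # The merged nucleotide alignment becomes the first profile of the next step.
        current="$out_nt"
    done
    echo "$current"
}
```

A possible call, with hypothetical cluster file names, would be merge_profiles cluster1_NT.fasta cluster2_NT.fasta cluster3_NT.fasta. Setting DRY_RUN=1 first lets you inspect the generated commands before launching any alignment.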

Alternatively, you can produce a codon consensus sequence for each aligned cluster (exportAlignment), align those consensus sequences, and then build the global alignment by reporting the gaps of each aligned consensus sequence into the corresponding cluster alignment (reportGapNT2AA).

3. Related documentation

You can find other options related to this program from the following links: