Translate nucleotide sequences into amino acid sequences

The MACSE subprogram translateNT2AA translates nucleotide sequences into amino acid sequences using the genetic code(s) you specify.

1. Basic usage

The only mandatory option of this program is the seq option that indicates the name of the FASTA file containing the sequences to be translated:

  • java -jar macse.jar -prog translateNT2AA -seq sequences.fasta

If your input file contains aligned nucleotide sequences, you can choose to ignore the alignment gaps to obtain unaligned translated amino acid sequences (while preserving frameshifts, and hence the correct reading frames):

  • java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -ignore_gaps_ON
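The idea behind ignoring gaps while preserving frameshifts can be sketched in a few lines of Python. This is only an illustration, not MACSE's implementation: it assumes gaps are written as `-` and frameshifts as `!` (the symbols MACSE uses in its alignments), and the helper name `ungap_codons` is hypothetical.

```python
def ungap_codons(aligned_nt):
    """Drop alignment gaps ('-') but keep frameshift markers ('!'),
    then split the remaining characters into codons."""
    ungapped = aligned_nt.replace("-", "")
    return [ungapped[i:i + 3] for i in range(0, len(ungapped), 3)]


aligned = "ATG---GC!AAA---TGA"

# Gaps removed, frameshift marker kept: the downstream codons stay in frame.
codons = ungap_codons(aligned)
print(codons)  # ['ATG', 'GC!', 'AAA', 'TGA']

# If the '!' were dropped as well, every following codon would shift.
shifted = ungap_codons(aligned.replace("!", ""))
print(shifted)  # ['ATG', 'GCA', 'AAT', 'GA']
```

Keeping the `!` in the ungapped sequence is what lets the codons after the frameshift (here AAA and TGA) be read in the correct frame.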

Warning: By default, MACSE automatically removes the final stop codon of each sequence. This can be problematic if you plan to align the amino acid sequences and then use the resulting alignment to derive a nucleotide alignment with reportGapsAA2NT, because the lengths of your nucleotide and amino acid sequences will no longer match. To avoid this problem, you can ask MACSE to keep the final stop codon:

  • java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -keep_final_stop_ON
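The length mismatch described in the warning can be reproduced with a small Python sketch. It is a simplified stand-in for the translation step, not MACSE's code: it handles only ungapped, in-frame sequences with the standard genetic code, and the `translate` function and its `keep_final_stop` parameter are hypothetical names chosen to mirror the option above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt, keep_final_stop=False):
    """Translate an ungapped, in-frame nucleotide sequence.
    Unknown codons become 'X'; a trailing incomplete codon is ignored."""
    usable = len(nt) - len(nt) % 3
    codons = [nt[i:i + 3].upper() for i in range(0, usable, 3)]
    aa = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    if not keep_final_stop and aa.endswith("*"):
        aa = aa[:-1]  # mimic the default removal of the final stop
    return aa


nt = "ATGGCCAAATAA"  # Met, Ala, Lys, stop

without_stop = translate(nt)
with_stop = translate(nt, keep_final_stop=True)

print(without_stop, len(nt) == 3 * len(without_stop))  # MAK False
print(with_stop, len(nt) == 3 * len(with_stop))        # MAK* True
```

Only the translation that keeps the final stop has exactly one amino acid symbol per nucleotide codon, which is what a gap-reporting step needs in order to map the two sequences onto each other.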

2. Ignoring some sequences or nucleotides

You can ask translateNT2AA to skip sequences that contain too many internal stop codons. In the following example, maxSTOP_inSeq is set to 0, so any sequence containing an internal stop codon will not be translated:

  • java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -guessOneReadingFrame -maxSTOP_inSeq 0

The default value is -1. Any negative value leads translateNT2AA to translate all sequences (i.e. to ignore this option).
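The thresholding logic can be sketched as follows. This is an illustration under stated assumptions rather than MACSE's actual code: stops are written as `*`, a stop at the very end of the sequence is not counted as internal, and a sequence is skipped when its number of internal stops exceeds the threshold. The function names are hypothetical.

```python
def count_internal_stops(aa):
    """Count '*' symbols, not counting a stop at the very end."""
    body = aa[:-1] if aa.endswith("*") else aa
    return body.count("*")


def keep_sequence(aa, max_stop_in_seq=-1):
    """Return True if the sequence passes the internal-stop filter.
    A negative threshold disables the filter."""
    if max_stop_in_seq < 0:
        return True
    return count_internal_stops(aa) <= max_stop_in_seq


translations = {"seq1": "MAK*", "seq2": "MA*K*", "seq3": "M*A*K"}

kept_strict = [name for name, aa in translations.items() if keep_sequence(aa, 0)]
kept_default = [name for name, aa in translations.items() if keep_sequence(aa)]

print(kept_strict)   # ['seq1']
print(kept_default)  # ['seq1', 'seq2', 'seq3']
```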

You can also decide to remove the first and/or last codon if they are incomplete:

  • java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -guessOneReadingFrame -trim_pending_ON

3. Selecting the genetic code for the translation

By default, translations are done using The_Standard_Code, but you can specify a different default genetic code for your dataset using the gc_def option. If your dataset contains sequences that use different genetic codes, you can provide a text file that lists, one sequence per line, the sequence name and the number of the corresponding genetic code.

All information regarding the available genetic codes and the options to indicate the code to use for translating each sequence is detailed in the genetic code section.
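To make the per-sequence mapping concrete, here is a minimal Python sketch of how such a file could be read and combined with a default code. The exact file layout is an assumption: the sketch supposes that the name and the code number are separated by whitespace, and the sequence names, the code numbers in the example and the function names are all hypothetical. Refer to the genetic code section for the format MACSE actually expects.

```python
def parse_code_file(text):
    """Parse lines of the assumed form '<sequence name> <code number>'."""
    codes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, code = line.rsplit(maxsplit=1)
        codes[name] = int(code)
    return codes


def code_for(name, codes, default_code=1):
    """Use the per-sequence code if listed, else fall back to the default."""
    return codes.get(name, default_code)


# Hypothetical file content: two sequences with an explicit code number.
example = """
seq_mito_1 2
seq_mito_2 2
"""

codes = parse_code_file(example)
print(code_for("seq_mito_1", codes))   # 2
print(code_for("seq_nuclear", codes))  # 1 (falls back to the default)
```

Sequences that are not listed fall back to the dataset-wide default, which is the role the text above gives to gc_def.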

You can also choose to view the resulting amino acid sequences in a compressed alphabet of your choice (the default is SE_B_8):

  • java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -use_compressed_alphabet_ON

4. Related documentation

You can find other options related to this program from the following links: