MACSE

Genetic codes and translation

1. Selecting the appropriate genetic code(s)

Folder: samples/delegations/aminos/gc_def/

MACSE is able to handle the following genetic codes:

1 The_Standard_Code
2 The_Vertebrate_Mitochondrial_Code
3 The_Yeast_Mitochondrial_Code
4 The_Mold_Protozoan_and_Coelenterate_Mitochondrial_Code_and_the_Mycoplasma_Spiroplasma_Code
5 The_Invertebrate_Mitochondrial_Code
6 The_Ciliate_Dasycladacean_and_Hexamita_Nuclear_Code
9 The_Echinoderm_and_Flatworm_Mitochondrial_Code
10 The_Euplotid_Nuclear_Code
11 The_Bacterial_Archaeal_and_Plant_Plastid_Code
12 The_Alternative_Yeast_Nuclear_Code
13 The_Ascidian_Mitochondrial_Code
14 The_Alternative_Flatworm_Mitochondrial_Code
15 Blepharisma_Nuclear_Code
16 Chlorophycean_Mitochondrial_Code
21 Trematode_Mitochondrial_Code
22 Scenedesmus_obliquus_mitochondrial_Code
23 Thraustochytrium_Mitochondrial_Code
24 Rhabdopleuridae_Mitochondrial_Code
25 Candidate_Division_SR1_and_Gracilibacteria
26 Pachysolen_tannophilus_Nuclear_Code
27 Karyorelict_Nuclear_Code
28 Condylostoma_Nuclear_Code
29 Mesodinium_Nuclear_Code
30 Peritrich_Nuclear_Code
31 Blastocrithidia_Nuclear_Code
32 Seleno_Protein_Code
33 Cephalodiscidae_Mitochondrial_UAA-Tyr_Code

By default, MACSE uses The_Standard_Code, but you can set a different default genetic code for your dataset with the gc_def option, giving the number of the genetic code (as listed above):

java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -gc_def 9

For instance, to align grasshopper COI genes and pseudogenes (data from Song et al. 2008), the correct genetic code is the invertebrate mitochondrial one (5):

java -jar macse.jar -prog alignSequences -seq Song_PNAS2008_Grasshoppers_COX1_genes.fasta -seq_lr Song_PNAS2008_Grasshoppers_COX1_pseudo.fasta -gc_def 5 -stop_lr 30 -fs_lr 30

If your dataset contains sequences that use different genetic codes, you can provide a text file in which each line contains a sequence name and the number of the corresponding genetic code (you can use any of the following field separators: space, tab, comma, semicolon):

java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -gc_file riboSequences.txt

Any sequence absent from this file will be translated using the default genetic code (see the gc_def option).
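The lookup rule described above (one "name code" pair per line, with a fallback to the default code) can be illustrated with a minimal Python sketch. This is not MACSE's own parser: the function names, the sequence names (seqA to seqD), and the assumption that sequence names contain none of the separator characters are all hypothetical and used for illustration only.

```python
import re

def parse_gc_file(text):
    """Parse 'name code' lines separated by space, tab, comma or semicolon.

    Illustrative only: assumes sequence names contain no separator characters.
    """
    codes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = [f for f in re.split(r"[ \t,;]+", line) if f]
        if len(fields) != 2:
            raise ValueError(f"expected 'name code', got: {line!r}")
        name, code = fields
        codes[name] = int(code)
    return codes

def code_for(name, codes, default_code=1):
    """Return the genetic code number for a sequence, or the default if unlisted."""
    return codes.get(name, default_code)

# Hypothetical file content mixing the four accepted separators.
example = "seqA 5\nseqB\t2\nseqC,4\nseqD;11\n"
codes = parse_gc_file(example)
```

With this sketch, `code_for("seqA", codes)` gives 5, while a sequence that is not listed falls back to whatever default is passed, mirroring the role of gc_def.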

2. Translation of codons with ambiguous nucleotides

Folder: samples/delegations/aminos/ambi/

When a codon contains a single ambiguous nucleotide (e.g. N, R, Y) and all of its possible translations lead to the same amino acid, MACSE uses this unambiguous translation despite the nucleotide ambiguity. You can disable this behavior with the ambi_OFF option, so that any codon containing an ambiguous nucleotide is translated into the unknown amino acid (X). In this case, with the default genetic code, the codon TCN is translated into X instead of serine (S).

java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -ambi_OFF
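The rule for ambiguous codons can be sketched in a few lines of Python. This is only an illustration of the idea under the standard genetic code, not MACSE's implementation: the function name translate_codon and its ambi_off parameter are hypothetical, and the sketch expands every ambiguous position, which covers the single-ambiguity case described above.

```python
from itertools import product

# Standard genetic code (code 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes and the bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def translate_codon(codon, ambi_off=False):
    """Translate one codon; return 'X' when the amino acid is undetermined."""
    codon = codon.upper()
    if len(codon) != 3 or any(n not in IUPAC for n in codon):
        return "X"  # malformed codon or unsupported symbol
    if ambi_off and any(n not in "ACGT" for n in codon):
        return "X"  # mimics the effect described for ambi_OFF
    # Translate every concrete codon compatible with the ambiguity codes.
    amino_acids = {
        STANDARD_CODE["".join(bases)]
        for bases in product(*(IUPAC[n] for n in codon))
    }
    return amino_acids.pop() if len(amino_acids) == 1 else "X"
```

Here `translate_codon("TCN")` yields S because TCT, TCC, TCA and TCG all encode serine, whereas `translate_codon("TCN", ambi_off=True)` yields X. A codon such as TTN stays X in both modes, since it may encode phenylalanine or leucine.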