Genetic codes and translation

1. Selecting the appropriate genetic code(s)

Folder: samples/delegations/aminos/gc_def/

MACSE is able to handle the following genetic codes:

  • 1 The_Standard_Code
  • 2 The_Vertebrate_Mitochondrial_Code
  • 3 The_Yeast_Mitochondrial_Code
  • 4 The_Mold_Protozoan_and_Coelenterate_Mitochondrial_Code_and_the_Mycoplasma_Spiroplasma_Code
  • 5 The_Invertebrate_Mitochondrial_Code
  • 6 The_Ciliate_Dasycladacean_and_Hexamita_Nuclear_Code
  • 9 The_Echinoderm_and_Flatworm_Mitochondrial_Code
  • 10 The_Euplotid_Nuclear_Code
  • 11 The_Bacterial_Archaeal_and_Plant_Plastid_Code
  • 12 The_Alternative_Yeast_Nuclear_Code
  • 13 The_Ascidian_Mitochondrial_Code
  • 14 The_Alternative_Flatworm_Mitochondrial_Code
  • 15 Blepharisma_Nuclear_Code
  • 16 Chlorophycean_Mitochondrial_Code
  • 21 Trematode_Mitochondrial_Code
  • 22 Scenedesmus_obliquus_mitochondrial_Code
  • 23 Thraustochytrium_Mitochondrial_Code
  • 24 Rhabdopleuridae_Mitochondrial_Code
  • 25 Candidate_Division_SR1_and_Gracilibacteria
  • 26 Pachysolen_tannophilus_Nuclear_Code
  • 27 Karyorelict_Nuclear_Code
  • 28 Condylostoma_Nuclear_Code
  • 29 Mesodinium_Nuclear_Code
  • 30 Peritrich_Nuclear_Code
  • 31 Blastocrithidia_Nuclear_Code
  • 32 Seleno_Protein_Code
  • 33 Cephalodiscidae_Mitochondrial_UAA-Tyr_Code

By default, MACSE uses The_Standard_Code, but you can set a different default genetic code for your dataset with the gc_def option, giving the number of the genetic code as listed above:

  • java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -gc_def 9

For instance, to align grasshopper COI genes and pseudogenes (data from Song et al. 2008), the appropriate genetic code is the invertebrate mitochondrial code (5):

  • java -jar macse.jar -prog alignSequences -seq Song_PNAS2008_Grasshoppers_COX1_genes.fasta -seq_lr Song_PNAS2008_Grasshoppers_COX1_pseudo.fasta -gc_def 5 -stop_lr 30 -fs_lr 30
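To see why this choice matters for COI, the Python sketch below (an illustration, not MACSE code) translates the same codons with the standard code (1) and the invertebrate mitochondrial code (5). The 64-character strings follow NCBI's compact transl_table layout in T, C, A, G order; `TABLES` and `translate` are hypothetical names, and only unambiguous A/C/G/T input is handled.

```python
# NCBI translation tables as 64-character strings. The codon index is computed
# with bases in T, C, A, G order: TTT = 0, TTC = 1, ..., GGG = 63.
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # Standard
    5: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG",  # Invertebrate mito
}
BASES = "TCAG"


def translate(seq, gc=1):
    """Translate an unambiguous DNA sequence codon by codon with table `gc`."""
    table = TABLES[gc]
    seq = seq.upper()
    protein = []
    # Ignore a trailing incomplete codon, if any.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        a, b, c = (BASES.index(n) for n in seq[i:i + 3])  # ValueError on N, R, Y...
        protein.append(table[16 * a + 4 * b + c])
    return "".join(protein)


print(translate("TGAATAAGA", gc=1))  # *IR
print(translate("TGAATAAGA", gc=5))  # WMS
```

TGA, ATA, AGA and AGG are the codons whose meaning differs between these two tables. With the standard code, every TGA (tryptophan in invertebrate mitochondria) would appear as a spurious stop codon, which is exactly what a wrong gc_def value would do to a mitochondrial dataset.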

If your dataset contains sequences that use different genetic codes, you can provide a text file with the gc_file option. Each line of this file contains a sequence name and the number of its genetic code, separated by any of the following field separators: space, tab, comma or semicolon:

  • java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -gc_file riboSequences.txt

Any sequence absent from this file is translated using the default genetic code (see the gc_def option).
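The lookup described above can be sketched in Python as follows. This is not MACSE's parser: `parse_gc_file`, `code_for` and the sequence names are hypothetical, and the sketch assumes the sequence name comes first and the genetic code number second on each line.

```python
import re

DEFAULT_GC = 1  # The_Standard_Code, MACSE's default unless gc_def is given


def parse_gc_file(text):
    """Map each sequence name to a genetic code number.

    Assumes one 'name<sep>code' pair per line, where <sep> is any run of
    spaces, tabs, commas or semicolons.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, code = re.split(r"[ \t,;]+", line)[:2]
        mapping[name] = int(code)
    return mapping


def code_for(name, mapping, default=DEFAULT_GC):
    """Sequences missing from the file fall back to the default code."""
    return mapping.get(name, default)


example = "seqA 5\nseqB\t2\nseqC,9\nseqD;11\n"
codes = parse_gc_file(example)
print(codes)                                # {'seqA': 5, 'seqB': 2, 'seqC': 9, 'seqD': 11}
print(code_for("seqE", codes))              # 1
print(code_for("seqE", codes, default=5))   # 5, as if gc_def were set to 5
```

Passing `default` mirrors combining gc_file with gc_def: listed sequences get their own code and every other sequence gets the gc_def value.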

2. Translation of codons with ambiguous nucleotides

Folder: samples/delegations/aminos/ambi/

When a codon contains a single ambiguous nucleotide (e.g. N, R, Y) and all of its possible translations lead to the same amino acid, MACSE uses this unambiguous translation despite the nucleotide ambiguity. You can disable this behavior with the ambi_OFF option, so that any codon containing an ambiguous nucleotide is translated into the unknown amino acid (X). In this case, with the default genetic code, the codon TCN is translated into X instead of serine (S).

  • java -jar macse.jar -prog translateNT2AA -seq sequences.fasta -ambi_OFF
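The rule above can be sketched in Python for the standard code. This is an illustration, not MACSE's source: `translate_codon` and its `ambi_off` parameter are hypothetical names mirroring the ambi_OFF option. Translating codons with two or more ambiguous nucleotides into X is an assumption, since the text only describes the single-ambiguity case.

```python
from itertools import product

# Standard code (NCBI table 1), codons indexed in T, C, A, G order.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
BASES = "TCAG"
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def codon_to_aa(codon, table=STANDARD):
    """Translate one unambiguous codon."""
    i, j, k = (BASES.index(n) for n in codon)
    return table[16 * i + 4 * j + k]


def translate_codon(codon, ambi_off=False, table=STANDARD):
    codon = codon.upper()
    n_ambiguous = sum(len(IUPAC[n]) > 1 for n in codon)
    if n_ambiguous == 0:
        return codon_to_aa(codon, table)
    # Assumption: only codons with exactly one ambiguous nucleotide are resolved.
    if ambi_off or n_ambiguous > 1:
        return "X"
    # Expand the ambiguous position into every concrete codon it may stand for.
    candidates = {codon_to_aa("".join(p), table)
                  for p in product(*(IUPAC[n] for n in codon))}
    # Keep the translation only if every expansion agrees.
    return candidates.pop() if len(candidates) == 1 else "X"


print(translate_codon("TCN"))                 # S
print(translate_codon("TCN", ambi_off=True))  # X
print(translate_codon("TAN"))                 # X (TAA/TAG are stops, TAC/TAT are Tyr)
```

Expanding the ambiguous position and checking that the resulting set of amino acids has a single member keeps the rule independent of any particular genetic code: swapping in another 64-character table changes the outcome without changing the logic.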