SuperTriplets

SuperTriplets installation and fundamental usage

1. Program installation

SuperTriplets is really easy to install: just download the jar file of the latest SuperTriplets release (e.g. SuperTriplets_v1.1.jar) to your computer. Note that Java (a JRE above version 1.5) must be installed on your computer.

SuperTriplets is a command-line program only, so if you double-click on the jar file nothing seems to happen. To run it, you first need to open a console in which to type the command line:

  • Mac OS X: double-click on the Terminal application in the Applications/Utilities folder.
  • WINDOWS: open the command prompt by pressing Windows + R. In the small Run dialog that appears, type cmd and click OK.
  • LINUX: if you are using Linux, you surely know how to open a console.

Once the console is open, assuming that Java is installed and that you are in the folder containing the SuperTriplets jar file, you can display the help with:

  • java -jar SuperTriplets_v1.1.jar

2. Using SuperTriplets

SuperTriplets takes two parameters:

  • the first is the name of the file containing the set of rooted trees (in Newick format) that you want to summarize into a supertree.
  • the second is the name of the output file that will contain the inferred supertree (in Newick format).

A typical SuperTriplets command is hence:

  • java -jar SuperTriplets_v1.1.jar inputForest outputSupertreeFile
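When SuperTriplets is run from a script (for example over many forests), the command line above can be assembled programmatically. The following Python sketch relies only on the two positional parameters described above; the helper names supertriplets_command and run_supertriplets are hypothetical. The optional heap size is passed with the standard JVM -Xmx option, placed before -jar following the usual "java [options] -jar jarfile [args]" order.

```python
import subprocess
from typing import List, Optional


def supertriplets_command(jar: str, forest: str, output: str,
                          max_heap: Optional[str] = None) -> List[str]:
    """Build the argument list for one SuperTriplets run (hypothetical helper).

    max_heap is an optional JVM heap size such as "600m", passed as -Xmx<size>.
    """
    cmd = ["java"]
    if max_heap is not None:
        cmd.append(f"-Xmx{max_heap}")  # JVM options go before -jar
    cmd += ["-jar", jar, forest, output]  # jar file, then the two SuperTriplets parameters
    return cmd


def run_supertriplets(jar: str, forest: str, output: str,
                      max_heap: Optional[str] = None) -> str:
    """Run SuperTriplets and return its standard output; raises CalledProcessError on failure."""
    result = subprocess.run(supertriplets_command(jar, forest, output, max_heap),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, run_supertriplets("SuperTriplets_v1.1.jar", "inputForest", "outputSupertreeFile", max_heap="600m") would run the same command as above with 600 MB of heap, assuming java is on the PATH.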

Note that the memory needed to store triplet information grows rapidly with the number of distinct taxa appearing in the forest. Hence, for large datasets, you may have to give the Java virtual machine extra memory using the -Xmx option:

  • java -Xmx600m -jar SuperTriplets_v1.1.jar inputForest outputSupertreeFile
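To get a feeling for why the number of distinct taxa matters so much, one can count the rooted triplets that are possible over n taxa. Each set of three taxa admits three resolved triplets (ab|c, ac|b, bc|a), so there are 3 × C(n, 3), roughly n³/2, of them. This is only an illustration of cubic growth: we assume, without knowing SuperTriplets' internal data structures, that its triplet storage scales with this quantity.

```python
from math import comb  # Python 3.8+


def possible_triplets(n_taxa: int) -> int:
    """Number of distinct resolved rooted triplets over n_taxa taxa.

    Each 3-taxon subset {a, b, c} has three resolved topologies: ab|c, ac|b, bc|a.
    """
    return 3 * comb(n_taxa, 3)


for n in (100, 1_000, 10_000):
    print(f"{n:>6} taxa -> {possible_triplets(n):,} possible triplets")
```

With 100 taxa there are 485,100 possible triplets; with 1,000 taxa there are already 498,501,000, which is why a ten-fold increase in distinct taxa can exhaust memory.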

3. Understanding SuperTriplets output

SuperTriplets decomposes each input tree into a large set of tiny three-leaf trees called triplets. It then searches for the median tree of your input trees under a triplet-based distance. The reliability of each branch of the rooted output tree is defined from the percentage of input-tree triplets that agree or disagree with this branch. Finally, branches with low support are collapsed and the resulting rooted supertree is saved.
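The triplet decomposition described above can be sketched in a few lines of Python. For brevity, trees are written as nested tuples rather than Newick strings (a leaf is a string, an internal node a tuple of children). A rooted tree displays the triplet ab|c when some clade contains a and b but not c; unresolved (multifurcating) triples display no triplet. The clade_support function counts input triplets that agree or conflict with a clade using one common definition; it is an illustrative assumption, not necessarily the exact support measure computed by SuperTriplets.

```python
from itertools import combinations


def leaves(tree):
    """All leaf labels below a node (a leaf is a str, an internal node a tuple)."""
    return {tree} if isinstance(tree, str) else set().union(*(leaves(c) for c in tree))


def clusters(tree):
    """Leaf sets of all internal nodes, i.e. the clades of the rooted tree."""
    if isinstance(tree, str):
        return []
    return [leaves(tree)] + [c for child in tree for c in clusters(child)]


def triplets(tree):
    """Rooted triplets ab|c displayed by the tree, returned as (a, b, c) with a < b."""
    clades = clusters(tree)
    found = set()
    for x, y, z in combinations(sorted(leaves(tree)), 3):
        # At most one of the three pairs can be grouped apart from the third taxon.
        for a, b, c in ((x, y, z), (x, z, y), (y, z, x)):
            if any(a in k and b in k and c not in k for k in clades):
                found.add((a, b, c))
                break
    return found


def clade_support(clade, input_trees):
    """Count input triplets agreeing / conflicting with a clade (illustrative definition).

    ab|c agrees if a, b are inside the clade and c outside; it conflicts if c is
    inside and exactly one of a, b is inside (it separates two clade members).
    """
    inside = set(clade)
    agree = conflict = 0
    for tree in input_trees:
        for a, b, c in triplets(tree):
            if a in inside and b in inside and c not in inside:
                agree += 1
            elif c in inside and (a in inside) != (b in inside):
                conflict += 1
    return agree, conflict


print(sorted(triplets(((("A", "B"), "C"), "D"))))
# [('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'D'), ('B', 'C', 'D')]
```

For example, clade_support({"A", "B"}, [(("A", "B"), "C"), (("A", "C"), "B")]) returns (1, 1): the first tree supports the clade AB and the second contradicts it.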

As some users may be interested in the intermediate binary supertree (even though it may contain poorly supported clades), we print it to the standard output. We do not save it in the output file, as we advise against using it as a supertree and overinterpreting unsupported clades. In practice, two lines are written to the standard output: the first concerns the supertree with all clades (critNNI@value_of_the_objective_function@newick_supertree) and the second corresponds to the collapsed version of this supertree (critSNNI@value_of_the_objective_function@newick_filtered_supertree). For most users these two lines are irrelevant; only the final Newick supertree saved in the output file will be useful.
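Users who do want to inspect the intermediate trees can split these standard-output lines on the @ separator. This sketch assumes exactly the criterion@value@newick layout described above, with a numeric objective value; the parse_result_line name and the example line are hypothetical, not real SuperTriplets output.

```python
from typing import NamedTuple


class ResultLine(NamedTuple):
    criterion: str  # "critNNI" (all clades) or "critSNNI" (collapsed supertree)
    score: float    # value of the objective function (assumed numeric)
    newick: str     # the supertree in Newick format


def parse_result_line(line: str) -> ResultLine:
    """Split one criterion@value@newick line printed on the standard output."""
    criterion, value, newick = line.strip().split("@", 2)
    return ResultLine(criterion.strip(), float(value), newick.strip())


line = "critNNI@0.87@((A,B),C);"  # made-up example line
print(parse_result_line(line))
```

Given the captured standard output of a run, something like [parse_result_line(l) for l in stdout.splitlines() if l.startswith("crit")] would collect both trees, assuming no other output lines start with "crit".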

4. Frequent problems/errors when first using SuperTriplets

The program crashes due to insufficient memory

  • Check that each species has exactly the same name in all your input trees. If you have 100 taxa and 1000 trees, SuperTriplets should build a supertree with 100 leaves. However, if your 1000 trees use different names for the same taxon (e.g. mus is called mus_gene1 in the first tree, ENSMUSG00000020687 in the second, etc.), SuperTriplets will try to build a supertree with up to 100,000 leaves and will probably crash.
  • Assign more memory to Java using the -Xmx option. The memory needed to store triplet information grows rapidly with the number of distinct taxa appearing in the forest (while the number of trees has little impact). Hence, for datasets including numerous taxa, you may have to give the Java virtual machine extra memory:

    java -Xmx600m -jar SuperTriplets_v1.1.jar inputForest outputSupertreeFile
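Before increasing memory, a quick way to spot inconsistent taxon names is to count in how many input trees each leaf label occurs: with consistent naming most taxa occur in several trees, whereas per-gene identifiers like those in the example above typically occur only once. The sketch below assumes unquoted Newick labels and that trees in the forest file are separated by ";"; the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

# In Newick, a leaf label directly follows "(" or ","; internal-node labels follow ")"
# and branch lengths follow ":", so this pattern only captures leaf labels.
LEAF_LABEL = re.compile(r"[(,]\s*([^(),:;\s]+)")


def taxon_occurrences(trees: Iterable[str]) -> Counter:
    """Map each leaf label to the number of trees in which it appears."""
    counts: Counter = Counter()
    for tree in trees:
        counts.update(set(LEAF_LABEL.findall(tree)))  # count each label once per tree
    return counts


def rare_taxa(trees: Iterable[str], min_trees: int = 2) -> List[str]:
    """Leaf labels found in fewer than min_trees trees (candidates for naming errors)."""
    return sorted(t for t, k in taxon_occurrences(trees).items() if k < min_trees)


def read_forest(path: str) -> List[str]:
    """Read a forest file, assuming trees are separated by ';'."""
    with open(path) as handle:
        return [t for t in handle.read().split(";") if t.strip()]
```

For example, len(taxon_occurrences(read_forest("inputForest"))) gives the number of distinct taxa SuperTriplets will have to place; if it is much larger than expected, rare_taxa(read_forest("inputForest")) lists labels worth checking. Taxa legitimately sampled in a single tree are flagged too, so the list is a starting point for inspection rather than a list of errors.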

The output tree is (almost) completely unresolved

  • Check that your input trees are correctly rooted. Rooting is also more robust when your trees contain several outgroup taxa: if you have a single outgroup sequence and this sequence is problematic (paralogy, misalignment, etc.), the rooting will be wrong and the error will be hard to detect, whereas with several outgroup sequences you can check that they are grouped together.
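Checking that several outgroup sequences are grouped together can be automated. The following sketch parses each rooted Newick tree with a minimal parser (unquoted labels assumed; whitespace, branch lengths and internal labels discarded) and tests whether the outgroup taxa present in a tree form a clade, and whether that clade is one of the subtrees directly below the root. The function names and the outgroup labels are hypothetical. With a single outgroup taxon the clade test is always true, which is exactly why a lone misplaced outgroup is hard to detect.

```python
from typing import Iterable, List, Set, Union

Tree = Union[str, list]  # a leaf label or a list of child subtrees


def parse_newick(text: str) -> Tree:
    """Minimal Newick parser returning nested lists of leaf labels."""
    text = "".join(text.split())  # drop all whitespace
    pos = 0

    def node() -> Tree:
        nonlocal pos
        children = None
        if text[pos] == "(":
            pos += 1
            children = [node()]
            while text[pos] == ",":
                pos += 1
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        label = text[start:pos].split(":")[0]  # strip an optional :branch_length
        return children if children is not None else label

    return node()


def leaf_set(tree: Tree) -> Set[str]:
    return {tree} if isinstance(tree, str) else set().union(*(leaf_set(c) for c in tree))


def clusters(tree: Tree) -> List[Set[str]]:
    """Leaf sets of every node, leaves included."""
    if isinstance(tree, str):
        return [{tree}]
    return [leaf_set(tree)] + [c for child in tree for c in clusters(child)]


def outgroup_is_clade(tree: Tree, outgroup: Iterable[str]) -> bool:
    """True if the outgroup taxa present in the tree form a clade."""
    present = set(outgroup) & leaf_set(tree)
    return any(c == present for c in clusters(tree))


def outgroup_at_root(tree: Tree, outgroup: Iterable[str]) -> bool:
    """True if the outgroup taxa present form one of the root's child subtrees."""
    present = set(outgroup) & leaf_set(tree)
    return not isinstance(tree, str) and any(leaf_set(child) == present for child in tree)
```

Applied to every tree of the input forest, trees for which outgroup_at_root returns False are good candidates for rooting problems. For instance, "(O1,(O2,(A,B)));" with outgroup {"O1", "O2"} fails both tests.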