
Outputs

BenoitMorel edited this page Feb 26, 2024 · 7 revisions

AleRax creates an output directory into which all output files are saved. This page describes the content of this output directory.

Species tree inference

AleRax outputs the results of the species tree inference step (if relevant) into the folder species_trees. This folder contains:

  • starting_species_tree.newick: the starting species tree, before the species tree inference step (usually irrelevant).
  • inferred_species_tree.newick: the inferred species tree, after the species tree inference step. The reconciliation files use the (internal and terminal) labels of this tree to map the gene events to the nodes of the species tree.
  • species_tree_support_kh.newick: the inferred species tree with Kishino-Hasegawa support values between 0 and 1000. Be aware that those support values overestimate standard bootstrap support values.
  • species_tree_root_rell: the inferred species tree, annotated on each branch with the number of bootstrapped sets of gene families (out of 1000) that support this branch as the position of the species tree root.

Reconciliations

The reconciliations directory contains:

  • The directory reconciliations/all, with information about specific samples of gene families:
    • FAM_SAMPLE.xml: one reconciled gene tree in RecPhyloXML format, which can be visualized with ThirdKind.
    • FAM_eventcount_SAMPLE.txt: the count of each type of event in this sample.
    • FAM.newick: all sampled gene trees in Newick format, one sample per line. Branch lengths are in units of average number of substitutions per site.
    • FAM_perspecieseventcount_SAMPLE: the number of events per species for this sample. "presence" is 1 if there is at least one copy and 0 otherwise.
    • FAM_transfers_SAMPLE.txt: the list of pairs of species involved in a transfer, and the number of times each pair was involved.
  • The directory reconciliations/summary, with a summary of all sampled reconciliations of a gene family:
    • FAM_perspecies_eventcount.txt: the average over all samples of the values in FAM_perspecieseventcount_SAMPLE
    • FAM_transfers.txt: the average over all samples of the values in FAM_transfers_SAMPLE.txt
    • FAM__consensus_50.newick: the majority rule consensus tree of all sampled trees with support values on the branches (as labels)
  • The directory reconciliations/origins, which contains one file SPECIES.txt per species branch. Each file gives the number of genes (averaged over all samples, summed over all families) coming from:
    • vertical inheritance from the parent species, and the gene survived in the sister lineage (first line)
    • vertical inheritance from the parent species, and the gene did not survive in the sister lineage (second line)
    • horizontal gene transfer from the species indicated at the beginning of the line (all remaining lines)
  • perspecies_eventcount.txt: the sum over all families of FAM_perspecies_eventcount.txt
  • transfers.txt: the sum over all families of FAM_transfers.txt
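
The two aggregation levels described above (average over samples within a family, then sum over families) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not AleRax's own code: the function names, the in-memory dictionaries, and the species labels are hypothetical, and the sketch assumes that a species pair absent from a sample counts as 0 for that sample. It does not parse the actual output files, whose exact layout is not described on this page.

```python
from collections import defaultdict


def average_counts(samples):
    """Average a list of {key: count} dicts over all samples.

    Assumption: a key missing from a sample contributes 0 to the average.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    totals = defaultdict(float)
    for sample in samples:
        for key, value in sample.items():
            totals[key] += value
    return {key: total / len(samples) for key, total in totals.items()}


def sum_counts(per_family):
    """Sum a list of {key: value} dicts, one dict per gene family."""
    totals = defaultdict(float)
    for counts in per_family:
        for key, value in counts.items():
            totals[key] += value
    return dict(totals)


# Hypothetical transfer counts, keyed by a pair of species labels.
fam1_samples = [{("A", "B"): 2}, {("A", "B"): 1, ("B", "C"): 1}]
fam2_samples = [{("A", "B"): 1}]

fam1_summary = average_counts(fam1_samples)  # analogous to FAM_transfers.txt
fam2_summary = average_counts(fam2_samples)
all_families = sum_counts([fam1_summary, fam2_summary])  # analogous to transfers.txt
```

Because the per-family values are averages, the summed values are generally not integers.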

Model parameters

The model parameters (duplication, loss, transfer, and origination probabilities) are located under the directory model_parameters. AleRax generates one file per family if the parameters are estimated for each family separately, and a single file otherwise. A parameter file contains one line per species branch, with its corresponding D, L, T, and O values, in this order. The T value is omitted if the UndatedDL model is used instead of UndatedDTL, and the O value is omitted if origination probabilities are not estimated. Note that the D, L, and T values are not strictly probabilities. To obtain probabilities, divide each value by (1.0 + D + L + T), where 1.0 is the speciation parameter. The origination probabilities are already normalized (their sum over the species is 1.0).
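
As a minimal sketch of the normalization described above, the following Python function divides the raw D, L, and T values of one species branch by (1.0 + D + L + T). The function name and the example values are hypothetical and do not come from a real run. The sketch deliberately takes plain numbers rather than reading a parameter file, since the exact file layout (separators, headers) is not specified here.

```python
def normalize_dtl(d, l, t=0.0):
    """Convert raw D, L, T values into event probabilities for one branch.

    The speciation parameter is fixed at 1.0, so the normalizing constant
    is 1.0 + D + L + T. Leave t at 0.0 when the UndatedDL model is used.
    """
    total = 1.0 + d + l + t
    return {
        "speciation": 1.0 / total,
        "duplication": d / total,
        "loss": l / total,
        "transfer": t / total,
    }


# Hypothetical raw values for a single species branch.
probs = normalize_dtl(d=0.25, l=0.5, t=0.25)
```

After normalization, the four probabilities of a branch sum to 1.0.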

Transfer highways

If you enable transfer highway inference, AleRax will generate a directory highways with: TODO

Miscellaneous

  • per_fam_likelihoods.txt: the likelihood of each gene family
  • ccps/: AleRax stores the CCPs in a custom binary format there
  • ccpdim.txt: The list of all family CCP sizes (useful to see which families cost a lot of memory/runtime)