Outputs
AleRax creates an output directory into which all output files are saved. This page describes the content of this output directory.
AleRax outputs the results of the species tree inference step (if relevant) into the folder species_trees. This folder contains:
- starting_species_tree.newick: the starting species tree, before the species tree inference step (usually irrelevant).
- inferred_species_tree.newick: the inferred species tree, after the species tree inference step. The reconciliation files use the (internal and terminal) labels of this tree to map the gene events to the nodes of the species tree.
- species_tree_support_kh.newick: the inferred species tree with Kishino-Hasegawa support values between 0 and 1000. Be aware that those support values overestimate standard bootstrap support values.
- species_tree_root_rell: the inferred species tree with, for each branch, the number of bootstrapped sets of gene families (out of 1000) that support this branch as the species root.
The reconciliations directory contains:
- The directory reconciliations/all, with information about specific samples of gene families:
  - FAM_SAMPLE.xml is one reconciled gene tree in RecPhyloXML format, which can be visualized with ThirdKind.
  - FAM_eventcount_SAMPLE.txt is the count of each type of event.
  - FAM.newick contains all sampled gene trees, with one sample in newick format per line. Branch lengths are in units of average number of substitutions per site.
  - FAM_perspecieseventcount_SAMPLE: the number of events per species for this sample. "presence" is 1 if there is at least one copy and 0 otherwise.
  - FAM_transfers_SAMPLE.txt is the list of pairs of species involved in a transfer, and the number of times they were involved.
- The directory reconciliations/summary, with a summary of all sampled reconciliations of a gene family:
  - FAM_perspecies_eventcount.txt: the average over all samples of the values in FAM_perspecieseventcount_SAMPLE
  - FAM_transfers.txt: the average over all samples of the values in FAM_transfers_SAMPLE.txt
  - FAM__consensus_50.newick: the majority rule consensus tree of all sampled trees with support values on the branches (as labels)
- The directory reconciliations/origins, with, for each species branch, a file SPECIES.txt containing the number of genes (averaged over all samples, summed over all families) coming from:
  - vertical inheritance from the parent species, and the gene survived in the sister lineage (first line)
  - vertical inheritance from the parent species, and the gene did not survive in the sister lineage (second line)
  - horizontal gene transfer from the species indicated at the beginning of the line (all remaining lines)
- perspecies_eventcount.txt: the sum over all families of FAM_perspecies_eventcount.txt
- transfers.txt: the sum over all families of FAM_transfers.txt
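Since FAM.newick stores one sampled gene tree per line, the samples can be separated with plain line splitting before handing them to a tree library. The snippet below is only a minimal sketch: the function name `split_newick_samples` and the example trees are made up for illustration and are not part of AleRax.

```python
def split_newick_samples(text):
    """Return one Newick string per non-empty line of the input text."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical content standing in for a FAM.newick file with two samples.
example = (
    "((a:0.1,b:0.2):0.05,c:0.3);\n"
    "((a:0.1,c:0.2):0.05,b:0.3);\n"
)

trees = split_newick_samples(example)
print(len(trees))  # number of sampled gene trees found
```

For a real run, the text would come from reading the FAM.newick file of the family of interest; each returned string can then be parsed by any Newick-aware library.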
The model parameters (duplication, loss, transfer, and origination probabilities) are located under the directory model_parameters. AleRax generates one file per family if the parameters are estimated for each family separately, and one single file otherwise. A parameter file contains one line per species branch, with its corresponding D, L, T, and O probabilities (in this order, without T if the UndatedDL model is used instead of UndatedDTL, and without O if origination probabilities are not estimated).
Note that the D, L, T values are not strictly probabilities. To obtain probabilities, you have to divide each value by (1.0 + D + L + T), 1.0 being the speciation parameter. The origination probabilities are already normalized (their sum over the species is 1.0).
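The normalization described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AleRax code: the function name `normalize_dtl` and the example values are hypothetical, and the D, L, T values are assumed to have already been read from a parameter file (the exact file layout is not assumed here).

```python
def normalize_dtl(d, l, t=0.0):
    """Turn raw D, L, T parameters into event probabilities.

    The speciation parameter is fixed at 1.0, so the normalizing
    constant is 1.0 + D + L + T. Leave t at 0.0 for the UndatedDL model.
    """
    total = 1.0 + d + l + t
    return {
        "speciation": 1.0 / total,
        "duplication": d / total,
        "loss": l / total,
        "transfer": t / total,
    }


# Hypothetical raw values for one species branch.
probs = normalize_dtl(0.2, 0.3, 0.5)
print(probs)
```

After normalization, the four values sum to 1.0 for each species branch.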
If you enable transfer highway inference, AleRax will generate a directory highways with:
TODO
- per_fam_likelihoods.txt: the likelihood of each gene family
- ccps/: AleRax stores the CCPs in a custom binary format there
- ccpdim.txt: the list of all family CCP sizes (useful to see which families cost a lot of memory/runtime)