[phylogenetic] B.1 tree fails due to "export: has duplicate nodes" or "treetime: nodes with name None" #310

Open
corneliusroemer opened this issue Apr 23, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@corneliusroemer
Member

corneliusroemer commented Apr 23, 2025

B.1 phylogenetic runs have failed 3 times in a row: twice because of a TreeTime error and once because of an export validation error. (@joverlee521 seems to have already started investigating, or at least triggered a rerun yesterday.)

Treetime error:

[batch] [2025-04-18T18:47:41+00:00] augur refine is using TreeTime version 0.11.4
[batch] [2025-04-18T18:47:41+00:00] ERROR: TreeAnc.optimal_branch_length: terminal node alignments required; sequence is missing for leaf: 'None'. Missing terminal sequences can be inferred from sister nodes by rerunning with `reconstruct_tip_states=True` or `--reconstruct-tip-states`
[batch] [2025-04-18T18:47:41+00:00] 380.89	***WARNING: TreeAnc._check_alignment_tree_gtr_consistency: NO SEQUENCE
[batch] [2025-04-18T18:47:41+00:00]       	FOR LEAF: 'None'
[batch] [2025-04-18T18:47:41+00:00] 380.90	***WARNING: TreeAnc: 1 nodes don't have a matching sequence in the
[batch] [2025-04-18T18:47:41+00:00]       	alignment. POSSIBLE ERROR.

https://github.com/nextstrain/mpox/actions/runs/14538944332/job/40792976872#step:5:676

Export error:

[batch] [2025-04-22T18:38:51+00:00] Validating that the JSON is internally consistent...
[batch] [2025-04-22T18:38:51+00:00] Node OP536796 appears multiple times in the tree.

https://github.com/nextstrain/mpox/actions/runs/14538944332/job/40954094434#step:5:763

Root cause is quite likely a bug in our tree-fix post-iqtree script.

Two possible solutions:

  • find and fix the bug in that script
  • switch B.1 build to cmaple (+ collapse polytomy post-process script) instead of iqtree + fix-tree.

corneliusroemer added the bug label Apr 23, 2025

@joverlee521
Contributor

The fix_tree bug seems to be stochastic, so I've just been re-running the builds when they fail.

It'd be great if the cmaple route works, do you have an example of using cmaple + collapse polytomy script?

@corneliusroemer
Member Author

Even if the bug is stochastic, it's still most likely a logic error in the tree-fix script. I'll see if I can reproduce and debug it using the build outputs.
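
For debugging, both failure modes (a leaf named 'None' and a node name appearing multiple times) can be checked directly on a Newick tree. The following is only a minimal sketch using Bio.Phylo; the helper name `find_tree_name_problems` and the toy Newick string are made up for illustration and are not taken from the build or from the tree-fix script.

```python
from collections import Counter
from io import StringIO

from Bio import Phylo


def find_tree_name_problems(tree):
    """Return (sorted duplicated node names, number of unnamed tips) for a Bio.Phylo tree.

    Hypothetical helper for illustration; not part of the mpox workflow.
    """
    # Count every named node (tips and internal nodes); unnamed nodes are skipped here
    name_counts = Counter(
        clade.name for clade in tree.find_clades() if clade.name is not None
    )
    duplicates = sorted(name for name, count in name_counts.items() if count > 1)
    # Tips without a name are what TreeTime would report as leaf 'None'
    unnamed_tips = sum(1 for tip in tree.get_terminals() if not tip.name)
    return duplicates, unnamed_tips


# Toy tree (not real build output): one name appears twice and one tip has no name
toy_newick = "((OP536796:0.1,:0.2):0.05,(OP536796:0.1,B:0.3):0.05);"
toy_tree = Phylo.read(StringIO(toy_newick), "newick")
duplicates, unnamed_tips = find_tree_name_problems(toy_tree)
print(f"Duplicated names: {duplicates}, unnamed tips: {unnamed_tips}")
```

Running the same check on the tree before and after the tree-fix step of a failed build should show whether that step introduces the duplicate or unnamed nodes.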

Re cmaple, it should work for B.1, though it might be suboptimal for anything with branches that are longer than a few mutations unfortunately (see iqtree/cmaple#46)

Here's what I used in some experiments:

if TREE_METHOD == "cmaple":

    rule tree:
        input:
            alignment="results/{build_name}/masked.fasta",
        output:
            tree="results/{build_name}/masked.fasta.treefile",
        shell:
            """
            cmaple \
                -aln {input.alignment} \
                -st DNA \
                --search EXHAUSTIVE \
                --out-mul-tree \
                --make-consistent \
                --overwrite
            """

    rule collapse:
        input:
            script="scripts/collapse-zero-branches.py",
            tree=rules.tree.output.tree,
        output:
            tree="results/{build_name}/tree.nwk",
        shell:
            """
            python {input.script} \
                --threshold 0.0000001 \
                --verbose \
                --input-tree {input.tree} \
                --output-tree {output.tree}
            """

else:
    rule tree:
        input:
            alignment="results/{build_name}/masked.fasta",
        output:
            tree="results/{build_name}/tree.nwk",
        shell:
            """
            augur tree \
                --alignment {input.alignment} \
                --tree-builder-args "--polytomy --ninit 2 -n 2 --epsilon 0.05 -T 4 --redo" \
                --nthreads 4 \
                --output {output.tree}
            """

where the collapse script is:

import argparse
import sys
from collections import Counter

from Bio import Phylo


def get_branch_length_distribution(tree) -> Counter[float]:
    """Count how often each branch length occurs in the tree."""
    return Counter(node.branch_length for node in tree.find_clades() if node.branch_length is not None)


def collapse_near_zero_branches(tree, threshold=0.001, verbose=False):
    """
    Collapse internal branches with lengths below the specified threshold, in place.

    Args:
        tree (Bio.Phylo.BaseTree.Tree): Phylogenetic tree.
        threshold (float): Internal branches shorter than this are collapsed.
        verbose (bool): Print statistics if True.
    """
    internal_nodes_before = len(tree.get_nonterminals())
    branch_length_counts_before = get_branch_length_distribution(tree)
    # Clades without a branch length (typically the root) are never collapsed
    tree.collapse_all(lambda c: c.branch_length is not None and c.branch_length < threshold)
    branch_length_counts_after = get_branch_length_distribution(tree)
    internal_nodes_after = len(tree.get_nonterminals())

    # Branch lengths that occur less often after collapsing. Collapsing a branch adds
    # its length to its children, so changed child lengths are listed here as well.
    difference = branch_length_counts_before - branch_length_counts_after

    if verbose:
        collapsed = internal_nodes_before - internal_nodes_after
        print(f"Collapsed {collapsed} internal branches with lengths below {threshold}")
        print("Branch lengths removed or changed by collapsing:")
        for length, count in difference.items():
            print(f"Branch length {length}: {count} branches")


def main(args):
    # Load a Newick tree from file
    tree = Phylo.read(args.input_tree, "newick")

    # Collapse near-zero internal branches using the provided threshold
    collapse_near_zero_branches(tree, threshold=args.threshold, verbose=args.verbose)

    # Output the resulting tree
    if args.output_tree:
        Phylo.write(tree, args.output_tree, "newick", format_branch_length="%1.8f")
        if args.verbose:
            print(f"Output tree written to {args.output_tree}")
    else:
        # Use the same branch-length precision as the file output
        Phylo.write(tree, sys.stdout, "newick", format_branch_length="%1.8f")


if __name__ == "__main__":
    # Setup command line argument parsing
    parser = argparse.ArgumentParser(
        description="Process a Newick tree to collapse near-zero internal branches."
    )
    parser.add_argument(
        "--threshold",
        type=float,
        default=1.0e-7,
        help="Threshold for collapsing branches (default: 1.0e-7)",
    )
    parser.add_argument(
        "--input-tree",
        type=str,
        required=True,
        help="Path to the input Newick tree file",
    )
    parser.add_argument(
        "--output-tree",
        type=str,
        help="Path to the output Newick tree file (optional, defaults to stdout)",
    )
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="Enable verbose output for more information",
    )

    args = parser.parse_args()
    main(args)
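
To illustrate what the collapse step does, here is a small sketch that applies the same `collapse_all` call to a toy tree. The Newick string, tip names, and branch lengths are invented for the example and are not cmaple output; the threshold matches the script's command-line default.

```python
from io import StringIO

from Bio import Phylo

# Toy tree (made up): the (A,B) clade sits on a near-zero internal branch
toy_newick = "((A:0.001,B:0.002):0.00000001,(C:0.001,D:0.003):0.0005,E:0.002);"
tree = Phylo.read(StringIO(toy_newick), "newick")
threshold = 1.0e-7

internal_before = len(tree.get_nonterminals())
# collapse_all only considers internal clades, so short terminal branches are kept
tree.collapse_all(lambda c: c.branch_length is not None and c.branch_length < threshold)
internal_after = len(tree.get_nonterminals())

# After collapsing, A and B hang directly off the root, forming a polytomy
root_children = sorted(child.name or "<internal>" for child in tree.root.clades)
print(f"Internal nodes: {internal_before} -> {internal_after}")
print(f"Children of the root: {root_children}")
```

Only the (A,B) clade is removed here; the (C,D) clade stays because its branch length is above the threshold.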
