Efficiently identifying nodes in a list of marginal trees (indexed from across the genome) #3232
I have a tree sequence and am interested in a particular subset of its trees (predefined and indexed with respect to some property). Is there an efficient way to identify which nodes are present in this subset of trees? Preferably something more efficient than: for tree in trees: for node in tree.nodes: nodelist.add(node.index), which is extremely redundant, as most nodes are shared between trees. EDIT: the purpose was to identify nodes found in arbitrary trees sprinkled across the tree sequence, selected by some criterion that filters trees. The broader context is my exploration of 'tree masking', i.e. removing trees that fail some criteria (which may vary) when performing certain kinds of tree calculations.
Replies: 1 comment 7 replies
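# warning: untested
import numpy as np
# simplify=False keeps the original node IDs, so the result still indexes into ts
new_ts = ts.keep_intervals([[1e4, 2e4]], simplify=False)
nodes_in_interval = np.unique(np.concatenate((new_ts.edges_parent, new_ts.edges_child, new_ts.samples())))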
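For trees scattered across the genome, the same edge-based idea can probably be applied without building a new tree sequence. The following is only a sketch under stated assumptions: the function name `nodes_in_selected_trees` and the boolean `keep` mask (one entry per tree) are hypothetical, and the inputs are plain arrays that in tskit would presumably come from `ts.breakpoints(as_array=True)`, `ts.edges_left`, `ts.edges_right`, `ts.edges_parent` and `ts.edges_child`. It marks every edge whose span covers at least one kept tree, using a prefix sum over the mask.

```python
import numpy as np

def nodes_in_selected_trees(breakpoints, edges_left, edges_right,
                            edges_parent, edges_child, keep):
    """Return sorted IDs of nodes touched by an edge in any kept tree.

    Hypothetical helper: `keep` is a boolean mask with one entry per tree.
    Assumes edge endpoints coincide with tree breakpoints.
    """
    breakpoints = np.asarray(breakpoints, dtype=float)
    keep = np.asarray(keep, dtype=bool)
    edges_parent = np.asarray(edges_parent)
    edges_child = np.asarray(edges_child)
    # Each edge spans the trees with index in [first, last).
    first = np.searchsorted(breakpoints, edges_left, side="right") - 1
    last = np.searchsorted(breakpoints, edges_right, side="left")
    # csum[j] - csum[i] counts the kept trees with index in [i, j).
    csum = np.concatenate(([0], np.cumsum(keep)))
    hit = (csum[last] - csum[first]) > 0
    return np.unique(np.concatenate((edges_parent[hit], edges_child[hit])))

# Toy data: three trees on [0, 10), [10, 20), [20, 30) and four edges.
toy_breakpoints = [0, 10, 20, 30]
toy_left = np.array([0.0, 0.0, 10.0, 20.0])
toy_right = np.array([30.0, 10.0, 30.0, 30.0])
toy_parent = np.array([4, 4, 5, 5])
toy_child = np.array([0, 1, 1, 2])
first_only = nodes_in_selected_trees(
    toy_breakpoints, toy_left, toy_right, toy_parent, toy_child,
    keep=[True, False, False])
last_only = nodes_in_selected_trees(
    toy_breakpoints, toy_left, toy_right, toy_parent, toy_child,
    keep=[False, False, True])
```

Nodes that have no edges in the kept trees (for example isolated samples) are not reported by this sketch; the sample IDs can be concatenated in afterwards, as the snippet above does with `new_ts.samples()`.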
Re the edge_diffs approach, the general idea of using a node mask and adding / subtracting from that is probably right.
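One way the node-mask idea might look is sketched below; it is not the code from the thread. The function name `nodes_from_edge_diffs` and the `keep` mask are hypothetical. The function takes any iterable of `(interval, edges_out, edges_in)` tuples whose edges have `.parent` and `.child` attributes, which is the shape `ts.edge_diffs()` yields, and keeps a per-node count of incident edges that is added to and subtracted from as trees change.

```python
from collections import namedtuple
import numpy as np

def nodes_from_edge_diffs(edge_diffs, num_nodes, keep):
    """Return sorted IDs of nodes with at least one edge in any kept tree.

    Hypothetical helper: `keep[i]` says whether tree i passes the filter.
    """
    degree = np.zeros(num_nodes, dtype=np.int64)  # incident edges per node
    seen = np.zeros(num_nodes, dtype=bool)        # accumulated node mask
    for tree_index, (_, edges_out, edges_in) in enumerate(edge_diffs):
        for edge in edges_out:
            degree[edge.parent] -= 1
            degree[edge.child] -= 1
        for edge in edges_in:
            degree[edge.parent] += 1
            degree[edge.child] += 1
        if keep[tree_index]:
            # Nodes currently attached to an edge are in this tree.
            seen |= degree > 0
    return np.flatnonzero(seen)

# Toy stand-in for ts.edge_diffs(): three trees, edges as (parent, child).
ToyEdge = namedtuple("ToyEdge", ["parent", "child"])
toy_diffs = [
    ((0, 10), [], [ToyEdge(4, 0), ToyEdge(4, 1)]),
    ((10, 20), [ToyEdge(4, 1)], [ToyEdge(5, 1)]),
    ((20, 30), [], [ToyEdge(5, 2)]),
]
middle_tree_nodes = nodes_from_edge_diffs(toy_diffs, 6, [False, True, False])
```

With a real tree sequence the call would presumably be `nodes_from_edge_diffs(ts.edge_diffs(), ts.num_nodes, keep)`. The mask update costs O(num_nodes) per kept tree, so this suits the case where relatively few trees pass the filter.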
Have you tried using the fast Tree array accessors? That might also be a good option.
Simply:
It could well be fast enough?
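The snippet that originally accompanied this suggestion is not preserved here, so the following is only a guess at what an array-accessor version could look like. It assumes tree objects exposing `index`, `parent_array` and `left_child_array` (as tskit's `Tree` does) and uses a hypothetical helper name and `keep` mask. A node is treated as present in a tree if it has a parent or a child there.

```python
from types import SimpleNamespace
import numpy as np

NULL = -1  # same value as tskit.NULL

def nodes_from_tree_arrays(trees, num_nodes, keep):
    """Return sorted IDs of nodes that have a parent or child in a kept tree.

    Hypothetical helper: `trees` is any iterable of objects with `index`,
    `parent_array` and `left_child_array`; `keep[i]` filters tree i.
    """
    seen = np.zeros(num_nodes, dtype=bool)
    for tree in trees:
        if not keep[tree.index]:
            continue
        # Slice to num_nodes in case the arrays carry an extra trailing
        # entry for a virtual root. Read the arrays inside the loop, since
        # an iterator may update the same tree object in place.
        parent = np.asarray(tree.parent_array)[:num_nodes]
        left_child = np.asarray(tree.left_child_array)[:num_nodes]
        seen |= (parent != NULL) | (left_child != NULL)
    return np.flatnonzero(seen)

# Toy stand-ins for two trees over six nodes (plus one trailing entry).
toy_trees = [
    SimpleNamespace(index=0,
                    parent_array=np.array([4, 4, -1, -1, -1, -1, -1]),
                    left_child_array=np.array([-1, -1, -1, -1, 0, -1, 4])),
    SimpleNamespace(index=1,
                    parent_array=np.array([4, 5, -1, -1, -1, -1, -1]),
                    left_child_array=np.array([-1, -1, -1, -1, 0, 1, 4])),
]
second_tree_nodes = nodes_from_tree_arrays(toy_trees, 6, [False, True])
```

With a real tree sequence this would presumably be called as `nodes_from_tree_arrays(ts.trees(), ts.num_nodes, keep)`. Each kept tree costs one vectorized pass over the node arrays rather than a Python loop over nodes.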