Hi! I am trying to build a gene tree from several genes across several species, all belonging broadly to one gene family. Specifically, I am looking at about 30 heat shock proteins in C. elegans (hsp-16.1, hsp-16.48, etc.) and all the homologs I found in a set of other species using Orthofinder. This includes paralogs within elegans and roughly 25-40 orthologs in each other species, about 200 genes in total.

So far I have obtained the protein sequences for all genes, aligned them with MUSCLE, and fed the alignment into IQTREE2, letting it run with default settings. This worked, although a few branches looked odd. As a test I removed a single gene and reran the analysis, and found that many branches changed. This makes me worry that the tree is less stable than it could be. My command, as run through the web UI, was: `path_to_iqtree -s the_tree_+_CBG04606-CRE25952.aln-clustalw -m TEST -bb 1000 -alrt 1000`

I had the idea that it might help to guide the tree-building process with a species tree, but I have no idea how to do that. Would this be a valid way to improve my tree?
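One way to put a number on "lots of branches got altered" is the Robinson-Foulds (RF) distance: the count of bipartitions (internal branches) present in one tree but not the other, computed only over the taxa both trees share. The sketch below is a minimal pure-Python version. It assumes plain Newick like IQ-TREE's `.treefile`, with unquoted labels that contain no parentheses, commas, colons or semicolons. The function names are hypothetical helpers, not part of IQ-TREE, and a maintained phylogenetics library would be more robust for real use.

```python
from __future__ import annotations

import re


def clades(newick: str) -> list[frozenset[str]]:
    """Leaf set below every internal node, in closing order (root last)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick)
    stack: list[set[str]] = []
    found: list[frozenset[str]] = []
    prev = None
    for tok in tokens:
        if not tok.strip():
            continue  # whitespace between tokens
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            group = stack.pop()
            found.append(frozenset(group))
            if stack:
                stack[-1] |= group  # children belong to the parent clade too
        elif tok in ",;":
            pass
        elif prev != ")":
            # Leaf label, possibly followed by ":branch_length".
            stack[-1].add(tok.split(":")[0].strip())
        # A token right after ")" is an internal label such as "95/100:0.3"; ignored.
        prev = tok
    return found


def leaves(newick: str) -> frozenset[str]:
    return clades(newick)[-1]  # the root clade contains every leaf


def splits(newick: str, taxa: frozenset[str]) -> set[frozenset[str]]:
    """Non-trivial unrooted bipartitions of the tree restricted to `taxa`."""
    ref = min(taxa)  # fixed reference taxon used to orient every split
    result = set()
    for clade in clades(newick):
        side = clade & taxa
        if ref in side:
            side = taxa - side  # always store the side without the reference
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return result


def rf_distance(tree_a: str, tree_b: str) -> int:
    """Unnormalised RF distance over the leaves the two trees share."""
    shared = leaves(tree_a) & leaves(tree_b)
    return len(splits(tree_a, shared) ^ splits(tree_b, shared))
```

Because the full tree is restricted to the shared leaves before comparing, the tree with one gene removed can be compared directly against the original. For two fully resolved unrooted trees with n shared leaves, the maximum value is 2(n - 3), which can be used to normalise the result.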
@Echodonut, I wouldn't advise this. For any reasonable alignment IQ-TREE should do a perfectly good job of estimating the ML tree without any attempts to guide towards a particular topology (e.g. a species tree). Here are some things you probably should consider (with apologies if I didn't quite understand what you were doing):
3a. Doing a number of replicates (say, 10) of each gene tree. The best tree is the one with the highest likelihood, but this will also tell you how stable the inference is.
3b. Doing bootstraps to assess uncertainty.
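The replicate step can be scripted. The sketch below only builds the command lines for N independent IQ-TREE 2 runs that differ in their random seed, then picks the run with the highest log-likelihood from each run's log text. Assumptions to check against your installed version: the executable is called `iqtree2`; the option spellings `-B` (ultrafast bootstrap, the IQ-TREE 2 name for `-bb`), `-alrt`, `--seed` and `--prefix`; and that each `.log` file contains a line like `BEST SCORE FOUND : -12345.678`. The helper names are hypothetical. The model defaults to `TEST` only because that is what the command in the question used. Each run also keeps the ultrafast bootstrap and SH-aLRT supports for assessing uncertainty.

```python
from __future__ import annotations

import re
import shlex

# Assumed format of the final-score line in an IQ-TREE .log file.
SCORE_RE = re.compile(r"BEST SCORE FOUND\s*:\s*(-?\d+(?:\.\d+)?)")


def replicate_commands(alignment: str, n_runs: int = 10, model: str = "TEST",
                       n_boot: int = 1000, exe: str = "iqtree2") -> list[list[str]]:
    """One command per replicate; only the seed and output prefix differ."""
    cmds = []
    for i in range(1, n_runs + 1):
        cmds.append([
            exe,                    # assumed executable name
            "-s", alignment,
            "-m", model,
            "-B", str(n_boot),      # ultrafast bootstrap replicates
            "-alrt", str(n_boot),   # SH-aLRT replicates
            "--seed", str(i),       # distinct random seed per replicate
            "--prefix", f"rep{i:02d}",
        ])
    return cmds


def best_score(log_text: str) -> float:
    """Extract the final log-likelihood from one run's log text."""
    match = SCORE_RE.search(log_text)
    if match is None:
        raise ValueError("no 'BEST SCORE FOUND' line in this log")
    return float(match.group(1))


def pick_best(logs: dict[str, str]) -> tuple[str, dict[str, float]]:
    """Return the replicate with the highest log-likelihood, plus all scores."""
    scores = {name: best_score(text) for name, text in logs.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    for cmd in replicate_commands("the_tree_+_CBG04606-CRE25952.aln-clustalw"):
        print(shlex.join(cmd))
```

Each printed command can be run in a shell or with `subprocess.run(cmd, check=True)`. Reading the `repNN.log` files into `pick_best` then gives the highest-likelihood tree. Comparing the replicate topologies against each other shows how stable the inference is.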
Hope some of that helps,
Rob
Checking all of your alignments by eye (all of the inference assumes that the alignments are correct, so the better the alignments, the better the inference. Orthofinder makes plenty of mistakes, and alignment is a hard problem too...). You can then adjust the alignments by eye and/or by machine (e.g. by masking or trimming). But your eyes are by far the best q…
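At its simplest, machine masking or trimming drops alignment columns that are mostly gaps. The sketch below does only that. It assumes a FASTA alignment (the file in the question is in ClustalW format, so it would need converting or re-exporting) and an arbitrary 50% gap threshold. The function names are hypothetical. Dedicated trimming tools use more careful criteria, and none of this replaces inspecting the alignment by eye.

```python
from __future__ import annotations

GAP_CHARS = set("-.?")  # characters treated as gaps/missing data (assumption)


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {sequence ID: sequence}."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep the ID, drop any description
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def mask_gappy_columns(aln: dict[str, str], max_gap_fraction: float = 0.5) -> dict[str, str]:
    """Drop columns whose fraction of gap characters exceeds the threshold."""
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences differ in length; is this really an alignment?")
    keep = [
        i for i in range(length)
        if sum(s[i] in GAP_CHARS for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {n: "".join(s[i] for i in keep) for n, s in aln.items()}


def to_fasta(aln: dict[str, str], width: int = 60) -> str:
    """Write {ID: sequence} back out as wrapped FASTA text."""
    lines = []
    for n, s in aln.items():
        lines.append(f">{n}")
        lines.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(lines) + "\n"
```

A column filter like this keeps every sequence. A sequence that is badly aligned, or mostly gaps, still has to be spotted and handled by looking at the alignment.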