Hi All,
I want to run phylogenetic analyses on single-locus data sets as well as on the combined multi-gene data set.
For the single-locus data set, I found the two most suitable outgroups by BlastN searches against a public database. However, the sequences of the two outgroup taxa are shorter than my ingroup sequences at both ends (5' and 3'), although the overlapping region aligns perfectly between outgroups and ingroups. I don't want to trim my ingroup sequences at both ends of the alignment, because those regions (present in my ingroups but absent from the outgroups) contain valuable informative characters. In this case, can I still use the two outgroups, keeping alignment gaps (or, more strictly, missing data) at both ends of the outgroup sequences?
My other concern is that when I use one outgroup taxon, no support value is shown for the ingroup clade, but when I use two outgroup taxa, there is 100% support for the ingroup clade. I want to know why this is so, and whether I have to use at least two outgroup taxa.
For the combined data set, my question is also about the outgroups. Because different outgroup taxa were used for each single-locus data set, how should I determine the outgroups for the multi-locus data set? Can I concatenate the outgroup sequences from each single-locus data set? By doing so, I may create artificial taxa/sequences.
I hope you can help!
Thanks.
Yongjie
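On the terminal-gap question above, one possible way to keep the full ingroup alignment is to recode the leading and trailing gaps of the shorter outgroup sequences as missing data. The sketch below is only an illustration: the function name `mask_terminal_gaps`, the taxon labels, and the toy sequences are hypothetical, and it assumes `-` marks gaps and `?` marks missing data. Whether a given tree-building program treats the two symbols differently should be checked in that program's documentation.

```python
def mask_terminal_gaps(seq, gap="-", missing="?"):
    """Recode leading and trailing gaps as missing data; internal gaps are kept."""
    core = seq.strip(gap)
    if not core:
        # The sequence is empty or consists only of gaps.
        return missing * len(seq)
    lead = len(seq) - len(seq.lstrip(gap))    # number of gaps at the 5' end
    trail = len(seq) - len(seq.rstrip(gap))   # number of gaps at the 3' end
    return missing * lead + core + missing * trail


# Toy alignment (hypothetical data): the outgroup is shorter at both ends.
alignment = {
    "ingroup_1": "ACGTACGTACGT",
    "outgroup_1": "---TAC-TA---",
}

masked = {name: mask_terminal_gaps(seq) for name, seq in alignment.items()}
for name, seq in masked.items():
    print(name, seq)
```

The ingroup sequences are left untouched, so no informative columns are trimmed; only the unsequenced ends of the outgroup are flagged.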
Dear Dan,
Thanks for your answer.
My ingroup consists of over 100 individuals belonging to the same fungal species. Most of the 7 genes I used are specific to this fungal species, so choosing suitable outgroups has become a problem. For 3 of the genes I have chosen two outgroup sequences per gene by BlastN, but for the other 4 genes I cannot find a suitable outgroup. The outgroup taxa also differ among those 3 genes: species A and B for the first gene, C and D for the second, and E and F for the third. I'm still not clear on how to determine the outgroups to use in the multi-gene phylogeny. Can I concatenate the outgroup sequences from the 3 genes anyway (although I would actually be creating nonexistent taxa) and leave the other 4 genes blank for the assumed outgroups? That is, the 7 genes of outgroup 1 would be A, C, E, blank, blank, blank, blank, and those of outgroup 2 would be B, D, F, blank, blank, blank, blank.
Yongjie
You definitely do not want to make artificial or composite taxa. You can try a concatenated analysis with missing data in your outgroup taxa. I'm not sure how good the results will be, but it is probably worth a shot.
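A minimal sketch of what such a concatenated matrix could look like, assuming each gene is already aligned and stored as a dictionary of taxon name to sequence. The function name `concatenate`, the isolate and species labels, and the toy sequences are all hypothetical. Each outgroup species keeps its own row and is filled with `?` for every gene where it was not sequenced, so no composite taxon is created.

```python
def concatenate(loci, missing="?"):
    """Concatenate per-gene alignments; taxa absent from a gene get missing data.

    loci: list of dicts {taxon_name: aligned_sequence}, one dict per gene.
    """
    taxa = sorted({taxon for locus in loci for taxon in locus})
    matrix = {taxon: "" for taxon in taxa}
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must have equal aligned length")
        width = lengths.pop()
        for taxon in taxa:
            # Pad with missing-data symbols when this taxon lacks the gene.
            matrix[taxon] += locus.get(taxon, missing * width)
    return matrix


# Toy data (hypothetical): two genes with different outgroups, one gene with none.
gene1 = {"isolate_1": "ACGT", "isolate_2": "ACGA", "species_A": "ATGT", "species_B": "ATGA"}
gene2 = {"isolate_1": "GGC", "isolate_2": "GGT", "species_C": "GAC", "species_D": "GAT"}
gene4 = {"isolate_1": "TT", "isolate_2": "TA"}

supermatrix = concatenate([gene1, gene2, gene4])
for taxon, seq in supermatrix.items():
    print(f"{taxon:<10} {seq}")
```

In this layout species A and species C never share a row, in contrast to the "A, C, E, blank, ..." composite proposed in the question. The trade-off is that each outgroup row is mostly missing data, which is the uncertainty about result quality mentioned above.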