Hello,
I have some Fusarium ITS2 sequences from my metataxonomics experiment, and I want to check their taxonomic assignment. To do that, I'm going to build a phylogenetic tree from my sequences plus a collection of Fusarium ITS2 sequences retrieved from NCBI (I'll probably go with MrBayes for the tree).
I performed the MSA with MAFFT, and now I'm looking at the alignment. I find regions like this one:
I know the usual approach here would be to trim these non-conserved regions with something like Gblocks or trimAl, but I'm not sure that would be a good idea in this case, since:
- ITS2 regions are non-coding and indel-rich, and that variability is what makes them suitable as a marker. It is true that this variability would make it impossible to obtain, e.g., a family-level tree, but in this case the tree would only contain sequences from one genus (Fusarium). If I delete those sections, I may be losing phylogenetic information.
- From the Phylogenetic Handbook: "Not necessarily all positions with gaps need to be discarded (often referred to as “gap stripping”) because they can still contain useful information."
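To put a number on how much of the alignment trimming would actually remove, something like the sketch below could be used. It only reports the fraction of gaps in each column of an aligned FASTA and lists the columns above a cutoff; it is not what Gblocks or trimAl do internally. The function names (`parse_fasta`, `gap_fractions`, `gappy_columns`), the 0.5 cutoff, and the toy alignment are all made up for illustration.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse aligned FASTA text into {header: sequence} (hypothetical helper)."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def gap_fractions(alignment: Dict[str, str]) -> List[float]:
    """Return, for each alignment column, the fraction of sequences with '-'."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences differ in length; input is not an alignment")
    return [sum(s[i] == "-" for s in seqs) / len(seqs) for i in range(length)]


def gappy_columns(alignment: Dict[str, str], threshold: float = 0.5) -> List[int]:
    """0-based indices of columns whose gap fraction exceeds the threshold.

    The 0.5 default is an arbitrary assumption, not a recommended value.
    """
    return [i for i, f in enumerate(gap_fractions(alignment)) if f > threshold]


# Toy alignment, invented for illustration only (not real ITS2 data).
TOY = """>seq1
ACGT--AC
>seq2
ACGTTTAC
>seq3
ACG---AC
"""

if __name__ == "__main__":
    aln = parse_fasta(TOY)
    flagged = gappy_columns(aln, threshold=0.5)
    print(f"{len(flagged)} of {len(gap_fractions(aln))} columns exceed the cutoff:", flagged)
```

Running this on the real MAFFT output (read from the file instead of `TOY`) would show how many columns a gap-based trim at a given cutoff would drop, which is the information I'm afraid of losing.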
My approach here would be: keep all sections like the one in the picture, and manually correct misaligned nucleotides if necessary. I don't know if anyone working with ITS (or anyone working regularly with MSA) thinks this is a sensible approach, or if I'm missing something obvious and I still need to trim. Any help or guidance here would be highly appreciated.