I got several thousand sequences from a blastp search. I removed the sequences with >90% identity using cd-hit before building the MSA, and did the same after MSA construction. My assumption was that sequences with >90% identity will end up on closely related branches. I am wondering if this cutoff makes sense.
A better-justified way of reducing the number of sequences would probably be to first build a distance-based tree (NJ, UPGMA) for the whole set of sequences. You could then use Dendroscope 3 or iTOL to automatically collapse clades containing very closely related sequences. During this auto-collapsing, the average branch length to all leaves is calculated for every internal node, and those clades where this value is below your threshold are collapsed. You can also collapse by a support value or a branch length that you specify yourself.
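To make the collapsing rule concrete, here is a minimal Python sketch of the idea. It is not the Dendroscope or iTOL implementation: the `Node` class, the function names, the example tree and the choice to keep the leaf closest to the collapsed node as the clade's representative are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch to the parent."""
    name: Optional[str] = None
    length: float = 0.0
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaf_distances(node: Node) -> List[Tuple[Node, float]]:
    """Return (leaf, distance from `node` to that leaf) for every leaf below `node`."""
    if node.is_leaf():
        return [(node, 0.0)]
    pairs = []
    for child in node.children:
        for leaf, dist in leaf_distances(child):
            pairs.append((leaf, dist + child.length))
    return pairs


def collapse(node: Node, threshold: float) -> Node:
    """Return a copy of the tree in which every clade whose average
    node-to-leaf distance is below `threshold` is replaced by one leaf.

    Assumption: the representative is the leaf closest to the clade's root.
    """
    if node.is_leaf():
        return Node(name=node.name, length=node.length)
    pairs = leaf_distances(node)
    mean_dist = sum(dist for _, dist in pairs) / len(pairs)
    if mean_dist < threshold:
        rep, rep_dist = min(pairs, key=lambda p: p[1])
        # Keep the path length from the parent down to the representative.
        return Node(name=rep.name, length=node.length + rep_dist)
    return Node(
        name=node.name,
        length=node.length,
        children=[collapse(child, threshold) for child in node.children],
    )


def leaf_names(node: Node) -> List[str]:
    return [leaf.name for leaf, _ in leaf_distances(node)]


# Made-up example tree: ((A:0.01,B:0.02):0.3,(C:0.4,D:0.5):0.2);
tree = Node(children=[
    Node(length=0.3, children=[Node("A", 0.01), Node("B", 0.02)]),
    Node(length=0.2, children=[Node("C", 0.4), Node("D", 0.5)]),
])

pruned = collapse(tree, threshold=0.05)
print(leaf_names(pruned))
```

With this made-up tree and a threshold of 0.05, only the (A, B) clade is collapsed, because its average distance to the leaves is 0.015, while the (C, D) clade averages 0.45 and is kept. The leaf names that remain after collapsing are the sequences you would carry forward to the final MSA and tree.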