I have a set of 16S sequences from different species, grouped into separate FASTA files by genus. I would like to align the sequences in each file to look for regions that are conserved within each genus. However, some of the shorter sequences are contained entirely within longer ones. Is there any software that can clean up these data by removing the redundant sequences before alignment? I do not currently have access to much memory, so removing any extraneous data would be of great benefit.
Any advice would be greatly appreciated!
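To make the goal concrete, here is a minimal sketch of the filtering I have in mind, in plain Python with no external libraries. It assumes that "redundant" means an exact, case-insensitive substring of another sequence in the same file; it does not handle reverse complements, mismatches, or gaps. The function names and the file names in the usage comment are made up for illustration and do not come from any existing tool.

```python
def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def remove_contained(records):
    """Drop sequences that are exact substrings of an already kept sequence.

    Sequences are visited longest first, so a shorter sequence only needs to
    be compared against the ones kept so far. Exact duplicates are reduced
    to their first occurrence.
    """
    kept = []
    for header, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        query = seq.upper()
        if not any(query in kept_seq.upper() for _, kept_seq in kept):
            kept.append((header, seq))
    return kept


def write_fasta(records, path, width=60):
    """Write (header, sequence) tuples as FASTA, wrapping lines at `width`."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">" + header + "\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")


# Hypothetical usage, one genus file at a time:
#   records = read_fasta("genus_A.fasta")
#   write_fasta(remove_contained(records), "genus_A.nr.fasta")
```

The comparison is quadratic in the number of sequences, which should be fine for a few hundred 16S sequences per genus but would not scale to large datasets.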
Oh yes, that is exactly what I was looking for. I knew it must exist somewhere! Thank you.