Dear all, there are many posts about removing duplicate sequences from a fasta file (https://www.biostars.org/p/3003/), but I want to remove only the duplicate sequences that also have the same id.
I have many duplicate sequences in my fasta file that have different ids, and I want to keep those.
How can I remove only the duplicates that share the same id? These are protein sequences, and each sequence is split across multiple lines.
BBMap's Dedupe utility has a "requirematchingnames" flag. With it set, Dedupe removes only duplicates that have both an identical sequence and an identical name. For example:
One copy of each duplicate set will remain, unless you add the "uniqueonly" flag, in which case all copies of a duplicated sequence are discarded.
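If you would rather not install BBMap, the same idea (a record counts as a duplicate only when both its name and its sequence match an earlier record) can be sketched in plain Python. This is a minimal sketch, not Dedupe's implementation. The function names `read_fasta` and `drop_same_id_duplicates` are made up for illustration, and it assumes the "id" is the whole header line after `>` and that sequences are compared exactly, including case. Wrapped sequence lines are joined before comparing, so sequences split across lines are handled.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_same_id_duplicates(records):
    """Keep the first record for each (header, sequence) pair.

    Records with the same sequence but a different header are kept.
    Assumption: the full header line is treated as the id.
    """
    seen = set()
    for header, seq in records:
        key = (header, seq)
        if key not in seen:
            seen.add(key)
            yield header, seq


# Small made-up example: p1 appears twice with the same sequence
# (wrapped differently), p2 has the same sequence under another id.
example = io.StringIO(
    ">p1\nMKV\nLST\n"
    ">p2\nMKVLST\n"
    ">p1\nMKVL\nST\n"
    ">p3\nAAAA\n"
)

kept = list(drop_same_id_duplicates(read_fasta(example)))
for header, seq in kept:
    print(f">{header}\n{seq}")
```

For a real file, replace the `io.StringIO` example with `open("your_file.fasta")` (a placeholder name). The sketch keeps one copy of each duplicate set, and each sequence is printed on a single line.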