Hello,
My first post, so I hope I'm posting this in the correct place!
I have ~100k FASTA sequences, some of which share the same FASTA ID (and have identical sequences) but carry unique descriptions. I would like to extract unique sequences based on ID (that is, remove duplicates but keep one representative sequence) and append the descriptions from all the duplicates to the representative's header.
For example, my fasta file might contain the following 3 sequences:
>Contig1
ATGCGAGTAG
>Contig1 Description1
ATGCGAGTAG
>Contig1 Description2
ATGCGAGTAG
And I'm looking to obtain the following single sequence:
>Contig1 Description1 Description2
ATGCGAGTAG
Thanks for any help :)
I have been trying to use fasuniq, but it can only concatenate the IDs of duplicated sequences.
While the deduplication part can be achieved with various programs (dedupe.sh from the BBMap suite is one), appending the descriptions to the deduplicated sequence would require a custom solution.
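One possible custom solution is a short Python script; the following is only a sketch under a few assumptions, not a tested pipeline. It assumes the ID is everything in the header up to the first whitespace, that records sharing an ID really do have identical sequences (as stated in the question, so the first sequence seen is kept), and that the whole file fits in memory, which should be fine for ~100k sequences. The function names (read_fasta, merge_by_id, format_fasta) are made up for this illustration, and only the standard library is used.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def merge_by_id(records):
    """Group records by ID, keeping the first sequence and all unique descriptions."""
    merged = {}  # dicts keep insertion order in Python 3.7+
    for header, seq in records:
        parts = header.split(None, 1)  # ID = text before the first whitespace
        seq_id = parts[0] if parts else ""
        desc = parts[1].strip() if len(parts) > 1 else ""
        if seq_id not in merged:
            merged[seq_id] = (seq, [])  # assumption: duplicates share this sequence
        descs = merged[seq_id][1]
        if desc and desc not in descs:
            descs.append(desc)
    return merged


def format_fasta(merged):
    """Return the merged records as FASTA text, one unwrapped line per sequence."""
    out = []
    for seq_id, (seq, descs) in merged.items():
        out.append(">" + " ".join([seq_id] + descs))
        out.append(seq)
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        sys.stdout.write(format_fasta(merge_by_id(read_fasta(handle))))
```

Saved under a hypothetical name such as merge_fasta.py, it could be run as `python merge_fasta.py input.fasta > merged.fasta`. Descriptions are appended in the order they first appear, and exact repeats of a description are skipped. If duplicate IDs might carry different sequences in your data, it would be worth adding a check that compares each sequence against the one already stored before merging.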