5.6 years ago
saadleeshehreen
Hi,
I have a list of 160111 protein files. Some of the files are duplicates, because a GCA and a GCF accession contain the same protein sequences. How can I deduplicate the list based on the assembly name (e.g. ASM102201v1)? For example, these two files are the same assembly:
Enterobacter_hormaechei-158836#GCA_001022015.1/GCA_001022015.1_ASM102201v1_protein.faa
Enterobacter_cloacae-550#GCF_001022015.1/GCF_001022015.1_ASM102201v1_protein.faa
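A minimal sketch of one way to do this in Python. It assumes every file name follows the pattern `<accession>_<assembly name>_protein.faa` shown in the two examples, groups the paths by the assembly name, and keeps one path per group. Which copy to keep is not specified in the question, so preferring the GCF (RefSeq) file is an assumption, controlled by the hypothetical `prefer` parameter. Assembly names are chosen by submitters and may not be unique across all of NCBI. The numeric part of the accession (here `001022015.1`) is typically shared by a GCA/GCF pair, so keying on it, or on both fields, may be safer.

```python
import re
from pathlib import PurePosixPath

# Matches basenames like "GCA_001022015.1_ASM102201v1_protein.faa".
# Assumption: every file follows the "<accession>_<assembly name>_protein.faa" pattern.
FILENAME_RE = re.compile(r"^(GC[AF])_(\d+\.\d+)_(.+)_protein\.faa$")


def dedupe_by_assembly(paths, prefer="GCF"):
    """Keep one path per assembly name; if both GCA and GCF exist, keep `prefer`."""
    chosen = {}  # assembly name -> (accession prefix, path)
    order = []   # assembly names in first-seen order, so output order is stable
    for path in paths:
        name = PurePosixPath(path).name  # drops the "Species#accession/" directory part
        m = FILENAME_RE.match(name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        prefix, _accession_number, asm = m.groups()
        if asm not in chosen:
            chosen[asm] = (prefix, path)
            order.append(asm)
        elif chosen[asm][0] != prefer and prefix == prefer:
            # Replace the stored copy with the preferred database's copy.
            chosen[asm] = (prefix, path)
    return [chosen[asm][1] for asm in order]


example = [
    "Enterobacter_hormaechei-158836#GCA_001022015.1/GCA_001022015.1_ASM102201v1_protein.faa",
    "Enterobacter_cloacae-550#GCF_001022015.1/GCF_001022015.1_ASM102201v1_protein.faa",
]
print(dedupe_by_assembly(example))
```

For the full list, read the 160111 paths from a text file (one per line) into a Python list and pass it to `dedupe_by_assembly`. Only the file names are parsed, so the files themselves are never opened.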