I need to add my own unique IDs (which I have created) to the accession numbers in FASTA files. The unique IDs are given in a CSV file: column 1 holds the unique IDs, column 2 the FASTA file names, and column 3 the accession numbers that the unique IDs should be attached to. The columns are separated by colons, and the file looks like this:
0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000014594.1
0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000025662.1
0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000079398.1
and so on...
1027:GCF_920103885.1_DJ_protein.faa:WP_230633546.1
1027:GCF_920103885.1_DJ_protein.faa:WP_230633547.1
1027:GCF_920103885.1_DJ_protein.faa:WP_230633548.1
and so on...
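As a first step, the colon-separated table can be loaded into a nested dictionary keyed by file name and then accession. This is a minimal sketch that assumes every line is exactly `id:filename:accession` as in the sample above; `load_mapping` is a hypothetical helper name. The IDs are kept as strings so leading zeros such as `0001` survive.

```python
import csv
import io
from collections import defaultdict


def load_mapping(handle):
    """Return {fasta_filename: {accession: unique_id}} from id:file:accession lines.

    Assumption: exactly three colon-separated fields per line; anything else
    (blank lines, "and so on..." notes) is skipped.
    """
    mapping = defaultdict(dict)
    for row in csv.reader(handle, delimiter=":"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        uid, filename, accession = (field.strip() for field in row)
        mapping[filename][accession] = uid  # keep uid as a string, e.g. "0001"
    return dict(mapping)


# Small in-memory sample taken from the table above.
sample = io.StringIO(
    "0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000014594.1\n"
    "1027:GCF_920103885.1_DJ_protein.faa:WP_230633546.1\n"
)
ids = load_mapping(sample)
print(ids["GCF_000009885.1_ASM988v1_protein.faa"]["WP_000014594.1"])  # 0001
```

With a real file, pass an open handle instead, e.g. `with open("ids.txt") as fh: ids = load_mapping(fh)` (the file name is hypothetical).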
I have a directory with all my FASTA files, and I need to open those files and add the unique ID from the table above immediately after each accession number, separated by a #. For example, if a FASTA file (GCF_000009885.1_ASM988v1_protein.faa) looks like this:
>WP_000014594.1 MULTISPECIES: RNA chaperone/antiterminator CspA [Bacteria]
MSGKMTGIVKWFNADKGFGFITPDDGSKDVFVHFSAIQNDGYKSLDEGQKVSFTIESGAKGPAAGNVTSL
>WP_000025662.1 MULTISPECIES: copper resistance system metallochaperone PcoC [Bacteria]
MSILNKAILTGGLVMGVAFSAMAHPELKSSVPQADSAVAAPEKIQLNFSENLTVKFSGAKLTMTGMKGMSSHSPMPVAAK
VAPGADPKSMVIIPREPLPAGTYRVDWRAVSSDTHPITGNYTFTVK
>WP_000079398.1 MULTISPECIES: sugar ABC transporter permease [Enterobacteriaceae]
MAQSPSIKREKWIRLSLTWLVVILVSVVIIYPLVWTVGASLNAGNSLLSTSIIPENVSFQHYADLFNGNVNYLTWYWNSM
KISFLTMVLTLISVSFTAYAFSRFRFKGRQNGLMLFLLLQMIPQFSALIAIFVLSQLLGLINSHLALVLIYVGGMIPMNT
I need to edit all FASTA files so that they look like this:
>WP_000014594.1#0001 MULTISPECIES: RNA chaperone/antiterminator CspA [Bacteria]
MSGKMTGIVKWFNADKGFGFITPDDGSKDVFVHFSAIQNDGYKSLDEGQKVSFTIESGAKGPAAGNVTSL
>WP_000025662.1#0001 MULTISPECIES: copper resistance system metallochaperone PcoC [Bacteria]
MSILNKAILTGGLVMGVAFSAMAHPELKSSVPQADSAVAAPEKIQLNFSENLTVKFSGAKLTMTGMKGMSSHSPMPVAAK
VAPGADPKSMVIIPREPLPAGTYRVDWRAVSSDTHPITGNYTFTVK
>WP_000079398.1#0001 MULTISPECIES: sugar ABC transporter permease [Enterobacteriaceae]
MAQSPSIKREKWIRLSLTWLVVILVSVVIIYPLVWTVGASLNAGNSLLSTSIIPENVSFQHYADLFNGNVNYLTWYWNSM
KISFLTMVLTLISVSFTAYAFSRFRFKGRQNGLMLFLLLQMIPQFSALIAIFVLSQLLGLINSHLALVLIYVGGMIPMNT
and so on...
Any script would work, e.g. Python or Perl. Thanks!
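Since Python is acceptable, the core transformation on a single header line might look like the sketch below. It assumes the accession is everything between `>` and the first space, and that sequence lines and unmapped accessions should pass through unchanged; `tag_header` is a hypothetical name.

```python
def tag_header(line, accession_to_id):
    """Insert '#<id>' right after the accession on a FASTA header line.

    Assumption: the accession ends at the first space (or end of line).
    Sequence lines and accessions missing from the mapping are returned as-is.
    """
    if not line.startswith(">"):
        return line
    stripped = line.rstrip("\r\n")
    newline = line[len(stripped):]  # preserve the original line ending
    accession, sep, rest = stripped[1:].partition(" ")
    uid = accession_to_id.get(accession)
    if uid is None:
        return line
    return f">{accession}#{uid}{sep}{rest}{newline}"


ids = {"WP_000014594.1": "0001"}
header = ">WP_000014594.1 MULTISPECIES: RNA chaperone/antiterminator CspA [Bacteria]"
print(tag_header(header, ids))
# >WP_000014594.1#0001 MULTISPECIES: RNA chaperone/antiterminator CspA [Bacteria]
```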
It appears to me that you are coming up with all kinds of obscure FASTA header modifications, and not once have you tried to solve the problem yourself by using similar solutions others have provided for you. I suggest you show some effort rather than expecting others to solve all of these problems. It seems that with small modifications of your existing solutions you should be able to do it. Don't you want to fish on your own rather than waiting for someone to hand you the fish?
Linearize, sort, join, and reformat with awk. See: How to match fasta header with the first column of a text file and append the second column of the text file to the end of the fasta header; Renaming fasta file according to a name list (blast output)
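For comparison, a Python counterpart to that linearize/sort/join idea can skip the sorting and joining entirely by using a dictionary lookup while streaming each file once. This is a sketch, not the awk method itself: the function names and the `ids.txt`, `fasta_dir`, and `tagged_dir` paths are hypothetical, tagged copies are written to a separate directory so the originals are not overwritten, and the accession is assumed to end at the first space in each header.

```python
import sys
from pathlib import Path


def load_mapping(path):
    """Return {fasta_filename: {accession: unique_id}} from id:file:accession lines."""
    mapping = {}
    with open(path) as handle:
        for line in handle:
            parts = line.strip().split(":")
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            uid, filename, accession = parts
            mapping.setdefault(filename, {})[accession] = uid
    return mapping


def tag_lines(lines, accession_to_id):
    """Yield FASTA lines, appending '#<id>' after each mapped accession."""
    for line in lines:
        if line.startswith(">"):
            stripped = line.rstrip("\r\n")
            newline = line[len(stripped):]
            accession, sep, rest = stripped[1:].partition(" ")
            uid = accession_to_id.get(accession)
            if uid is not None:
                line = f">{accession}#{uid}{sep}{rest}{newline}"
        yield line


def tag_directory(mapping_path, fasta_dir, out_dir):
    """Write a tagged copy of every FASTA file named in the mapping into out_dir."""
    mapping = load_mapping(mapping_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, accession_to_id in mapping.items():
        src_path = Path(fasta_dir) / filename
        if not src_path.exists():
            print(f"warning: {src_path} not found", file=sys.stderr)
            continue
        with open(src_path) as src, open(out / filename, "w") as dst:
            dst.writelines(tag_lines(src, accession_to_id))
```

Usage would be something like `tag_directory("ids.txt", "fasta_dir", "tagged_dir")`. Because the lookup is a dictionary, neither the table nor the FASTA files need to be sorted first.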
Thanks, I will try that.