Need to add unique IDs to accession numbers in multiple RefSeq FASTA files
2.7 years ago
Priya ▴ 20

I need to add my own unique IDs to the accession numbers in FASTA files. The IDs are listed in a CSV file (colon-delimited, as shown below): column 1 holds the unique ID, column 2 the FASTA file name, and column 3 the accession number that the unique ID should be attached to. The file looks like this:

0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000014594.1
0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000025662.1
0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000079398.1 
so on ...............
1027:GCF_920103885.1_DJ_protein.faa:WP_230633546.1
1027:GCF_920103885.1_DJ_protein.faa:WP_230633547.1
1027:GCF_920103885.1_DJ_protein.faa:WP_230633548.1
so on............
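
A table in this shape can be read into a nested lookup before touching any FASTA file. The following is only a minimal sketch, assuming every real row has exactly three colon-separated fields as in the excerpt above; the function name `parse_mapping` is hypothetical. IDs are kept as strings so that leading zeros such as 0001 survive.

```python
from collections import defaultdict


def parse_mapping(lines):
    """Build {fasta_file_name: {accession: unique_id}} from 'id:file:accession' lines."""
    mapping = defaultdict(dict)
    for line in lines:
        parts = line.strip().split(":")
        if len(parts) != 3:
            continue  # skip blank lines and filler such as "so on ..."
        unique_id, fasta_name, accession = parts
        # Keep the ID as a string so "0001" is not turned into 1.
        mapping[fasta_name][accession] = unique_id
    return dict(mapping)


sample = [
    "0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000014594.1",
    "0001:GCF_000009885.1_ASM988v1_protein.faa:WP_000079398.1 ",
    "so on ...............",
    "1027:GCF_920103885.1_DJ_protein.faa:WP_230633546.1",
]
mapping = parse_mapping(sample)
```

Grouping by file name first means each FASTA file only needs its own small accession-to-ID dictionary when it is processed.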

I have a directory containing all my FASTA files. I need to open each file and append the unique ID from the table above directly after the accession number, separated by a #. For example, if a FASTA file (GCF_000009885.1_ASM988v1_protein.faa) looks like this:

>WP_000014594.1 MULTISPECIES: RNA chaperone/antiterminator CspA [Bacteria]
MSGKMTGIVKWFNADKGFGFITPDDGSKDVFVHFSAIQNDGYKSLDEGQKVSFTIESGAKGPAAGNVTSL
>WP_000025662.1 MULTISPECIES: copper resistance system metallochaperone PcoC [Bacteria]
MSILNKAILTGGLVMGVAFSAMAHPELKSSVPQADSAVAAPEKIQLNFSENLTVKFSGAKLTMTGMKGMSSHSPMPVAAK
VAPGADPKSMVIIPREPLPAGTYRVDWRAVSSDTHPITGNYTFTVK
>WP_000079398.1 MULTISPECIES: sugar ABC transporter permease [Enterobacteriaceae]
MAQSPSIKREKWIRLSLTWLVVILVSVVIIYPLVWTVGASLNAGNSLLSTSIIPENVSFQHYADLFNGNVNYLTWYWNSM
KISFLTMVLTLISVSFTAYAFSRFRFKGRQNGLMLFLLLQMIPQFSALIAIFVLSQLLGLINSHLALVLIYVGGMIPMNT

I need to edit all FASTA files so that they look like this:

>WP_000014594.1#0001 MULTISPECIES: RNA chaperone/antiterminator CspA [Bacteria]
MSGKMTGIVKWFNADKGFGFITPDDGSKDVFVHFSAIQNDGYKSLDEGQKVSFTIESGAKGPAAGNVTSL
>WP_000025662.1#0001 MULTISPECIES: copper resistance system metallochaperone PcoC [Bacteria]
MSILNKAILTGGLVMGVAFSAMAHPELKSSVPQADSAVAAPEKIQLNFSENLTVKFSGAKLTMTGMKGMSSHSPMPVAAK
VAPGADPKSMVIIPREPLPAGTYRVDWRAVSSDTHPITGNYTFTVK
>WP_000079398.1#0001 MULTISPECIES: sugar ABC transporter permease [Enterobacteriaceae]
MAQSPSIKREKWIRLSLTWLVVILVSVVIIYPLVWTVGASLNAGNSLLSTSIIPENVSFQHYADLFNGNVNYLTWYWNSM
KISFLTMVLTLISVSFTAYAFSRFRFKGRQNGLMLFLLLQMIPQFSALIAIFVLSQLLGLINSHLALVLIYVGGMIPMNT

and so on.

Any scripting language would work, such as Python or Perl. Thanks!
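
One way to approach this in Python is sketched below. It is not a tested solution for the real data set, only an illustration under stated assumptions: the lookup is already available as a dictionary of the form {fasta_file_name: {accession: unique_id}}, the accession in each header is everything between ">" and the first space, and tagged copies are written to a separate output directory rather than overwriting the originals. The names `tag_header`, `tag_fasta_lines`, and `tag_directory` are hypothetical.

```python
from pathlib import Path


def tag_header(header, acc_to_id):
    """Return the header with '#<unique_id>' appended to its accession, if known."""
    accession, sep, description = header[1:].partition(" ")
    unique_id = acc_to_id.get(accession)
    if unique_id is None:
        return header  # accession not in the table: leave the header unchanged
    return f">{accession}#{unique_id}{sep}{description}"


def tag_fasta_lines(lines, acc_to_id):
    """Yield FASTA lines (without newlines), tagging only header lines."""
    for line in lines:
        line = line.rstrip("\n")
        yield tag_header(line, acc_to_id) if line.startswith(">") else line


def tag_directory(mapping, fasta_dir, out_dir):
    """Write a tagged copy of every FASTA file named in `mapping` to `out_dir`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for fasta_name, acc_to_id in mapping.items():
        src = Path(fasta_dir) / fasta_name
        with open(src) as fin, open(out_dir / fasta_name, "w") as fout:
            for line in tag_fasta_lines(fin, acc_to_id):
                fout.write(line + "\n")


example = [
    ">WP_000014594.1 MULTISPECIES: RNA chaperone/antiterminator CspA [Bacteria]\n",
    "MSGKMTGIVKWFNADKGFGFITPDDGSKDVFVHFSAIQNDGYKSLDEGQKVSFTIESGAKGPAAGNVTSL\n",
]
tagged = list(tag_fasta_lines(example, {"WP_000014594.1": "0001"}))
```

Sequence lines pass through untouched, and a header whose accession is missing from the table, or is not followed by a space, is left as it was, so it is worth checking the output for untagged headers.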

python • 762 views

It appears to me that you are coming up with all kinds of obscure FASTA header modifications, and not once have you tried to solve the problem yourself by using similar solutions others have provided for you. I suggest you show some effort rather than expecting others to solve all of these problems. It seems that with small modifications of your existing solutions you should be able to do it. Don't you want to fish on your own rather than waiting for someone to hand you the fish?


Thanks, I will try that.

