How can I replace the accession number with the gene name in FASTA headers?
19 months ago
Neel ▴ 20

Hi, I have a FASTA file of sequences in which I want to replace the accession number in each header with the gene name (for example, CP077971.1_494 with MexR), for all the genes in the file. Is there any way to do that?

>CP077971.1_494 P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pumpMexR
MNYPVNPDLMPALMAVFQHVRTRIQSELDCQRLDLTPPDVHVLKLIDEQRGLNLQDLGRQMCRDKALITRKIRELEGRNLVRRERNPSDQRSFQLFLTDEGLAIHQHAEAIMSRVHDELFAPLTPVEQATLVHLLDQCLAAQPLEDI
>CP077971.1_1265 P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pumpnalC
MNDASPRLTERGRQRRRAMLDAATQAFLEHGFEGTTLDMVIERAGGSRGTLYSSFGGKEGLFAAVIAHMIEEIFDDSADQPRPAATLSATLEHFGRRFLTSLLDPRCQSLYRLVVAESPRFPAIGKSFYEQGPQQSYLLLSERLAAVAPHMDEETLYAVACQFLEMLKADLFLKALSVADFQPTMALLETRLKLSVDIIACYLEHLSQSPAQG
>CP077971.1_1470 P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pumpnalD
MRRTKEDSEKTRTAILLAAEELFLEKGVSHTSLEQIARAAGVTRGAVYWHFQNKAHLFNEMLNQVRLPPEQLTERLSGCDGSDPLRSLYDLCLEAVQSLLTQEKKRRILTILMQRCEFTEELREAQERNNAFVQMFIELCEQLFARDECRVRLHPGMTPRIASRALHALILGLFNDWLRDPRLFDPDTDAEHLLEPMFRGLVRDWGQASSAP
>CP077971.1_2660 P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pumpMexT
MNRNDLRRVDLNLLIVFETLMHERSVTRAAEKLFLGQPAISAALSRLRTLFDDPLFVRTGRSMEPTARAQEIFAHLSPALDSISTAMSRASEFDPATSTAVFRIGLSDDVEFGLLPPLLRRLRAEAPGIVLVVRRANYLLMPNLLASGEISVGVSYTDELPANAKRKTVRRSKPKILRADSAPGQLTLDDYCARPHALVSFAGDLSGFVDEELEKFGRKRKVVLAVPQFNGLGTLLAGTDIIATVPDYAAQALIAAGGLRAEDPPFETRAFELSMAWRGAQDNDPAERWLRSRISMFIGDPDSL

Thank you!

fasta • 795 views
19 months ago

You can use seqkit replace, which by default applies the regular expression to the header line:

seqkit replace -p "^(\S+)(.*\s)(\S+)$" -r "\$3\$2\$3" input.fasta

>pumpMexR P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pumpMexR
MNYPVNPDLMPALMAVFQHVRTRIQS
>pumpnalC P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pumpnalC
MNDASPRLTERGRQRRRAMLDAATQA
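
To see what the capture groups pick up before touching the whole file, the same expression can be tried on a single header. This is only a rough sketch using sed rather than seqkit: it assumes a sed that supports -E, swaps \S and \s for POSIX character classes, and show_groups is a made-up helper name.

```shell
# show_groups (hypothetical helper): print what capture groups 1 and 3
# of the seqkit pattern would match for one header (without the leading ">").
show_groups() {
  printf '%s\n' "$1" |
    sed -E 's/^([^[:space:]]+)(.*[[:space:]])([^[:space:]]+)$/id=[\1] last=[\3]/'
}

show_groups 'CP077971.1_494 P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pumpMexR'
# prints: id=[CP077971.1_494] last=[pumpMexR]
```

Group 1 is the accession, group 3 is the last word of the header, and group 2 is everything in between, which is why the replacement "\$3\$2\$3" puts the last word in front and keeps the description.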

If 'pump' always comes directly before the gene name, you can clean it up further:

seqkit replace -p "^(\S+)(.*\spump)(\S+)$" -r "\$3\$2" input.fasta

>MexR P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pump
MNYPVNPDLMPALMAVFQHVRTRIQS
>nalC P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pump
MNDASPRLTERGRQRRRAMLDAATQA
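
Before relying on the 'pump' pattern, it may be worth checking whether every header really ends that way, since headers that do not match the pattern should simply be left unchanged. A small awk sketch, assuming one header per line starting with ">" in the first column (headers_without_pump is a hypothetical name, not part of seqkit):

```shell
# headers_without_pump (hypothetical helper): list header lines whose last
# whitespace-separated word does not start with "pump" followed by a name.
headers_without_pump() {
  awk '/^>/ && $NF !~ /^pump./ { print }' "$1"
}

# Usage (prints nothing if every header fits the pattern):
#   headers_without_pump input.fasta
```

If this prints anything, those records need the key-value approach instead.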

If it's not always 'pump', you'll need to create a tab-delimited key-value file such as:

CP077971.1_494	MexR
CP077971.1_1265	nalC

Then use that key-value file (called replacements.tsv below) for the replacement. The -k option names the file, and -K keeps the original accession when a key has no entry in it.

seqkit replace -p "^(\S+)(.*)" -r "{kv}\$2" -Kk replacements.tsv input.fasta

>MexR P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pumpMexR
MNYPVNPDLMPALMAVFQHVRTRIQS
>nalC P_aeruginosa_ZPPH33resistance-nodulation-cell division (RND) antibiotic efflux pumpnalC
MNDASPRLTERGRQRRRAMLDAATQA
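
Typing the key-value file by hand gets tedious for many genes, so one option is to generate a draft from the headers and then edit the second column. The following is only a sketch with awk, assuming the accession is the first word of each header and the gene name is somewhere in the last word; make_kv_draft is a hypothetical helper name.

```shell
# make_kv_draft (hypothetical helper): for each header line, print the
# accession (first word, without ">") and the last word, separated by a tab.
make_kv_draft() {
  awk '/^>/ { print substr($1, 2) "\t" $NF }' "$1"
}

# Usage:
#   make_kv_draft input.fasta > replacements.tsv
# then edit the second column (e.g. pumpMexR -> MexR) before running seqkit.
```

The draft still needs a manual check, since the last word is not guaranteed to be the gene name.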
Thank you, sir, for your reply. It's done now.
