How to replace accession numbers with GIs in FASTA headers
9.0 years ago

Hi everyone!

I need help with something. I am very new to bioinformatics.

I have a FASTA file with 32K reference sequences for a gene X. The headers are the accession numbers, but I need to replace them with the GIs.

I already have a txt file with the GI corresponding to each accession number (so I think I already did the hardest part), but now I need to combine this information and change the headers of my FASTA file to the GI of each sequence.

I've tried with this script:

fasta = open('seq.fa')
newnames = open('newnames.txt')
newfasta = open('seqnew.fa', 'w')

for line in fasta:
    if line.startswith('>'):
        # Header line: write the next line of newnames.txt in its place
        newname = newnames.readline()
        newfasta.write(newname)
    else:
        # Sequence line: copy unchanged
        newfasta.write(line)

fasta.close()
newnames.close()
newfasta.close()

But this replaces each header with the whole corresponding row of the txt file, in file order, rather than with the GI that matches the accession number. How can I replace each accession number with the GI that I already have in a tab-delimited txt file?

My txt file is:

Accession Number         GI
AB079690        22212526
EF394164        126842524
EU113233        157361205

Thanks


Looks like you might have an answer here: Fasta sequence replacement based on header name

9.0 years ago

If you're interested in using a library, you can take advantage of the key_function argument in pyfaidx, which applies a function of your choice to each sequence name as the file is read.
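For readers who want to see the lookup idea without any library, here is a minimal standard-library sketch. It is not the pyfaidx approach itself. It assumes the table has one header row followed by whitespace-separated accession and GI columns (as in the question), and that the first word of each FASTA header is the accession exactly as it appears in the table. The function names load_mapping and rename_headers are made up for illustration.

```python
def load_mapping(table):
    """Build {accession: GI} from an open table file.

    Assumes one header row, then rows of 'accession<whitespace>GI'.
    """
    mapping = {}
    next(table, None)  # skip the 'Accession Number  GI' header row
    for row in table:
        fields = row.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def rename_headers(fasta, mapping, out):
    """Copy FASTA records to out, replacing each header with its GI.

    Headers whose accession is not in the mapping are copied unchanged
    and their accessions are returned so they can be checked by hand.
    Any description after the accession is dropped for renamed records.
    """
    missing = []
    for line in fasta:
        if line.startswith('>'):
            parts = line[1:].split()
            accession = parts[0] if parts else ''
            if accession in mapping:
                out.write('>' + mapping[accession] + '\n')
            else:
                missing.append(accession)
                out.write(line)
        else:
            out.write(line)  # sequence line: copy as-is
    return missing
```

To use it on the files from the question, open newnames.txt and pass it to load_mapping, then open seq.fa for reading and seqnew.fa for writing and pass both to rename_headers along with the mapping. Because the lookup is by accession, the order of rows in the table no longer has to match the order of records in the FASTA file. With pyfaidx, the same dictionary could presumably be supplied as key_function=lambda name: mapping.get(name, name) when constructing pyfaidx.Fasta; that variant is untested here.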


Many thanks! I will try this.
