Hi everyone!
I need help with something. I am very new to bioinformatics.
I have a FASTA file with 32K reference sequences for an X gene. The headers are the accession numbers, but I need to replace them with the GI numbers.
I already have a txt file with the GI corresponding to each accession number (so I think I already did the hardest part), but now I need to combine this information and replace the headers of my FASTA with the GI of each sequence.
I've tried with this script:
fasta = open('seq.fa')
newnames = open('newnames.txt')
newfasta = open('seqnew.fa', 'w')
for line in fasta:
    if line.startswith('>'):
        newname = newnames.readline()
        newfasta.write(newname)
    else:
        newfasta.write(line)
fasta.close()
newnames.close()
newfasta.close()
But this replaces each header with the whole row at the same position in the txt file. How can I replace each accession number with the matching GI that I already have in a tab-delimited txt file?
My txt file is:
Accession Number GI
AB079690 22212526
EF394164 126842524
EU113233 157361205
Thanks
Looks like you might have an answer here: Fasta sequence replacement based on header name
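In case it helps, here is a minimal sketch of the lookup-based approach, not a definitive solution. It assumes each FASTA header is just the accession (the first token after `>`) and that the GI is the last column of each row in the txt file. The function names `load_mapping` and `rename_headers` are made up for illustration.

```python
def load_mapping(path):
    """Read accession -> GI pairs from a tab- or space-delimited text file."""
    mapping = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines and the "Accession Number GI" title row,
            # whose last column is not a number.
            if len(fields) < 2 or not fields[-1].isdigit():
                continue
            mapping[fields[0]] = fields[-1]
    return mapping


def rename_headers(fasta_in, fasta_out, mapping):
    """Copy a FASTA file, replacing each accession header with its GI."""
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Assumption: the accession is the first token of the header.
                tokens = line[1:].split()
                accession = tokens[0] if tokens else ""
                gi = mapping.get(accession)
                # Headers with no match in the table are left unchanged.
                if gi is not None:
                    line = ">" + gi + "\n"
            dst.write(line)
```

With the file names from the question, this would be called as `rename_headers('seq.fa', 'seqnew.fa', load_mapping('newnames.txt'))`. Because the GI is looked up by accession in a dictionary, the order of the rows in the txt file no longer has to match the order of the sequences. If the headers carry a version suffix (for example `AB079690.1`) or extra description text, the accession would need to be trimmed before the lookup.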