Hello, I would like to ask a question. I have a file of about 1000 sequences from BLAST results, each with a classic header like this:
">PLN78092.1 putative endo-1,3(4)-beta-glucanase [Aspergillus taichungensis]"
I would like to change the header of every sequence, using Biopython, so that it contains only the organism name:
">Aspergillus taichungensis"
When I download the results in FASTA format and parse them with Biopython, the organism name appears only in the description, but the description contains the whole header:
from Bio import SeqIO
records = list(SeqIO.parse("sequence.fasta", "fasta"))
for record in records:
    print(record.description)
PLN78092.1 putative endo-1,3(4)-beta-glucanase [Aspergillus taichungensis] ...
Of course I could just extract the text in brackets "[ ]", but is there a way to get only the name, for example by parsing the .xml format of the results? Something like this:
from Bio.Blast import NCBIXML
result_handle = open("sequences.xml")
blast_records = NCBIXML.parse(result_handle)
blast_records = list(blast_records)
print(blast_records[0].organism) #this is not working
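For reference, here is a minimal sketch of the bracket-extraction route, under a few assumptions: the organism is taken to be the last `[...]` group in the header (with no nested brackets), and the helper names `organism_from_description`, `rename_fasta_headers` and `organisms_from_blast_xml`, as well as the output file name, are hypothetical. As far as I know, the records returned by `NCBIXML.parse` have no `organism` attribute; each alignment carries the hit title in `hit_def`, so the same bracket extraction would apply to the XML output too.

```python
import re

# Matches one bracketed group that contains no nested brackets.
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def organism_from_description(description):
    """Return the text of the last [...] group, or None if there is none.

    Assumption: the organism name is the last bracketed group in the header.
    """
    matches = BRACKETED.findall(description)
    return matches[-1].strip() if matches else None


def rename_fasta_headers(in_path, out_path):
    """Rewrite a FASTA file so each header holds only the organism name."""
    from Bio import SeqIO  # imported here so the helper above runs without Biopython

    def renamed():
        for record in SeqIO.parse(in_path, "fasta"):
            organism = organism_from_description(record.description)
            if organism is not None:
                record.id = organism
                record.description = ""  # otherwise the old header is appended
            yield record

    SeqIO.write(renamed(), out_path, "fasta")


def organisms_from_blast_xml(xml_path):
    """Collect organism names from the hit titles of a BLAST XML file."""
    from Bio.Blast import NCBIXML

    names = []
    with open(xml_path) as handle:
        for blast_record in NCBIXML.parse(handle):
            for alignment in blast_record.alignments:
                names.append(organism_from_description(alignment.hit_def))
    return names
```

Calling `rename_fasta_headers("sequence.fasta", "renamed.fasta")` should then write headers such as `>Aspergillus taichungensis`. Note that many sequences from the same organism will end up with identical headers, which some downstream tools do not accept.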
Please use the formatting bar (especially the code option) to present your post better. Thank you!