Hi,
I have been wondering about the correct approach in Python, maybe using Biopython, to parsing a FASTA file without having to load it into memory (i.e. NOT having to read it into a list, dictionary or FASTA class) before using it.
The desired result would behave like a generator, as in the pseudo-code example below:
    fasta_sequences = fasta_generator(input_file)  # The function I miss
    with open(output_file, "w") as out_file:  # "w": the file is written to
        for fasta in fasta_sequences:
            name, sequence = fasta
            new_sequence = some_function(sequence)
            write_fasta(out_file, name, new_sequence)  # Function defined elsewhere
Important aspects are:
- Reads sequences one at a time
- Does not load all the sequences into memory
- Is safe and well tested
Thanks for your suggestions!
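To make the requested behaviour concrete, here is a minimal pure-Python sketch of such a generator. It is an illustration only, not Biopython's implementation: it assumes a plain-text FASTA file with `>` header lines, takes an already open file handle rather than a path, and keeps only the current record in memory. For the "safe and well tested" requirement, a library parser such as Biopython's `SeqIO.parse` is the more sensible choice.

```python
import io
from typing import Iterator, TextIO, Tuple


def fasta_generator(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs one record at a time from an open FASTA handle."""
    name = None   # header of the record currently being read
    chunks = []   # sequence lines of the current record
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                # A new header closes the previous record.
                yield name, "".join(chunks)
            name = line[1:]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        # Emit the last record in the file.
        yield name, "".join(chunks)


if __name__ == "__main__":
    # Hypothetical in-memory data standing in for a real file.
    demo = io.StringIO(">seq1\nACGT\nTTGA\n>seq2\nGGCC\n")
    for name, sequence in fasta_generator(demo):
        print(name, len(sequence))
```

With a real file this would be used as `with open(input_file) as in_handle: for name, sequence in fasta_generator(in_handle): ...`, so only one record is held at any time.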
I use Biopython all the time, but parsing FASTA files is all I ever use it for. :) Two other functions I use for FASTA parsing are:
- SeqIO.to_dict(), which builds a dictionary of all the sequences and keeps it in memory
- SeqIO.index(), which builds a dictionary-like index without putting the sequences in memory
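The difference between the two functions can be sketched as follows. This assumes Biopython is installed; the temporary file and its two records are made-up example data so that the snippet runs on its own.

```python
import os
import tempfile

from Bio import SeqIO

# Write a tiny example file so the sketch runs as-is (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">seq1\nACGT\n>seq2\nGGCC\n")
    fasta_path = tmp.name

# to_dict(): parses every record and keeps all of them in memory.
in_memory = SeqIO.to_dict(SeqIO.parse(fasta_path, "fasta"))

# index(): scans the file once, stores only where each record starts,
# and parses a record from disk when it is looked up.
on_disk = SeqIO.index(fasta_path, "fasta")

seq_from_dict = str(in_memory["seq1"].seq)
seq_from_index = str(on_disk["seq2"].seq)

on_disk.close()  # the index keeps the file open until it is closed
os.remove(fasta_path)
```

Both objects are looked up by record ID, so code written against one mostly works with the other; the index is read-only and depends on the file staying in place.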
I was thinking of looking into Biopython a little deeper, since it offers much more than fasta parsing, but did not get a chance. :(
Very useful answer from 7 years ago! FYI, in the current version of Biopython (1.69), fasta.seq.tostring() is obsolete; use str(fasta.seq) instead.
Very nice example. I knew Biopython offered something of the sort, but had never tried it. @Zhaorong do you have a lot of experience with Biopython? What have you used it for? Cheers
fasta.seq.tostring() should have been str(fasta.seq). This example is wrong.
The example is now wrong, but almost 9 years ago when this was written it was perfectly valid. The Bio.Seq.Seq.tostring() method was removed in the latest Biopython release, and was deprecated a bit before: https://github.com/biopython/biopython/blob/654309121f2cc0c01dfff73cd3dec3a435d76fc2/DEPRECATED.rst#bioseqseqtostring-and-bioseqmutableseqtostring
It is indeed wrong today. I edited the answer since it has been possible to use str(sequence) for a long time now.
Thanks. I constantly forget that I'm a moderator.
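For anyone landing here now, the point made in these comments can be sketched like this: iterate lazily with `SeqIO.parse` and convert each sequence with `str()` rather than the removed `tostring()` method. This assumes Biopython is installed; the helper name `iter_fasta` and the in-memory example data are hypothetical and not taken from the original answer.

```python
import io

from Bio import SeqIO


def iter_fasta(handle):
    """Yield (name, sequence) string pairs lazily from an open FASTA handle."""
    for record in SeqIO.parse(handle, "fasta"):
        # str(record.seq) replaces the removed record.seq.tostring()
        yield record.id, str(record.seq)


# Hypothetical in-memory data standing in for a real file handle.
example = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
pairs = list(iter_fasta(example))
```

Note that `record.id` holds only the first word of the header line; the full header is available as `record.description`.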