I'm working with Biopython, Python, and GTK to create a program that loads files of bioinformatic interest.
These files contain multiple sequences:
http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.gbk
http://biopython.org/DIST/docs/tutorial/examples/ls_orchid.fasta
but these contain only one (long) sequence:
http://biopython.org/SRC/biopython/Tests/GenBank/NC_005816.gb
http://biopython.org/SRC/biopython/Tests/GenBank/NC_005816.fna
Is there any way to know this before processing the file? How can I tell files with a single sequence apart from files with multiple sequences? I want to know exactly when to use Bio.SeqIO.read() and when to use Bio.SeqIO.parse().
Thanks for your time. I searched for answers but didn't find anything similar to this.
is there any way to know this before processing the file?
You'd have to process it somehow to determine whether the file contains one or multiple sequences. Given this, consider using Bio.SeqIO.parse(), since it handles both cases.
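As a rough sketch of that approach: Bio.SeqIO.parse() returns an iterator of records, so you can look at the first two records to find out whether the file holds one sequence or several, then carry on with the same iterator. The helper names `peek_multiple` and `open_sequences` are made up for this example and are not part of Biopython. The sketch assumes you pass the matching format name ("genbank" or "fasta") yourself.

```python
from itertools import chain, islice


def peek_multiple(records):
    """Return (is_multiple, records) after looking at up to two records.

    The returned iterator still yields every record, including the
    ones that were peeked at.
    """
    iterator = iter(records)
    head = list(islice(iterator, 2))  # consume at most two records
    return len(head) > 1, chain(head, iterator)


def open_sequences(path, fmt):
    """Hypothetical helper: parse `path` and report whether it holds
    more than one record."""
    # Imported here so peek_multiple() can be used without Biopython.
    from Bio import SeqIO

    return peek_multiple(SeqIO.parse(path, fmt))


# Example usage (file names taken from the question):
# is_multi, records = open_sequences("ls_orchid.gbk", "genbank")
# for record in records:
#     print(record.id, len(record.seq))
```

Bio.SeqIO.read() raises a ValueError unless the file contains exactly one record, so wrapping parse() like this avoids having to guess in advance.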