I want to check the number of contigs in either a FASTA or a GBK (GenBank) file. I am aware of tools such as CheckM that can do this, but is there a direct way to count the contigs in a sequence file with Python or Biopython?
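from Bio import SeqIO
# Parse every record in the GenBank file (one record per contig); use 'fasta' as the format for a FASTA file
recs = list(SeqIO.parse('genbank.gbk', 'genbank'))
print(len(recs))  # number of contigs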
This could be made more memory efficient by counting over the iterator instead of building a list, but it is a quick and easy way.
It is probably also more robust, since *nix solutions require you to know your files very well, so that you can be sure they don't contain any nasty surprises.
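A minimal sketch of the iterator approach, assuming Biopython is installed. count_records is a hypothetical helper name, not part of Biopython. Instead of building a list, it lets sum() consume the records that SeqIO.parse yields one at a time, so only one record needs to be held in memory at once.

```python
from Bio import SeqIO


def count_records(path, fmt):
    """Count the records (contigs) in a sequence file without storing them all.

    fmt is a format name understood by SeqIO.parse, e.g. "fasta" or "genbank".
    """
    # SeqIO.parse returns a generator of SeqRecord objects; each one is
    # counted and then discarded, so memory use does not grow with file size.
    return sum(1 for _ in SeqIO.parse(path, fmt))
```

For example, count_records('genbank.gbk', 'genbank') for the GenBank file above, or count_records('contigs.fasta', 'fasta') for a FASTA file (contigs.fasta is a placeholder name).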
You can try basic *nix utilities.
Like grep commands, etc.?
An easy grep solution to count the entries in a GenBank file is to count the LOCUS lines:
grep -c '^LOCUS' genbank.gbk
For a multi-FASTA file, you can use ^> instead of ^LOCUS, as you have noted.
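If Biopython is not available, the grep approach can be mirrored in plain Python. This is a sketch under the same assumption grep relies on: every record starts with a line beginning with > (FASTA) or LOCUS (GenBank), and no other line does. count_header_lines is a hypothetical name.

```python
def count_header_lines(path, prefix):
    """Count lines that start with prefix, like grep -c '^prefix' path.

    Use prefix=">" for a (multi-)FASTA file and prefix="LOCUS" for GenBank.
    """
    count = 0
    with open(path) as handle:
        # Iterating over the file handle reads one line at a time
        for line in handle:
            if line.startswith(prefix):
                count += 1
    return count
```

Unlike the Biopython version, this does not validate the file format at all, which is exactly the "know your files" caveat from the first answer.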