Hi,
Sometimes I need to download around 3k sequences using a GI list. I know how to download them; the problem lies elsewhere. The GI list is generated by another script, and among the GIs for protein sequences it sometimes also contains GIs of DNA (or perhaps RNA) sequences. Since I am interested only in protein sequences, is there a way to download only those?
I am using Biopython, but I haven't seen any attribute in SeqIO that describes the type of sequence. Another option would be to filter the FASTA file after download. Of course I could iterate through each whole sequence and check whether it contains letters other than ATGCU, but this is not very elegant, and I think it would be slow. Do you know any better solutions?
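For reference, the letter check I have in mind would look roughly like the sketch below. The function name `looks_like_protein` and the allowed letter set are made up for illustration; I added N because nucleotide records often contain it, and other ambiguity codes are not handled. It is only a heuristic: a short peptide made up solely of A, C, G, T and N would be misclassified as nucleotide.

```python
# Letters treated as "nucleotide" (assumption: A, C, G, T, U plus N for unknown bases).
NUCLEOTIDE_LETTERS = frozenset("ACGTUN")


def looks_like_protein(sequence, nucleotide_letters=NUCLEOTIDE_LETTERS):
    """Return True if the sequence contains any letter outside the nucleotide set.

    Heuristic only: building a set of the letters is a single pass over the
    sequence, so the cost is linear in the sequence length.
    """
    letters = set(str(sequence).upper())
    # Ignore gap and stop symbols that can appear in either kind of sequence.
    letters -= {"-", "*"}
    return bool(letters - nucleotide_letters)
```

Since `str()` is applied first, this should work on plain strings as well as on the `.seq` of a record parsed with SeqIO.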
Thanks in advance!
Cheers, Mateusz
That seems to be a pretty good solution. I assume I could download each record (without saving it to disk), check its type, and, if it is a protein, convert it to FASTA on the fly and save that to disk.
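A minimal sketch of that idea, assuming the records arrive as GenBank-format text and that the Biopython GenBank parser fills in `annotations["molecule_type"]` with `"protein"` for protein records (I believe recent Biopython versions do this, but I have not checked older ones). The names `is_protein_annotation` and `write_protein_fasta` are my own, not part of Biopython.

```python
def is_protein_annotation(annotations):
    """Return True if a record's annotations dict marks it as a protein.

    Assumption: the parser stored the molecule type under the key
    "molecule_type"; a missing key is treated as "not protein".
    """
    molecule_type = str(annotations.get("molecule_type", "")).lower()
    return molecule_type == "protein"


def write_protein_fasta(genbank_handle, fasta_path):
    """Read GenBank-format records from a handle and write only proteins as FASTA.

    Returns the number of records written.
    """
    # Imported here so the helper above can be used without Biopython installed.
    from Bio import SeqIO

    records = SeqIO.parse(genbank_handle, "genbank")
    # Generator expression: records are filtered one at a time, never all in memory.
    proteins = (rec for rec in records if is_protein_annotation(rec.annotations))
    return SeqIO.write(proteins, fasta_path, "fasta")
```

The handle could be, for instance, the one returned by `Bio.Entrez.efetch` with `rettype="gb"` and `retmode="text"`, so nothing except the final FASTA file touches the disk. Which `db` value to pass for a mixed GI list is something I would still have to test.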
Thanks. :)