Question

Getting errors in Biopython code for splitting a multi-FASTA genomic file

0

5.1 years ago

schlogl ▴ 160

I downloaded a multi-FASTA file with 40 viral genomes from NCBI.
I read the file and tried to split it into one file per genome like this:

from Bio import SeqIO

for rec in SeqIO.parse(file, 'fasta'):
    ids = rec.id.split('|')[3]
    seqs = rec.seq
    # check ids and seqs with len() and print(seqs[:500])
    outputfile = open('genome_' + ids + '.fasta', 'w')
    outputfile.write('>' + ids + '\n')
    outputfile.write(str(seqs))
outputfile.close()

As noted in the code comment, I printed the lengths of the IDs and sequences and everything looked fine. But when I checked the files in my directory, some of them (some of the big genomes) had sequences of length 0, while the others were fine.

Does anyone have an idea why this is happening?

Thanks for your time.

PS: I stated in bold that many of the files came out fine, so I assume the code is right; I am only asking why some files don't work. The code is simple and I don't get any error message, but some files end up empty.

I am just asking whether anyone here has run into something like this and what was done to fix it.

genome biopython • 1.2k views

ADD COMMENT • link 5.1 years ago by schlogl ▴ 160

2

Try putting the outputfile.close() in the loop, not only once at the end of it.
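
A minimal sketch of that suggestion: opening each file in a `with` block closes (and flushes) it before the next record is handled, so no handle is left open between iterations. The `records` list and the `write_genomes` helper are made-up stand-ins for the parsed FASTA records, so the snippet runs without Biopython or the original data.

```python
import os
import tempfile


def write_genomes(records, out_dir):
    """Write one FASTA file per (identifier, sequence) pair."""
    paths = []
    for ident, seq in records:
        path = os.path.join(out_dir, "genome_" + ident + ".fasta")
        # The with block closes the file at the end of each iteration.
        with open(path, "w") as outputfile:
            outputfile.write(">" + ident + "\n")
            outputfile.write(str(seq) + "\n")
        paths.append(path)
    return paths


# Hypothetical records standing in for what SeqIO.parse would yield.
records = [("virusA", "ATGC" * 10), ("virusB", "GGCC" * 5)]
out_dir = tempfile.mkdtemp()
paths = write_genomes(records, out_dir)
sizes = [os.path.getsize(p) for p in paths]
print(sizes)  # every size should be non-zero
```

With real data, the loop would iterate over `SeqIO.parse(file, 'fasta')` and use `rec.id` and `rec.seq` in place of the tuple.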

ADD REPLY • link 5.1 years ago by WouterDeCoster 47k

1

You're using SeqIO wrong. It should be SeqIO.parse.

I think your output file is also wrong since ids will be a list, not a string.
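
For reference, `str.split` returns a list, while indexing that list gives a single string, so whether `ids` is a list depends on whether the `[3]` index is present. A quick check with a made-up NCBI-style identifier (the value is an assumption, not taken from the poster's file):

```python
# Hypothetical NCBI-style identifier, used only for illustration.
record_id = "gi|12345|ref|NC_000000.1|"

fields = record_id.split("|")  # a list of strings
accession = fields[3]          # a single string

print(type(fields).__name__, type(accession).__name__)
print(accession)
```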

ADD REPLY • link 5.1 years ago by Joe 21k

0

Hey Joe, no, I just forgot to put '.parse' in the code. I will edit my post.

But if it were a list I would get an error message or something; instead most of the files came out fine, while some of them have no sequence at all.

Thanks

ADD REPLY • link 5.1 years ago by schlogl ▴ 160

1

Are you sure rec.ids is valid? Normally it is just id.

It's not obvious to me why you'd be getting 0 length sequences from that code, so there's something else going on I think.

ADD REPLY • link 5.1 years ago by Joe 21k

0

That is a typo here, because many sequences worked out. ;)

ADD REPLY • link 5.1 years ago by psschlogl ▴ 50

2

It's advisable to copy and paste your code exactly rather than re-typing it; otherwise we aren't truly seeing what you are seeing. In Python, where syntax and whitespace are strictly enforced, this is even more the case.

ADD REPLY • link 5.1 years ago by Joe 21k

score 1 · Accepted Answer · 2019-10-27

1

5.1 years ago

schlogl ▴ 160

Got everything done with:

from Bio import SeqIO

file = 'myfile.fasta'

with open(file, "r") as hd:
    for record in SeqIO.parse(hd, "fasta"):
        # Name each file after the record ID minus its last two characters
        with open(str(record.id[:-2]) + ".fasta", "w") as output_handle:
            SeqIO.write(record, output_handle, "fasta")
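
The `record.id[:-2]` slice drops the last two characters of the identifier, which removes a version suffix such as `.1`. A small sketch with made-up accessions, comparing the slice with `rsplit` as a possible alternative that does not depend on the suffix length (the `output_name` helper and the accessions are hypothetical):

```python
def output_name(record_id):
    """Build an output filename from an accession, dropping any version suffix."""
    base = record_id.rsplit(".", 1)[0]
    return base + ".fasta"


# Made-up accessions with a one-digit and a two-digit version.
for rid in ["NC_000000.1", "NC_000000.12"]:
    print(rid[:-2] + ".fasta", output_name(rid))
```

The two approaches agree for single-digit versions; they differ only when the version has more than one digit.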

Thank you guys for your time and kindness!

Paulo 8)

ADD COMMENT • link 5.1 years ago by schlogl ▴ 160