6.7 years ago
dmitri.ivanovsky
Hi all! Please help. I parsed sequences from GenBank, renamed them, and saved them as a FASTA file.
>KP821216.1_Bluetongue v_Cameroon_Jan-1982
ATGGCTGCTCAGAATGAGCAACGTCCGGAGCGAATAAAAACGACACCGTATTTAGAGGGA
GATGTGCTTTCGAGTGATTCAGGACCGCTGCTTTCCGTGTTCGCGCTGCAAGAAATAATG
The last 4 characters of each header are the year when the virus was isolated. Now I need to select only the records that fall within some range (for example 1958-1990):
from Bio import SeqIO
output_file = open("range_date_select.txt", "w")
date_from = 1958
date_to = 1990
count = 0
for i, record in enumerate(SeqIO.parse("Bluetong_batch_cds.txt", "fasta")):
    a = record.description[-4:]
    if date_from <= int(a) <= date_to:
        SeqIO.write(record, output_file, "fasta")
        count = count + 1
print(count)
output_file.close()
Now the task becomes more complicated: I need no more than 4 records per year. If a year has more than 4 records, 4 of them should be chosen at random.
Can anybody help me with this? Thanks in advance.
Thanks a lot! It works very well!
But could you please explain why this works:
If there are more than 4 records in the list d[year], shouldn't they be skipped, because the condition "if i < 4" is not met? Yet they are written out. I'm a newbie in Python, so I know this is probably a very basic question.
Yes, you are right. In the first "for" loop I iterate over each year. Then I shuffle the list d[year] so that the sequences for that year are in random order. At this point, d[year] contains all sequences for a given year (there may be more than 4). In the second "for" loop I iterate over each sequence record in the d[year] list, counting them as the iteration goes, from 0 up to one less than the number of sequences in d[year] (so the variable i is just a counter). For the first sequence i is 0, for the second i is 1, and so on. So the "if i < 4" statement means that only the first four sequences in d[year] will be saved to the output file. Nothing is done with the fifth (i = 4), sixth (i = 5), or any later sequence in the list.
If you are satisfied with my answer, please mark it as accepted.
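For anyone reading along, the shuffle-then-count logic described above can be sketched roughly as follows. This is only a minimal reconstruction under assumptions, not necessarily the exact code from the answer: the function names year_of and pick_per_year, the max_per_year and rng parameters, and the sample headers are all made up for illustration. It works on plain header strings so it runs without Biopython.

```python
import random
from collections import defaultdict


def year_of(description):
    """Return the year stored in the last four characters of a header."""
    return int(description[-4:])


def pick_per_year(descriptions, date_from, date_to, max_per_year=4, rng=random):
    """Keep at most max_per_year randomly chosen headers per year in range.

    Hypothetical helper: names and parameters are illustrative only.
    """
    # Group headers by year, skipping years outside [date_from, date_to].
    d = defaultdict(list)
    for description in descriptions:
        year = year_of(description)
        if date_from <= year <= date_to:
            d[year].append(description)

    selected = []
    for year in sorted(d):
        rng.shuffle(d[year])  # random order within this year
        for i, description in enumerate(d[year]):
            # i is just a counter: 0, 1, 2, ... Only the first
            # max_per_year items of the shuffled list are kept.
            if i < max_per_year:
                selected.append(description)
    return selected


# Made-up headers: six from 1982, one from 1975, one from 1995.
headers = [f"seq{n}_Cameroon_Jan-1982" for n in range(6)] + [
    "seq6_Kenya_Mar-1975",
    "seq7_India_Jul-1995",
]
chosen = pick_per_year(headers, 1958, 1990, rng=random.Random(0))
print(len(chosen))  # 4 from 1982 + 1 from 1975 = 5
```

With real data you would presumably group the SeqRecord objects themselves, using record.description[-4:] as the key, and call SeqIO.write where this sketch appends to the list. Since the list is shuffled first, slicing d[year][:4] or calling random.sample(d[year], min(4, len(d[year]))) should give an equivalent selection without the counter.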