6.6 years ago
Chvatil
I have a file with several sequence names, such as:
seq1 seq9 seq3 seq7 seq5 seqi seqn....
and a FASTA file containing all my sequences. I need to reorder the sequences so that they follow the order of the list above, for example:
>seq1
aaaaa
>seq9
aaaaa
>seq3
aaaaa
>seq7
aaaaa
>seq5
aaaaa
...
I tried this:
from Bio import SeqIO

input_file = open('concatenate_0035_0042_aa2.fa','r')
output_file = open('result.fasta','a')
liste=['seq1','seq5','seq8' etc]
print(len(liste))
compteur=1
for i in liste:
    record_dict = SeqIO.to_dict(SeqIO.parse("concatenate_0035_0042_aa2.fa", "fasta"))
    print(">",record_dict[i].id,file=output_file,sep="")
    print(record_dict[i].seq,file=output_file)
    compteur+=1
    print(compteur)
output_file.close()
input_file.close()
but it actually takes too much time.
There are many solutions and, as Pierre said, an even larger number of previous posts asking the same question. You could use samtools faidx, as per my answer here.
Thank you for your help.
This question has been asked many times here; please search for it. Nevertheless, regarding your code: instead of re-parsing the FASTA file on every iteration of the loop over liste, how about scanning the FASTA file only once and checking whether the current record name is in your liste?
OK, thank you, now it works :)
input_file = open('concatenate_0035_0042_aa2.fa','r')
output_file = open('result.fasta','a')
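For readers who land here, the single-pass idea suggested in the comment above might look roughly like the sketch below. It is not the code the original poster ended up with. To stay self-contained it uses plain Python rather than Biopython, and the helper names read_fasta and reorder_fasta are hypothetical. It assumes the record name is the first word after ">" and that the selected records fit in memory.

```python
def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file in a single pass."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Assumption: the record name is the first word after '>'
                parts = line[1:].split()
                name, chunks = (parts[0] if parts else ""), []
            elif name is not None and line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def reorder_fasta(fasta_path, wanted, out_path):
    """Write the records named in `wanted` to out_path, in that order.

    Returns the names from `wanted` that were not found in the FASTA file.
    """
    wanted_set = set(wanted)  # set membership checks are O(1)
    # Scan the FASTA file once, keeping only the records that were asked for
    records = {n: s for n, s in read_fasta(fasta_path) if n in wanted_set}
    with open(out_path, "w") as out:
        for n in wanted:
            if n in records:
                out.write(f">{n}\n{records[n]}\n")
    return [n for n in wanted if n not in records]
```

Calling reorder_fasta('concatenate_0035_0042_aa2.fa', liste, 'result.fasta') would then read the input once instead of once per name. Note that each sequence is written on a single line, and that the output file is opened in write mode, so it is overwritten rather than appended to as in the original code.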