7.6 years ago
Chvatil
I have a file with several sequence names, such as:
seq1 seq9 seq3 seq7 seq5 seqi seqn....
and a FASTA file with all my sequences. I need to sort the sequences so that they follow the order of the list above, such as:
>seq1
aaaaa
>seq9
aaaaa
>seq3
aaaaa
>seq7
aaaaa
>seq5
aaaaa
...
I tried this:
from Bio import SeqIO

input_file = open('concatenate_0035_0042_aa2.fa', 'r')
output_file = open('result.fasta', 'a')
liste = ['seq1', 'seq5', 'seq8']  # etc.
print(len(liste))
compteur = 1
for i in liste:
    # The whole FASTA file is re-parsed into a dict on every iteration
    record_dict = SeqIO.to_dict(SeqIO.parse("concatenate_0035_0042_aa2.fa", "fasta"))
    print(">", record_dict[i].id, file=output_file, sep="")
    print(record_dict[i].seq, file=output_file)
    compteur += 1
    print(compteur)
output_file.close()
input_file.close()
but it actually takes too much time.
There are many solutions and, as Pierre said, an even larger number of previous posts with the same question. You may use samtools faidx, as per my answer here.
Thank you for your help.
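For readers who want to try the samtools faidx suggestion from Python, here is a minimal sketch. It assumes samtools is installed and on the PATH, and that passing the sequence names as region arguments returns the records in the order given. The helper names build_faidx_command and extract_in_order are hypothetical, not part of samtools or of the linked answer.

```python
import shutil
import subprocess


def build_faidx_command(fasta_path, names):
    """Return the argument list for `samtools faidx <fasta> <name> <name> ...`."""
    return ["samtools", "faidx", fasta_path, *names]


def extract_in_order(fasta_path, names, out_path):
    """Write the named records to out_path by calling samtools (assumed installed)."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    with open(out_path, "w") as out:
        # samtools writes the selected records to stdout; redirect them to out_path.
        subprocess.run(build_faidx_command(fasta_path, names), stdout=out, check=True)
```

For example, extract_in_order('concatenate_0035_0042_aa2.fa', liste, 'result.fasta') would run samtools once for the whole list of names.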
This question has been asked many times here. Please search for it. Nevertheless, regarding your code: instead of re-parsing the FASTA file once for every name in liste, how about scanning the FASTA file only once and checking whether the current sequence name is in your liste?
OK, thank you, now it works :)
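The single-scan idea suggested above could look roughly like the sketch below. It is not the poster's final code: it uses a small plain-Python FASTA reader instead of Biopython, and the function names read_fasta and reorder_fasta are made up for illustration. It assumes each sequence name is the first word after '>' and that the selected sequences fit in memory.

```python
def read_fasta(path):
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                fields = line[1:].split()
                name = fields[0] if fields else ""
                chunks = []
            elif name is not None:
                chunks.append(line.strip())
    if name is not None:
        yield name, "".join(chunks)


def reorder_fasta(fasta_path, ordered_names, out_path):
    """Scan the FASTA once, then write the wanted records in list order.

    Returns the names from ordered_names that were not found in the file.
    """
    wanted = set(ordered_names)  # set membership tests are O(1)
    found = {}
    for name, seq in read_fasta(fasta_path):
        if name in wanted:
            found[name] = seq
    with open(out_path, "w") as out:
        for name in ordered_names:
            if name in found:
                out.write(f">{name}\n{found[name]}\n")
    return [name for name in ordered_names if name not in found]
```

The file is read once regardless of how long liste is, and opening the output with 'w' rather than 'a' avoids appending to results left over from an earlier run.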
input_file = open('concatenate_0035_0042_aa2.fa','r')
output_file = open('result.fasta','a')