Extract several FASTA records using a list of IDs (in order)
6.6 years ago
Chvatil ▴ 130

I have a file with several sequence names, such as:

seq1 seq9 seq3 seq7 seq5 seqi seqn....

and a FASTA file containing all my sequences. I need to write out the sequences in the order given by the list above, like this:

>seq1
aaaaa
>seq9
aaaaa
>seq3
aaaaa
>seq7
aaaaa
>seq5
aaaaa
...

I tried this:

from Bio import SeqIO

input_file = open('concatenate_0035_0042_aa2.fa','r')
output_file = open('result.fasta','a')


liste=['seq1','seq5','seq8']  # etc.
print(len(liste))
compteur=1
for i in liste:
    record_dict = SeqIO.to_dict(SeqIO.parse("concatenate_0035_0042_aa2.fa", "fasta"))
    print(">",record_dict[i].id,file=output_file,sep="")
    print(record_dict[i].seq,file=output_file)
    compteur+=1
    print(compteur)

output_file.close()
input_file.close()

but it takes far too long to run.

bio python Fasta • 1.2k views

There are many solutions and, as Pierre said, even more previous posts asking the same question. You could use samtools faidx, as per my answer here.


Thank you for your help.


This question has been asked many times here, so please search for the earlier posts. Nevertheless, regarding your code: your loop re-parses the whole FASTA file once for every name in liste. How about scanning the FASTA file only once and checking whether the current FASTA name is in your liste?
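For what it's worth, the single-pass idea could look roughly like the sketch below. It is plain Python without Biopython, and it assumes that each ID is the first whitespace-delimited token on the header line and that the selected sequences fit in memory. The names `read_fasta` and `extract_in_order` are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (id, sequence) pairs from an open FASTA handle in a single pass."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the ID is the first token after '>'.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_in_order(fasta_handle, wanted_ids):
    """Scan the FASTA once, then return the wanted records in list order."""
    wanted = set(wanted_ids)  # set membership tests are fast
    found = {}
    for name, seq in read_fasta(fasta_handle):
        if name in wanted:
            found[name] = seq
    # IDs that are absent from the FASTA are skipped silently.
    return [(name, found[name]) for name in wanted_ids if name in found]


if __name__ == "__main__":
    demo = io.StringIO(">seq1\naaaaa\n>seq3\nccccc\n>seq9\nggggg\n")
    for name, seq in extract_in_order(demo, ["seq9", "seq1", "seq3"]):
        print(f">{name}\n{seq}")
```

The file is read once to collect the wanted records, and the output order then comes from the ID list rather than from the order in the FASTA file.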


OK, thank you, now it works :)

from Bio import SeqIO

# Parse the FASTA file once and index the records by ID.
record_dict = SeqIO.to_dict(SeqIO.parse("concatenate_0035_0042_aa2.fa", "fasta"))

# liste is the ordered list of IDs from the question.
with open('result.fasta','a') as output_file:
    for i in liste:
        # IDs that are not in the FASTA file are skipped.
        if i in record_dict:
            print(">",record_dict[i].id,file=output_file,sep="")
            print(record_dict[i].seq,file=output_file)
