I have a list of gene sequences with their accession ids in FASTA format in one file ("ls_orchid.fasta"). I also have a list of the accession ids I need ("ids.txt"). Now I want to collect the gene sequences only for the needed accession numbers, i.e. those present in ids.txt.
from Bio import SeqIO
for seq_record in SeqIO.parse("ls_orchid.fasta","fasta"):
    f=open('ids.txt','r')
    lines=f.readlines()
    for i in lines:
        if (str(i) in seq_record.id) == True:
            print(">"+seq_record.id+"\n"+seq_record.seq)
The above code gives only the fasta file of the last accession number. Please help me resolve this issue.
I understood the code. I tried to run it after substituting my own file names, but I am not getting any output; it is just blank. Why is that? Can you please help me?
AttributeError: 'FastaIterator' object has no attribute 'id'
Could you please first try the following: copy my code exactly, without any adaptation to your local files, save the little test.fasta file as given above, and then run the Python code. It should give:
I suspect a minor difference in your code. If you still cannot run your code, please post the exact code you used, and your Python and Biopython version. Mine is Python 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:37:41)
Okay. I copy-pasted the exact code and got the exact output you mentioned. But it works only if ids.txt has a single id, say abc. If I give abc and abd, I do not get the FASTA sequences for both ids; instead I get the sequence only for the last id in ids.txt, in this case abd. But I want the FASTA sequences for both abc and abd. How do I get that?
Ah, I see, sorry. I did not test the id parsing code because it was in a comment. But readlines retains the linebreak "\n" so that's why it doesn't give any output.
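To make the linebreak issue concrete, here is a small illustration. It assumes ids.txt holds the two ids abc and abd, one per line, with no newline after the last one; io.StringIO is only a stand-in for the real file so the snippet runs on its own.

```python
import io

# Stand-in for open("ids.txt"): two ids, one per line (assumed contents).
handle = io.StringIO("abc\nabd")

raw_lines = handle.readlines()
# readlines() keeps the trailing newline on every line that has one.
print(raw_lines)            # ['abc\n', 'abd']

# "abc\n" is not a substring of the record id "abc", so the test fails;
# only the last line, which has no newline here, can match.
print("abc\n" in "abc")     # False

# Stripping whitespace (and skipping blank lines) gives usable ids.
clean_ids = [line.strip() for line in raw_lines if line.strip()]
print(clean_ids)            # ['abc', 'abd']
```

This would also explain why only the last id in the file ever produced a sequence.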
Here is the code with id parsing, try this:
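A minimal sketch of such a fix, assuming one id per line in ids.txt and the same substring test against the record id that the question used. The helper names read_ids, matching_records and print_matches are made up for this example and are not part of Biopython; only SeqIO.parse and the record attributes id and seq come from the library.

```python
def read_ids(lines):
    """Return the ids from an iterable of lines, without newlines or blanks."""
    return [line.strip() for line in lines if line.strip()]


def matching_records(records, wanted_ids):
    """Yield every record whose id contains one of the wanted ids."""
    for record in records:
        # Substring test, as in the question; use `record.id in wanted_ids`
        # instead if the ids must match exactly.
        if any(wanted in record.id for wanted in wanted_ids):
            yield record


def print_matches(fasta_path="ls_orchid.fasta", ids_path="ids.txt"):
    """Print the wanted records from fasta_path in FASTA format."""
    from Bio import SeqIO  # imported here so the helpers above need no Biopython

    # Read the ids once, before looping over the sequences.
    with open(ids_path) as handle:
        wanted_ids = read_ids(handle)

    for record in matching_records(SeqIO.parse(fasta_path, "fasta"), wanted_ids):
        print(f">{record.id}\n{record.seq}")
```

Calling print_matches() with both files in the working directory should then print one record per matching id, not just the last one.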
It works perfectly. Thank you so much.