I am trying to filter out sequences using SeqIO but I am getting this error.
Traceback (most recent call last):
File "paralog_warning_filter.py", line 61, in <module>
.
.
.
SeqIO.write(desired_proteins, "filtered.fasta","fasta")
AttributeError: 'str' object has no attribute 'id'
I checked other similar questions but still couldn't understand what is wrong with my script.
Here is the relevant part of the script I am trying:
fh = open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh, 'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            desired_proteins = seq
            output_file = SeqIO.write(desired_proteins, "filtered.fasta", "fasta")
            output_file
fh.close()
I have a separate list, paralogs_in_all, which is the source of the IDs. When I print name, it returns proper string IDs in this format: >coronopifolia_tair_real-AT2G35040.1@10
Can you help me understand my problem? Thanks in advance.
Bio.SeqIO.write() expects a SeqRecord (or an iterable of SeqRecord objects), not a bare sequence. Instead of desired_proteins, you can do SeqIO.write(s_record, ...).
Note: filtered.fasta will only have the last s_record in lineageV_paralog_warning_genes.fasta that is found in paralogs_in_all, because filtered.fasta will be overwritten during the loop.
Thank you, I understand it now. You were right about the overwriting as well. After fixing that, the script worked smoothly!
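For anyone landing here later, the two fixes discussed above (pass whole records, and write once instead of inside the loop) could be combined roughly as follows. This is only a sketch, not the poster's final script: the function names select_records and filter_fasta are made up for illustration, and it assumes paralogs_in_all is a list of ID suffixes as in the question.

```python
def select_records(records, suffixes):
    """Yield each record whose .id ends with any of the given suffixes."""
    suffixes = tuple(suffixes)  # str.endswith accepts a tuple of candidates
    for record in records:
        if record.id.endswith(suffixes):
            yield record


def filter_fasta(in_path, out_path, suffixes):
    """Write all matching records to out_path in one call; return the count."""
    # Imported here so select_records can be used without Biopython installed.
    from Bio import SeqIO

    records = SeqIO.parse(in_path, "fasta")
    # One write call over the filtered records, so the output file is
    # created once rather than being overwritten on every match.
    return SeqIO.write(select_records(records, suffixes), out_path, "fasta")
```

It would be called as filter_fasta("lineageV_paralog_warning_genes.fasta", "filtered.fasta", paralogs_in_all). SeqIO.write returns the number of records written, which is handy for a quick sanity check against the length of the ID list.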
I guess you need to write s_record instead of the parsed sequence, i.e. desired_proteins. Something like SeqIO.write(s_record, ...)? With desired_proteins=seq you are assigning only the sequence, not the whole record.
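As to why the message mentions 'str': my understanding is that the writer loops over whatever it is given and reads .id from each item, and looping over a bare sequence yields single letters. The snippet below imitates that behaviour in plain Python. It does not use Biopython's actual code; FakeRecord, collect_ids and the sequence "MKV" are hypothetical stand-ins chosen only to reproduce the error.

```python
class FakeRecord:
    """Hypothetical stand-in for a SeqRecord: carries an id and a sequence."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


def collect_ids(records):
    """Imitate a writer: loop over the argument and read .id from each item."""
    return [item.id for item in records]


record = FakeRecord("coronopifolia_tair_real-AT2G35040.1@10", "MKV")

# Passing a list of records works: every item has an .id attribute.
ids = collect_ids([record])

# Passing only the sequence fails: looping over it yields one-letter strings.
try:
    collect_ids(record.seq)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

Here error_message ends up holding the same text as the last line of the traceback in the question, which is why handing over the record itself (or a list of records) makes the error go away.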