I'm trying to parse a gtf file using this code:
from BCBio import GFF
gtf_rec = []
in_file = 'cuffcmp.combined.gtf'
out_file = 'extract.gtf'
with open(in_file) as f:
    for line in f:
        if 'class_code "x"' or 'class_code "u"' or 'class_code "i"' in line:
            gtf_rec.append(line)
with open(out_file, "w") as out_handle:
    GFF.write(gtf_rec, out_handle)
in_file.close()
out_handle.close()
When I print(gtf_rec), the required information is filtered out, but when I try to write the records to a new file I get this AttributeError:
  File "/excise.py", line 18, in <module>
    GFF.write(gtf, out_handle)
  File "/GFFOutput.py", line 202, in write
    return writer.write(recs, out_handle, include_fasta)
  File "/GFFOutput.py", line 80, in write
    self._write_rec(rec, out_handle)
  File "/GFFOutput.py", line 108, in _write_rec
    if len(rec.seq) > 0:
AttributeError: 'str' object has no attribute 'seq'
I'm new to bioinformatics, and I have spent too much time trying to solve this. The general explanations of this error haven't helped me fix it.
I would like to know whether any of you can find the cause of the error or suggest another way to do this parsing.
There is extensive material on working with file formats other than GTF.
Thank you very much!
Oh, for sure. I already tried to find a specific SeqRecord limiter for attributes, but I still couldn't. But you gave me the solution to the error, and your answer will certainly help me get somewhere. Thank you.
I think GFF.parse(..., limit_info=) should do the trick to restrict the output to specific attributes. See the section "Limiting to features of interest" in the tutorial.
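If the goal is only to keep the lines with certain class codes, a plain-text filter may be enough, with no need to go through GFF.write at all. The traceback shows GFF.write reading rec.seq, so it expects record objects rather than the raw strings collected in gtf_rec. Note also that the condition 'class_code "x"' or ... in line is always true in Python, because the first non-empty string is truthy on its own. Below is a minimal sketch under those assumptions; the names CLASS_CODE_RE, filter_gtf_lines and extract are made up for illustration and are not part of BCBio.

```python
import re

# Matches class_code "x", "u" or "i" anywhere in the attribute column.
CLASS_CODE_RE = re.compile(r'class_code "[xui]"')


def filter_gtf_lines(lines, pattern=CLASS_CODE_RE):
    """Yield only the lines that carry one of the wanted class codes."""
    for line in lines:
        if pattern.search(line):
            yield line


def extract(in_file, out_file):
    """Copy the matching GTF lines from in_file to out_file unchanged."""
    # Both handles are closed by the with block, so no explicit close() is needed.
    with open(in_file) as src, open(out_file, "w") as dst:
        dst.writelines(filter_gtf_lines(src))
```

With the file names from the question, this would be called as extract('cuffcmp.combined.gtf', 'extract.gtf'). The lines are written back exactly as they were read, so the output stays in GTF format.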