Entering edit mode
5.3 years ago
Ric
▴
440
I have the following gff3 file. Some of the mRNA contains an attribute Note and some no.
...
NbV1Ch07 AUGUSTUS gene 31690934 31699151 . - . ID=g88996;Name=POLX_2333
NbV1Ch07 AUGUSTUS mRNA 31690934 31699151 . - . ID=g88996.t1;Parent=g88996;Note=Retrovirus-related Pol polyprotein from transposon TNT 1-94;Dbxref=CDD:cd09272,InterPro:IPR013103,InterPro:IPR029472,MOBIDBLITE:mobidb-lite,PFAM:PF07727,PFAM:PF14244,SUPERFAMILY:SSF56672
['Retrovirus-related Pol polyprotein from transposon TNT 1-94']
...
I wrote the below script which failed to filter genes into two files. If Note is in mRNA attribute then write gene, mRNA, exon, CDS into keep.gff3
file.
If Note is not in mRNA attribute then write gene, mRNA, exon, CDS into reject.gff3
file.
#!/usr/bin/python3
import click
import gffutils
@click.command()
@click.option('--gff3', help="Provide GFF3 file", required=True)
@click.option('--keep', help="Keep GFF3 file", required=True)
@click.option('--reject', help="Reject GFF3 file", required=True)
def run(gff3, keep, reject):
#db = gffutils.create_db(gff3, dbfn='data/test.db', force=True, keep_order=True, merge_strategy='merge',
# sort_attribute_values=True)
db = gffutils.FeatureDB('data/test.db', keep_order=True)
with open(keep, 'w') as out_keep:
with open(reject, 'w') as out_reject:
for gene in db.features_of_type('gene', order_by='start'):
print(gene)
for i in db.children(gene, featuretype='mRNA', order_by='start'):
print(i)
try:
if i.attributes['Note']:
print(i.attributes['Note'])
out_keep.write(str(gene) + '\n')
else:
out_reject.write(str(gene) + '\n')
except KeyError:
continue
if __name__ == '__main__':
run()
Unfortunately, the above script only prints in keep.gff3
file the genes and nothing else.
What did I miss?
Thank you in advance,
Hello Ric ,
what your method is doing, is iterate over all
gene
's, finding all children of that genes that aremRNA
and write all of these mRNA who have aNote
attribute tokeep.gff3
. IfNote
is not a valid key, python will raise an KeyError exception, so theelse
block is never executed and nothing is written toreject.gff3
. The correct way to check if a key exists would beif "Note" in i.attributes
.fin swimmer
Thank you, I changed it but both files contain only the features
gene
but missingmRNA, exon, CDS
. How is it possible to add those ones, too?Of course, because you are writing only
gene
to the fileCould you please explain a little bit more about your goal? Also a larger example of your gff file would be helpful with lines you've expected in the one or in the other output file.
If you cross post your question please leave a note, so that other can see what is already answered!