I have a gff file, I want to write a script that outputs axons only to a new file.
I wonder what is the most efficient way to do so.
I need to iterate over the input file and split the lines into blocks starting with “gene” [one block includes all lines after 'gene' and ends one line before the next 'gene'].
For each block: I need to check if it contains lines with “mRNA”. if not continue to the next block.
For every block that starts with “mRNA”, I need to write the lines that contain “exon” to the output file.
Is there a good package in python that can help me? How do I create the blocks?
This is the gff file:
NC_048323.1 Gnomon gene 25044 78977 . + . ID=gene -LOC117420859;Dbxref=GeneID:117420859;Name=LOC117420859;gbkey=Gene;gene=LOC1174 20859;gene_biotype=protein_coding
NC_048323.1 Gnomon mRNA 25044 78977 . + . ID=rna- XM_034926345.1;Parent=gene-LOC117420859;Dbxref=GeneID:117420859,Genbank:XM_0349 26345.1;Name=XM_034926345.1;gbkey=mRNA;gene=LOC117420859;model_evidence=Support ing evidence includes similarity to: 2 Proteins%2C and 93%25 coverage of the an notated genomic feature by RNAseq alignments;product=coiled-coil domain-contain ing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 25044 25136 . + . ID=exon -XM_034926345.1-1;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_ 034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 25929 26031 . + . ID=exon -XM_034926345.1-2;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_ 034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 34873 34976 . + . ID=exon -XM_034926345.1-3;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_ 034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 37369 37437 . + . ID=exon -XM_034926345.1-4;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_ 034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 37932 38132 . + . ID=exon -XM_034926345.1-5;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_ 034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 56841 57008 . + . ID=exon -XM_034926345.1-6;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_ 034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 62592 62780 . + . ID=exon -XM_034926345.1-7;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_ 034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 64485 64992 . + . ID=exon -XM_034926345.1-8;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_ 034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 65613 65729 . + . ID=exon -XM_034926345.1-9;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM_ 034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 72109 72335 . + . ID=exon -XM_034926345.1-10;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM _034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 75873 76055 . + . ID=exon -XM_034926345.1-11;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM _034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 76336 76521 . + . ID=exon -XM_034926345.1-12;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM _034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon exon 78960 78977 . + . ID=exon -XM_034926345.1-13;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XM _034926345.1;gbkey=mRNA;gene=LOC117420859;product=coiled-coil domain-containing protein 171-like;transcript_id=XM_034926345.1
NC_048323.1 Gnomon CDS 25044 25136 . + 0 ID=cds- XP_034782236.1;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XP_034 782236.1;Name=XP_034782236.1;gbkey=CDS;gene=LOC117420859;product=coiled-coil do main-containing protein 171-like;protein_id=XP_034782236.1
NC_048323.1 Gnomon CDS 25929 26031 . + 0 ID=cds- XP_034782236.1;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XP_034 782236.1;Name=XP_034782236.1;gbkey=CDS;gene=LOC117420859;product=coiled-coil do main-containing protein 171-like;protein_id=XP_034782236.1
NC_048323.1 Gnomon CDS 34873 34976 . + 2 ID=cds- XP_034782236.1;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XP_034 782236.1;Name=XP_034782236.1;gbkey=CDS;gene=LOC117420859;product=coiled-coil do main-containing protein 171-like;protein_id=XP_034782236.1
NC_048323.1 Gnomon CDS 37369 37437 . + 0 ID=cds- XP_034782236.1;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XP_034 782236.1;Name=XP_034782236.1;gbkey=CDS;gene=LOC117420859;product=coiled-coil do main-containing protein 171-like;protein_id=XP_034782236.1
NC_048323.1 Gnomon CDS 37932 38132 . + 0 ID=cds- XP_034782236.1;Parent=rna-XM_034926345.1;Dbxref=GeneID:117420859,Genbank:XP_034 782236.1;Name=XP_034782236.1;gbkey=CDS;gene=LOC117420859;product=coiled-coil do main-containing protein 171-like;protein_id=XP_034782236.1
NC_048323.1 Gnomon mRNA 111664 172479 . - . ID=rna- XM_034035428.2;Parent=gene-LOC117421266;Dbxref=GeneID:117421266,Genbank:XM_0340 35428.2;Name=XM_034035428.2;gbkey=mRNA;gene=LOC117421266;model_evidence=Support ing evidence includes similarity to: 13 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 10 samples with sup port for all annotated introns;product=eukaryotic translation initiation factor 2-alpha kinase 3-like%2C transcript variant X1;transcript_id=XM_034035428.2
NC_048323.1 Gnomon exon 172022 172479 . - . ID=exon -XM_034035428.2-1;Parent=rna-XM_034035428.2;Dbxref=GeneID:117421266,Genbank:XM_ 034035428.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiat ion factor 2-alpha kinase 3-like%2C transcript variant X1;transcript_id=XM_0340 35428.2
NC_048323.1 Gnomon exon 157760 157889 . - . ID=exon -XM_034035428.2-2;Parent=rna-XM_034035428.2;Dbxref=GeneID:117421266,Genbank:XM_ 034035428.2;gbkey=mRNA;gene=LOC117421266;product=eukaryotic translation initiat ion factor 2-alpha kinase 3-like%2C transcript variant X1;transcript_id=XM_0340 35428.2