Question

Parsing GTF file - Help!

0


9.5 years ago

espop23 ▴ 60

I have data from GENCODE that looks like this:

chr1    ENSEMBL    gene    17369    17436    .    -    .    gene_id "ENSG00000278267.1"; gene_type "miRNA"; gene_status "KNOWN"; gene_name "MIR6859-1"; level 3;
chr1    ENSEMBL    gene    30366    30503    .    +    .    gene_id "ENSG00000274890.1"; gene_type "miRNA"; gene_status "KNOWN"; gene_name "MIR1302-2"; level 3;
chr1    ENSEMBL    gene    157784    157887    .    -    .    gene_id "ENSG00000222623.1"; gene_type "snRNA"; gene_status "KNOWN"; gene_name "RNU6-1100P"; level 3;

I have tried using gffutils, but I get an error with this code:

import gffutils

# Build the database from the GTF file
db = gffutils.create_db("sRNA.gene.gtf", dbfn='sRNA.gene.gtf.db')

print(list(db.featuretypes()))
# ['CDS', 'exon', 'gene', 'start_codon', 'stop_codon', 'transcript']

# Here's how to write genes out to file
# (placeholder output name, so the input GTF is not overwritten)
with open('sRNA.gene.out.gtf', 'w') as fout:
    for gene in db.features_of_type('gene'):
        fout.write(str(gene) + '\n')

It fails with:

ImportError: cannot import name 'feature'.

Can someone please offer suggestions on the best way to parse such GTF files?

python gtf r gffutils • 5.1k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 9.5 years ago by espop23 ▴ 60

1


If I use your example GTF file and your example code, it works -- with the exception that the list of featuretypes is ['gene'] since only gene features are in your example GTF.

Can you provide a minimal example (complete code and input) that reproduces the error?

More generally, what is your end goal? It may not be necessary to create a database. For example, you can use gffutils just for parsing a GTF file (with the gffutils.FeatureIterator class).
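To illustrate what parsing without a database can look like, here is a minimal sketch using only the Python standard library. It is not the gffutils API: the helper names (`parse_gtf_line`, `iter_gtf`) are made up for this example, and it assumes each line has the nine standard GTF columns with `key "value";` attributes like the ones in your sample.

```python
import io


def parse_gtf_line(line):
    """Parse one GTF line into a dict; return None for blank or comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != 9:
        # Pasted examples often lose their tabs, so fall back to whitespace.
        fields = line.split(None, 8)
    seqid, source, featuretype, start, end, score, strand, frame, attr_text = fields
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        # Note: a repeated key (e.g. several "tag" entries) keeps only the last value.
        attributes[key] = value.strip().strip('"')
    return {
        "seqid": seqid,
        "source": source,
        "featuretype": featuretype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def iter_gtf(handle):
    """Yield one parsed record per feature line of an open GTF file."""
    for line in handle:
        record = parse_gtf_line(line)
        if record is not None:
            yield record


# Two lines taken from the example in the question, tab-separated.
SAMPLE = (
    'chr1\tENSEMBL\tgene\t17369\t17436\t.\t-\t.\t'
    'gene_id "ENSG00000278267.1"; gene_type "miRNA"; gene_status "KNOWN"; '
    'gene_name "MIR6859-1"; level 3;\n'
    'chr1\tENSEMBL\tgene\t30366\t30503\t.\t+\t.\t'
    'gene_id "ENSG00000274890.1"; gene_type "miRNA"; gene_status "KNOWN"; '
    'gene_name "MIR1302-2"; level 3;\n'
)

genes = [
    (rec["attributes"]["gene_name"], rec["start"], rec["end"])
    for rec in iter_gtf(io.StringIO(SAMPLE))
    if rec["featuretype"] == "gene"
]
print(genes)
```

For a real file you would pass `open("sRNA.gene.gtf")` to `iter_gtf` instead of the `io.StringIO` sample. Whether this is enough depends on your end goal; for parent/child queries between genes, transcripts and exons a database is still the more convenient route.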

Lastly, see the hints in the answer to "GFFutils very slow at creating database file. Any Idea why..?" for working with GENCODE GTF files, which now already include features for genes and transcripts.
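As a rough sketch of that hint: because the file already contains gene and transcript features, gffutils can be told not to infer them, which is usually the slow part of building the database. The wrapper function `build_gencode_db` is a hypothetical name for this example, and the option values are one reasonable choice, not the only one.

```python
def build_gencode_db(gtf_path, db_path):
    """Create a gffutils database from a GTF that already has gene/transcript lines."""
    import gffutils  # third-party; imported here so the sketch loads without it

    return gffutils.create_db(
        gtf_path,
        dbfn=db_path,
        force=True,                      # overwrite an existing database file
        keep_order=True,                 # keep attribute order as in the input
        disable_infer_genes=True,        # gene features are already in the file
        disable_infer_transcripts=True,  # transcript features are already in the file
    )
```

Usage would be along the lines of `db = build_gencode_db("sRNA.gene.gtf", "sRNA.gene.gtf.db")`, after which `db.features_of_type('gene')` works as in the question.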

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 9.5 years ago by Ryan Dale 5.0k

0


Hello espop23!

It appears that your post has been cross-posted to another site: https://www.reddit.com/r/bioinformatics/comments/3rvn3g/help_parsing_gtf_file/

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 9.5 years ago by Pierre Lindenbaum 166k