Extract features from GFF file
7.9 years ago
Ander ▴ 50

Hi pals,

I have a genome annotation in GFF format like this:

CP014038.1  GeneMarkS+  CDS 3717912 3718988 .   -   0   "ID=cds0;Parent=gene0;Dbxref=NCB...."

CP014038.1  Genbank gene    631 2190    .   -   .   "ID=gene1;Name=AL538_00010;gbkey=....."

Is there a way to extract only the attributes I want (locus_tag, Name, ...) from the last column so that each line looks like this?

CP014038.1  GeneMarkS+  CDS 3717912 3718988 .   -   0   "locus_tag=AL34598_3409; Name=N/A;....."
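The reformatting asked for above can be sketched as follows. This is a minimal sketch that assumes the file is tab-separated GFF3, where column 9 holds `key=value` pairs separated by `;` as in the sample lines. `WANTED_KEYS`, `parse_gff3_attributes`, and `rewrite_gff_line` are hypothetical names chosen for illustration. Keys that are missing from a line are written as `N/A`, matching the desired output.

```python
import sys

# Hypothetical choice: the attribute keys to keep, in output order.
WANTED_KEYS = ["locus_tag", "Name"]


def parse_gff3_attributes(column9):
    """Parse a GFF3 attribute column ('ID=cds0;Parent=gene0;...') into a dict."""
    attributes = {}
    for item in column9.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skip empty entries, e.g. after a trailing ';'
        key, _, value = item.partition("=")
        attributes[key] = value
    return attributes


def rewrite_gff_line(line, wanted=WANTED_KEYS, missing="N/A"):
    """Return the line with column 9 reduced to the wanted keys (absent keys -> missing)."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 9:
        return line.rstrip("\r\n")  # leave non-feature lines untouched
    attributes = parse_gff3_attributes(fields[8])
    fields[8] = ";".join(
        "%s=%s" % (key, attributes.get(key, missing)) for key in wanted
    )
    return "\t".join(fields)


def main(paths):
    """Rewrite every GFF3 file named on the command line to standard output."""
    for path in paths:
        with open(path) as handle:
            for line in handle:
                if line.startswith("#"):
                    sys.stdout.write(line)  # pass directives and comments through
                    continue
                sys.stdout.write(rewrite_gff_line(line) + "\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be along the lines of `python keep_attributes.py annotation.gff > reduced.gff` (the script name is hypothetical). The attributes are joined with a plain `;`, which is how GFF3 separates them. To match the example output above exactly, change the join string to `"; "`.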

Thanks for your help,
Ander

7.8 years ago

You could use the following GTF-processing skeleton, which extracts the attributes column to a Python dictionary.

#!/usr/bin/env python

import sys
import os

for line in sys.stdin:
    convertedLine = ""
    chomped_line = line.rstrip('\r\n')
    if not chomped_line.strip() or chomped_line.startswith('#'):
        # skip blank lines, '##' directives and '#' comment lines
        pass
    elif chomped_line.startswith('track'):
        # skip non-standard use of track keyword by Ensembl 
        pass
    else:
        elems = chomped_line.split('\t')
        cols = dict()
        try:
            cols['seqname'] = elems[0].lstrip(' ') # strip leading whitespace
            cols['source'] = elems[1]
            cols['feature'] = elems[2]
            cols['start'] = int(elems[3])
            cols['end'] = int(elems[4])
            cols['score'] = elems[5]
            cols['strand'] = elems[6]
            cols['frame'] = elems[7]
            cols['attributes'] = elems[8].rstrip(' ') # strip trailing whitespace
        except IndexError as ie:
            sys.stderr.write("[%s] - Error: Input appears to be missing GTF-specific fields (check that your input data is GTF-formatted)\n" % (sys.argv[0]))
            sys.exit(os.EX_DATAERR)

        try:
            cols['comments'] = elems[9]
        except IndexError as ie:
            cols['comments'] = None

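        # Split "key value" (GTF) or "key=value" (GFF3) pairs; partition() keeps
        # values containing spaces intact, and surrounding quotes are removed.
        attributes = dict()
        for item in cols['attributes'].split(';'):
            item = item.strip()
            if not item:
                continue
            sep = '=' if '=' in item.split(' ', 1)[0] else ' '
            key, _, value = item.partition(sep)
            attributes[key] = value.strip().strip('"')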

        # do stuff with attributes

From there you can drop unwanted key-value pairs, process particular keys, or write the pairs back out in whatever order you need.
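For example, keeping a subset of keys in a fixed order and writing them back out in GTF attribute syntax (`key "value";`) might look like the sketch below. It assumes the values in `attributes` already have their surrounding quotes removed. `format_gtf_attributes` is a hypothetical helper, and the example dict holds made-up values.

```python
def format_gtf_attributes(attributes, keys, missing="N/A"):
    """Serialize the selected keys, in the given order, as GTF attributes.

    Keys absent from `attributes` are written with the `missing` placeholder;
    pass missing=None to drop them instead.
    """
    parts = []
    for key in keys:
        value = attributes.get(key, missing)
        if value is None:
            continue  # key absent and no placeholder requested
        parts.append('%s "%s";' % (key, value))
    return " ".join(parts)


# Made-up example of the dict the skeleton builds for one GTF line.
attributes = {"gene_id": "g1", "transcript_id": "t1", "gene_name": "abc"}
print(format_gtf_attributes(attributes, ["gene_name", "gene_id", "locus_tag"]))
# prints: gene_name "abc"; gene_id "g1"; locus_tag "N/A";
```

Passing the key list explicitly controls both which attributes survive and the order in which they are written.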


I forgot to close the thread after I managed to get what I was looking for. Thanks anyway; I'll try your approach the next time I need to do this!
