Filtering a GFF to retain only exons with CDS annotation
2.4 years ago
elcortegano ▴ 200

Hi all, I'm trying to filter a large GFF file with many gene/exon entries so that it retains only those gene/exon entries that contain a CDS in their hierarchy, e.g. excluding genes or exons related to ncRNA that do not have a CDS.

I am not aware of any tool that does this kind of filtering, but I assume there must be one. Does anyone know how to do this? Thanks!

EDIT

To provide an example, I'd like to keep sequences like the ones in contig ptg000013l below (where an exon has a CDS annotation contained within it), and exclude other exon annotations that have no CDS within them.

ptg000013l      ensembl exon    49126502        49128513        .       -       .       Parent=transcript:ENSMUST00000238969;Name=ENSMUSE00000644098;constitutive=0;ensembl_end_phase=-1;ensembl_phase=0;exon_id=ENSMUSE00000644098;rank=8;v
ptg000013l      havana  CDS     49127986        49128513        .       -       0       ID=CDS:ENSMUSP00000158947;Parent=transcript:ENSMUST00000238953;protein_id=ENSMUSP00000158947
ptg000048l      havana  exon    8219576 8219759 .       -       .       Parent=transcript:ENSMUST00000211519;Name=ENSMUSE00001383012;constitutive=0;ensembl_end_phase=-1;ensembl_phase=-1;exon_id=ENSMUSE00001383012;rank=1;v
gff annotation • 1.7k views

It would help if you could post a few lines. A GFF file is a text file, so you can filter it by pattern. Based on the three lines above, and assuming that the GFF fields are tab-separated:

$ awk -F "\t" '$3=="CDS"' test.gff

This will only return the CDS entry. The trick is to also get the exons (and gene entries, if present), but only those with a CDS annotation within them. In the example case, I'd expect to get the CDS annotation as well as the exon in ptg000013l, since it does contain a CDS, but not the other exon in ptg000048l.
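
For what it's worth, here is a minimal Python sketch of one way this could be done. It uses coordinates rather than the Parent attributes, because the exon and CDS in the example above point to different transcripts. It assumes tab-separated columns and treats an exon as "containing" a CDS when the CDS lies entirely inside it on the same contig and strand. The function name `filter_exons_with_cds` and the `container_types` parameter are made up for this illustration, not part of any existing tool.

```python
import sys
from collections import defaultdict


def parse_feature(line):
    """Return (seqid, type, start, end, strand), or None if not a feature line."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9:
        return None
    return fields[0], fields[2], int(fields[3]), int(fields[4]), fields[6]


def filter_exons_with_cds(lines, container_types=("exon",)):
    """Keep CDS lines, plus container features that fully enclose a CDS.

    Assumption: "has a CDS" means coordinate containment on the same
    contig and strand, not a shared Parent attribute.
    """
    lines = list(lines)

    # Pass 1: collect CDS intervals per (contig, strand).
    cds_intervals = defaultdict(list)
    for line in lines:
        rec = parse_feature(line)
        if rec and rec[1] == "CDS":
            seqid, _, start, end, strand = rec
            cds_intervals[(seqid, strand)].append((start, end))

    # Pass 2: emit comments, CDS lines, and containers that enclose a CDS.
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # pass header/comment lines through
            continue
        rec = parse_feature(line)
        if rec is None:
            continue  # skip blank or malformed lines
        seqid, ftype, start, end, strand = rec
        if ftype == "CDS":
            kept.append(line)
        elif ftype in container_types:
            candidates = cds_intervals.get((seqid, strand), [])
            if any(start <= c_start and c_end <= end
                   for c_start, c_end in candidates):
                kept.append(line)
    return kept


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_gff.py input.gff > filtered.gff
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(filter_exons_with_cds(handle))
```

Passing `container_types=("exon", "gene")` would apply the same containment test to gene entries. The scan over all CDS intervals for each exon is simple but quadratic in the worst case, so for a very large file an interval index or a sorted sweep would likely be needed. If the file's Parent attributes are reliable, matching exons to CDS by shared transcript ID would be an alternative criterion.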
