Question

GFF file multiple features for 1 gene region, how to collapse into 1?

0

6.1 years ago

YOUSEUFS ▴ 30

Hello, Noob here

My GFF3 file (converted into BED) contains multiple lines that describe the same gene region but with different feature IDs (see below):

NC_002978.6     3027    3115    gene2   .       +       RefSeq  gene    .       ID=gene2;Dbxref=GeneID:29555340;Name=WD_RS00025;gbkey=Gene;gene_biotype=tRNA;locus_tag=WD_RS00025;old_locus_tag=tRNA-Leu-1

NC_002978.6     3027    3115    id1     .       +       tRNAscan-SE     exon    .       ID=id1;Parent=rna0;Dbxref=GeneID:29555340;anticodon=(pos:3062..3064);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Leu

NC_002978.6     3027    3115    rna0    .       +       tRNAscan-SE     tRNA    .       ID=rna0;Parent=gene2;Dbxref=GeneID:29555340;anticodon=(pos:3062..3064);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Leu

How would I collapse these to give me a single gene region associated with a single feature?

Context: The result would then be fed into "bedtools closest" so I can match transcriptional start sites to their closest annotated gene.

P.S. Apologies in advance for any incorrect formatting.

RNA-Seq GFF BEDTOOLS • 2.3k views

ADD COMMENT • link updated 6.1 years ago by Carambakaracho ★ 3.3k • written 6.1 years ago by YOUSEUFS ▴ 30

1

Hi Noob,

Your file somewhat resembles BED, but the column layout is quite confusing. Anyway, start with this to keep only the gene features (in your file the feature type sits in column 8):

awk '$8 == "gene"' your_file.bed > your_file.genes.bed

Now you should have only gene features. Different genes may still overlap each other, but each gene will appear only once.
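If you want to confirm that the filter really removed the repeated regions, a quick check along these lines might help. This is only a sketch: `dup_intervals` is a made-up helper name, and it assumes the column layout from your example (chromosome, start, end in columns 1-3 and strand in column 6).

```shell
# dup_intervals (hypothetical helper): print every chrom/start/end/strand
# combination that occurs on more than one line of the given file.
# Assumes columns 1-3 are chrom/start/end and column 6 is the strand.
dup_intervals() {
  awk '{ print $1, $2, $3, $6 }' "$1" | sort | uniq -d
}
```

Running `dup_intervals your_file.bed` should list the tRNA region from your example, while `dup_intervals your_file.genes.bed` should print nothing if every region is now represented by a single gene line.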

ADD REPLY • link 6.1 years ago by goodez ▴ 640

0

Thank you very much!

ADD REPLY • link 6.1 years ago by YOUSEUFS ▴ 30

score 0 · Answer 1 · 2018-10-18

0

6.1 years ago

Carambakaracho ★ 3.3k

Filter for the gene features, either on column 3 of your GFF or on the corresponding column of your BED file. However, a tRNA feature is probably the exception rather than the rule. You can even do this in Excel.
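Filtering on GFF column 3 could look roughly like the sketch below, which also writes a plain six-column BED. It assumes a standard tab-delimited, nine-column GFF3; the function name `gff_genes_to_bed` and the file names in the usage note are placeholders, not anything from the original post.

```shell
# gff_genes_to_bed (hypothetical helper): keep only "gene" rows of a GFF3
# file and print them as BED6 (chrom, start, end, name, score, strand).
# Assumes tab-delimited GFF3 with the feature type in column 3 and the
# attributes in column 9.
gff_genes_to_bed() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }                      # skip header and comment lines
    $3 == "gene" {
      id = "."                         # fallback name if no ID attribute
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
      # GFF3 starts are 1-based, BED starts are 0-based
      print $1, $4 - 1, $5, id, ".", $7
    }' "$1"
}
```

For example, `gff_genes_to_bed annotation.gff3 | sort -k1,1 -k2,2n > genes.bed` would give a sorted file with one line per gene, which is the kind of input bedtools closest expects.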

ADD COMMENT • link 6.1 years ago by Carambakaracho ★ 3.3k