6.1 years ago
YOUSEUFS
Hello, Noob here
My GFF3 file (converted into BED) contains multiple lines that describe the same gene region but with different feature IDs (below):
NC_002978.6 3027 3115 gene2 . + RefSeq gene . ID=gene2;Dbxref=GeneID:29555340;Name=WD_RS00025;gbkey=Gene;gene_biotype=tRNA;locus_tag=WD_RS00025;old_locus_tag=tRNA-Leu-1
NC_002978.6 3027 3115 id1 . + tRNAscan-SE exon . ID=id1;Parent=rna0;Dbxref=GeneID:29555340;anticodon=(pos:3062..3064);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Leu
NC_002978.6 3027 3115 rna0 . + tRNAscan-SE tRNA . ID=rna0;Parent=gene2;Dbxref=GeneID:29555340;anticodon=(pos:3062..3064);gbkey=tRNA;inference=COORDINATES: profile:tRNAscan-SE:1.23;product=tRNA-Leu
How would I collapse these to give me a single gene region associated with a single feature?
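One way to collapse rows like these is to group them by coordinates (chromosome, start, end, strand) and keep a single row per group, preferring the `gene` feature. Below is a minimal Python sketch, assuming the file is tab-separated with the feature type in the eighth column, as in the lines above. The helper name `collapse_by_coordinates` and the `TYPE_RANK` preference order are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical preference order: lower rank wins when rows share coordinates.
# Feature types not listed here get rank 99 (least preferred).
TYPE_RANK = {"gene": 0, "tRNA": 1, "rRNA": 1, "mRNA": 1, "exon": 2, "CDS": 2}


def collapse_by_coordinates(lines: List[str]) -> List[str]:
    """Keep one row per (chrom, start, end, strand), preferring better-ranked feature types."""
    best: Dict[Tuple[str, int, int, str], Tuple[int, str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        # Split on tabs only: the attribute column can contain spaces
        cols = line.split("\t")
        key = (cols[0], int(cols[1]), int(cols[2]), cols[5])
        rank = TYPE_RANK.get(cols[7], 99)
        # Replace the stored row only if this one ranks strictly better; ties keep the first row
        if key not in best or rank < best[key][0]:
            best[key] = (rank, line)
    # Sort by chromosome name, then numeric start/end, so the output is position-sorted
    return [best[key][1] for key in sorted(best)]
```

Grouping on exact coordinates only merges rows with identical extents. A gene whose exons span less than the gene itself would still leave separate rows, which is where filtering on the feature type alone is simpler.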
Context: the collapsed file would then be fed into "bedtools closest" so that I can match transcription start sites (TSSs) to their closest annotated genes.
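For intuition about what "bedtools closest" reports, here is a simplified Python sketch that finds, for a single TSS position, the gene on the same chromosome separated from it by the fewest bases. It is not a reimplementation of bedtools. It ignores strand, keeps the first gene seen on ties, and scans every gene, which is fine for a small bacterial genome. The `Gene` tuple layout and the `gap` and `closest_gene` helpers are hypothetical; coordinates are 0-based and half-open, as in BED.

```python
from typing import List, Optional, Tuple

# (chrom, start, end, name) with BED-style 0-based, half-open coordinates
Gene = Tuple[str, int, int, str]


def gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Bases separating two half-open intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def closest_gene(chrom: str, tss: int, genes: List[Gene]) -> Optional[Tuple[str, int]]:
    """Return (gene name, gap) for the gene nearest a single-base TSS, or None."""
    best: Optional[Tuple[str, int]] = None
    for g_chrom, g_start, g_end, g_name in genes:
        if g_chrom != chrom:
            continue  # only compare genes on the same chromosome
        # Treat the TSS as the one-base interval [tss, tss + 1)
        d = gap(tss, tss + 1, g_start, g_end)
        if best is None or d < best[1]:
            best = (g_name, d)  # strict '<' keeps the first gene on ties
    return best
```

The real tool also expects both input files to be sorted by chromosome and then by start position, and it offers options (such as strand-aware matching and tie handling) that this sketch does not model.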
P.S. Apologies in advance for any incorrect formatting.
Hi Noob,
Your file somewhat resembles a BED file, but its layout is quite confusing. Anyway, start with this to keep only the gene features:
Now you should have only gene features. These may still overlap one another, but each line will be a unique gene.
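A minimal Python sketch of the same filtering idea, assuming the file is tab-separated with the feature type in the eighth column, as in the question's lines. `keep_genes` is a hypothetical helper name used for illustration.

```python
from typing import List


def keep_genes(lines: List[str]) -> List[str]:
    """Return only rows whose feature type (8th tab-separated column) is 'gene'."""
    kept = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Split on tabs only: the attribute column can contain spaces
        cols = line.split("\t")
        # Rows with fewer than 8 columns (e.g. blank or comment lines) are skipped
        if len(cols) >= 8 and cols[7] == "gene":
            kept.append(line)
    return kept
```

The kept rows stay in input order, so sort them by chromosome and start before passing them to "bedtools closest".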
Thank you very much!