5.4 years ago
colindaven
Hello everybody,
I need to compare old 3' tag RNA-seq data, where only the last couple of exons are sequenced, with modern full-gene RNA-seq data.
To make the two slightly more comparable, I want to count only reads mapping to the last 4 or so exons per transcript.
Does anyone know a tool that can filter a GTF or GFF3 file to include only the last exons of each transcript, or will I have to write something myself? I know quite a few tools, but none has this (admittedly weird) functionality.
Thanks
Edit - we wrote a Python script here to cover this, thanks Fabian. None of the simple approaches suggested here worked.
Either filter the GTF, or count at exon level with featureCounts and then filter the counts to keep the exons you want. You would need some custom filtering in any case. I think it is better to write a small script, either in R or with awk on Linux. I know there are various ways to do this.
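For the "filter the GTF" route, a minimal Python sketch could look like the one below. This is not the script mentioned in the question's edit, just one way to do it under some assumptions: the input is a GTF (not GFF3) whose exon lines carry a `transcript_id "..."` attribute, as in GENCODE/Ensembl files, and "last" means closest to the 3' end, so the order is reversed for minus-strand transcripts. The function name `last_exons` and the parameter `n` are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in column 9 of a GTF line.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def last_exons(gtf_lines, n=4):
    """Return the exon lines for the n most 3' exons of each transcript."""
    exons = defaultdict(list)  # transcript_id -> [(start, strand, line)]
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        line = line.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon features
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        exons[match.group(1)].append((int(fields[3]), fields[6], line))

    kept = []
    for records in exons.values():
        strand = records[0][1]
        # Put exons in transcription order: ascending start on the plus
        # strand, descending start on the minus strand.
        records.sort(key=lambda rec: rec[0], reverse=(strand == "-"))
        # The last n entries are then the exons nearest the 3' end.
        kept.extend(rec[2] for rec in records[-n:])
    return kept
```

You would pass it an open file handle (or any iterable of lines) and write the returned lines to a new GTF. Note that the output is grouped per transcript rather than coordinate-sorted, and that exons shared by several transcripts of a gene will appear once per transcript.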
Try this command
It will extract the whole line with respect to three exons.
Thanks, looks like this command is off though. It's extracting three different biotypes from the Gencode v28 GTF and also throwing a no such file error.
You can do one more thing: sort the data with respect to gene_id and
You can also modify the command according to your file columns.