Hi, I have a GTF file in which I need to change the gene coordinates, on both the + and - strands, to eliminate the UTR regions so that each gene spans only from the CDS start to the CDS end.
My input GTF file:
Chr_3a transdecoder gene 26355 34213 . - . ID=MSTRG.7.5
Chr_3a transdecoder cds 33198 33363 . - 0 ID=MSTRG.7.5
Chr_3a transdecoder cds 30850 31322 . - 2 ID=MSTRG.7.5
Chr_3a transdecoder cds 29756 30785 . - 0 ID=MSTRG.7.5
Chr_3a transdecoder cds 29426 29679 . - 2 ID=MSTRG.7.5
Chr_3a transdecoder gene 13108235 13128245 . + . ID=MSTRG.1
Chr_3a transdecoder cds 13113822 13113951 . + 0 ID=MSTRG.1
Chr_3a transdecoder cds 13114050 13114146 . + 2 ID=MSTRG.1
Chr_3a transdecoder cds 13114259 13114432 . + 1 ID=MSTRG.1
Chr_3a transdecoder cds 13116046 13116286 . + 1 ID=MSTRG.1
Chr_3a transdecoder cds 13117096 13120860 . + 0 ID=MSTRG.1
Expected format:
On the - strand:
Chr_3a transdecoder gene 29426 33363 . - . ID=MSTRG.7.5
Chr_3a transdecoder cds 33198 33363 . - 0 ID=MSTRG.7.5
Chr_3a transdecoder cds 30850 31322 . - 2 ID=MSTRG.7.5
Chr_3a transdecoder cds 29756 30785 . - 0 ID=MSTRG.7.5
Chr_3a transdecoder cds 29426 29679 . - 2 ID=MSTRG.7.5
On the + strand:
Chr_3a transdecoder gene 13113822 13120860 . + . ID=MSTRG.1
Chr_3a transdecoder cds 13113822 13113951 . + 0 ID=MSTRG.1
Chr_3a transdecoder cds 13114050 13114146 . + 2 ID=MSTRG.1
Chr_3a transdecoder cds 13114259 13114432 . + 1 ID=MSTRG.1
Chr_3a transdecoder cds 13116046 13116286 . + 1 ID=MSTRG.1
Chr_3a transdecoder cds 13117096 13120860 . + 0 ID=MSTRG.1
Kindly suggest how I can get the desired output. I am not good at programming, and changing the coordinates manually for all genes is very tedious.
Thank you
This problem seems to be not about removing UTRs but about finding the two ends of the CDS regions, i.e. merging the CDS regions that belong to the same transcript into a single span.
Look at posts like these:
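For what it's worth, here is a minimal Python sketch of that idea, not a tested tool. It assumes the file has nine columns as in the question, that the ninth column (e.g. ID=MSTRG.7.5) is identical on a gene line and on its CDS lines, and that the feature names are "gene" and "cds". The function name trim_genes_to_cds and the script name trim_genes.py are made up for this example. Because start <= end in GTF/GFF regardless of strand, taking the smallest CDS start and the largest CDS end works for both + and - genes.

```python
import sys


def split_fields(line):
    """Split a line into 9 columns: tab-separated, or whitespace as a fallback."""
    line = line.rstrip("\n")
    return line.split("\t") if "\t" in line else line.split(None, 8)


def trim_genes_to_cds(lines):
    """Return the lines with each gene's start/end replaced by its CDS span.

    Assumption: gene and CDS lines are linked by an identical 9th column.
    Blank lines and '#' comment lines are dropped from the output.
    """
    records = [split_fields(l) for l in lines if l.strip() and not l.startswith("#")]

    # First pass: smallest CDS start and largest CDS end per (sequence, ID).
    span = {}
    for f in records:
        if f[2].lower() == "cds":
            key = (f[0], f[8])
            start, end = int(f[3]), int(f[4])
            if key in span:
                lo, hi = span[key]
                span[key] = (min(lo, start), max(hi, end))
            else:
                span[key] = (start, end)

    # Second pass: rewrite gene coordinates; all other lines stay unchanged.
    out = []
    for f in records:
        key = (f[0], f[8])
        if f[2].lower() == "gene" and key in span:
            f[3], f[4] = str(span[key][0]), str(span[key][1])
        out.append("\t".join(f))  # output is always tab-separated
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python trim_genes.py input.gtf > output.gtf
    with open(sys.argv[1]) as handle:
        print("\n".join(trim_genes_to_cds(handle)))
```

A gene whose ID does not exactly match any CDS line (for example because of a typo in the ID) is left with its original coordinates, so it is worth checking the IDs first.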