2.6 years ago
Kevin Blighe
Dear community,
Some programs require a 'flattened' GTF. However, how does one flatten a GTF? There is not much material online about it.
I found this Python script for DEXSeq (written for Python 2.x, I believe): https://github.com/olgabot/rna-seq-diff-exprn/blob/master/scripts/external/dexseq_prepare_annotation.py
However, is there a more rudimentary way?
Kind regards,
Kevin
I had a customer asking me for something similar: https://github.com/NBISweden/AGAT/issues/188
I thought this type of task was really rare and specialized, but apparently not so much...
The thing is, I don't get what has to be flattened... What should happen when exons from different genes would be merged because they overlap (or a non-coding gene overlaps a coding gene)? What should be done about attributes like ID and gene name in that case? We risk losing all the useful information held by the GFF/GTF, e.g. which gene each exon belongs to. If all the underlying information is ultimately needed, I guess using bedtools intersect might be enough.
Good points, Juke34
I have never seen anyone merge exons from different genes. Rsubread and DEXSeq both document how they flatten exons, and neither merges exons from different genes.
OK, so Rsubread and DEXSeq both merge exons from isoforms of the same gene only. This sounds much easier. The person who asked me for that task had a problem because the tool he wanted to use (FeatureSequence) could not work with overlapping exons from different genes.
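For anyone after the more rudimentary route, here is a minimal Python 3 sketch of the per-gene idea discussed above: collect the exon records, group them by gene, and merge overlapping intervals within each gene, never across genes. It is not the DEXSeq or Rsubread implementation (DEXSeq, for one, builds counting bins rather than simple merged exons). The function name `flatten_exons` is hypothetical, and the sketch assumes a tab-separated GTF whose attribute column contains `gene_id "..."`.

```python
import re
from collections import defaultdict


def flatten_exons(gtf_lines):
    """Merge overlapping exons within each gene (hypothetical helper).

    Returns {(chrom, strand, gene_id): [(start, end), ...]} with the
    intervals sorted and non-overlapping. Exons of different genes are
    never merged, even if they overlap on the genome.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed exon records.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))
        exons[key].append((int(fields[3]), int(fields[4])))

    flattened = {}
    for key, intervals in exons.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlap (GTF coordinates are 1-based, inclusive): extend.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        flattened[key] = merged
    return flattened
```

Calling `flatten_exons(open("annotation.gtf"))` (the file name is a placeholder) gives one list of merged exon intervals per gene. Writing them back out as GTF lines, and deciding which attributes to keep, is left open because that depends on what the downstream tool expects.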