8.4 years ago
scchess
I have a large GTF file and I want the following information:
- Number of unique exons
- Number of unique introns
- Locus of those unique exons
- Locus of those unique introns
What would be the best way to do that?
To somewhat reiterate what venu said, this depends on how one defines "unique". If you just want to get rid of exons/introns that are shared between transcripts, then make a list of all exons/introns, sort it, and use uniq. If, on the other hand, you want to merge overlapping exons and thereby not have introns overlapping exons (possibly regardless of strand), then you'll want to use either bedtools or GenomicRanges in R.
This might work for exons, but what about introns?
That's the point of merging exons within genes (or between them if that matters to you). In R that's reduce(); in bedtools I think you can merge something with itself.
What do you mean by unique exon? Obviously each exon has a different locus. Do you mean you have duplicates in your GTF (exons with the same locations)?
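For the merging approach mentioned above, here is a rough plain-Python sketch of the idea rather than the actual GenomicRanges or bedtools implementation. It assumes the exons have already been grouped per gene as 1-based, inclusive (start, end) pairs, as in GTF. The function names and the input layout are made up for illustration. Merged exons are the "unique" exon loci, and the gaps between them are the introns, which by construction cannot overlap any exon of that gene.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (start, end) intervals.

    Coordinates are assumed to be 1-based and inclusive, as in GTF.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def gaps_between(merged):
    """Return the gaps between consecutive merged intervals (the introns)."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(merged, merged[1:])
    ]


def merged_exons_and_introns(exons_by_gene):
    """Map each gene to (merged exons, introns).

    exons_by_gene is a hypothetical dict: gene id -> list of (start, end).
    """
    result = {}
    for gene, exons in exons_by_gene.items():
        merged = merge_intervals(exons)
        result[gene] = (merged, gaps_between(merged))
    return result
```

Grouping by gene keeps the merge within genes. Grouping by chromosome (with or without strand in the key) would instead merge between genes, which is the other option mentioned above.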
I mean different transcripts in the file would have the same exons, and they must be filtered.
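If "unique" only means dropping exons that are repeated across transcripts, the sort-and-uniq idea can be sketched in Python as below. This is a minimal sketch with some assumptions: a tab-separated GTF with exon features and a transcript_id attribute, and introns defined as the gaps between consecutive exons of the same transcript. The function names are made up for illustration.

```python
import re
from collections import defaultdict


def parse_exons(gtf_lines):
    """Group exon records by transcript_id as (chrom, start, end, strand)."""
    by_transcript = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        by_transcript[match.group(1)].append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6])
        )
    return by_transcript


def unique_exons_and_introns(gtf_lines):
    """Return sorted lists of distinct exon loci and distinct intron loci."""
    exons, introns = set(), set()
    for tx_exons in parse_exons(gtf_lines).values():
        tx_exons.sort(key=lambda exon: exon[1])  # order by start position
        exons.update(tx_exons)
        # An intron is the gap between two consecutive exons of a transcript.
        for (chrom, _, end, strand), (_, next_start, _, _) in zip(
            tx_exons, tx_exons[1:]
        ):
            if next_start > end + 1:
                introns.add((chrom, end + 1, next_start - 1, strand))
    return sorted(exons), sorted(introns)
```

Passing an open file handle, for example unique_exons_and_introns(open(path)), gives both lists. Their lengths are the counts asked for, and the tuples themselves are the loci. Exons that differ by even one base are still counted separately here, so this is not a substitute for merging.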