Hi to the experts,
Can I please get your help extracting genes that have at least one intron larger than 5 kbp? I have the GTF file and the reference genome for this species. What I want to know is how many genes in the genome have at least one intron longer than 5 kbp. I have a feeling that a GRanges object built from the GTF file could be of use, but I was unable to figure it out. I'm not sure whether my question is clear enough, so please feel free to ask questions if anything needs clarifying.
Thanks heaps, Shani.
You should have a look at this post on how to extract introns from a GTF file. From there on, you can simply use something like awk to keep only the introns longer than a certain length, and then extract the gene names from those. There is no need for R; everything can be done from the Unix command line. Try to play around a bit, and feel free to come back in case of questions.
AFAIK, the largest intron in human is less than 1 Mb. You were asking about a 5 Mb (5000 kb) intron. What is the organism?
I'm betting on a typo (== 5000bp) ... ?
Yes, you are right. 5 kbp it is; thanks for pointing that out. This is for the barley genome, though.
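For anyone who would rather not chain command-line tools, here is a minimal Python sketch of the same idea as the awk suggestion above: derive introns as the gaps between consecutive exons of each transcript, then keep the genes that have at least one gap longer than the threshold. It is not the method from the linked post. It assumes the GTF has "exon" features carrying gene_id and transcript_id attributes, and the function name genes_with_long_introns and the file name annotation.gtf are made up for illustration.

```python
import re
from collections import defaultdict

MIN_INTRON_LEN = 5000  # bp; "larger than 5 kbp" is read as strictly greater


def genes_with_long_introns(gtf_lines, min_len=MIN_INTRON_LEN):
    """Return the set of gene_ids with at least one intron longer than min_len bp."""
    # Collect exon intervals per (gene_id, transcript_id).
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        gene = re.search(r'gene_id "([^"]+)"', fields[8])
        tx = re.search(r'transcript_id "([^"]+)"', fields[8])
        if gene is None or tx is None:
            continue
        # GTF coordinates are 1-based and inclusive.
        exons[(gene.group(1), tx.group(1))].append((int(fields[3]), int(fields[4])))

    long_genes = set()
    for (gene_id, _tx_id), intervals in exons.items():
        intervals.sort()
        # An intron is the gap between two consecutive exons of one transcript.
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start - prev_end - 1 > min_len:
                long_genes.add(gene_id)
                break
    return long_genes


if __name__ == "__main__":
    import os

    path = "annotation.gtf"  # hypothetical file name; replace with your own GTF
    if os.path.exists(path):
        with open(path) as handle:
            hits = genes_with_long_introns(handle)
        print(len(hits), "genes have at least one intron >", MIN_INTRON_LEN, "bp")
```

Genes are counted once even if several of their transcripts contain a long intron, which matches the "how many genes" question. The reference genome is not needed for this; the GTF alone is enough.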