Question

How to find intron co-ordinates from the gff file?

6

Entering edit mode

9.0 years ago

Chirag Parsania ★ 2.0k

Hi

Does anyone know how can I find intronic co-ordinates from the gff file ? My .gff file contains following features with co-ordinates. Unfortunately there's no intron co-ordinate

chromosome,gene,mRNA,exon,CDS,tRNA,retrotransposon,long_terminal_repeat  snoRNA,blocked_reading_frame,centromere,repeat_region,pseudogene,snRNA,ncRNA ,rRNA

Give your inputs

Thanks!

intron gff bedtools • 8.4k views

ADD COMMENT • link updated 2.3 years ago by Ram 44k • written 9.0 years ago by Chirag Parsania ★ 2.0k

0

Entering edit mode

Hi,

Your introns are between each exons within a given gene. They're not precisely labelled, however, one can find their begin/end positions since exons are clearly defined.

ADD REPLY • link 9.0 years ago by Thibault D. ▴ 700

0

Entering edit mode

Isn't it annoying that to find introns one has to go through a relatively complex procedure, as Devon suggests?

ADD REPLY • link 9.0 years ago by dariober 15k

0

Entering edit mode

Exactly @dariober..! I can do that in excel as well. But I wonder if somebody has already wrote some program for this. Thanks for your inputs.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 9.0 years ago by Chirag Parsania ★ 2.0k

Ram · Answer 1 · 2015-11-10

7

Entering edit mode

9.0 years ago

Devon Ryan 104k

I ended up needing to do something like this yesterday. Here is an RScript that should more or less work. Note that in my case I wanted introns by genes. If you want introns per transcript then just use intronsByTranscript(), which is much simpler.

library(GenomicFeatures)
library(rtracklayer)

gtf <- makeTxDbFromGFF("genes.gtf") #change me!
exons <- exonsBy(gtf, by="gene")

#make introns
exons <- reduce(exons)
exons <- exons[sapply(exons, length) > 1]

introns <- lapply(exons, function(x) {
    #Make a "gene" GRange object
    gr = GRanges(seqnames=seqnames(x)[1], ranges=IRanges(start=min(start(x)),
        end=max(end(x))), 
        strand=strand(x)[1])
    db = disjoin(c(x, gr))
    ints = db[countOverlaps(db, x) == 0]
    #Add an ID
    if(as.character(strand(ints)[1]) == "-") {
        ints$exon_id = c(length(ints):1)
    } else {
        ints$exon_id = c(1:length(ints))
    }
    ints
})
introns <- GRangesList(introns)

ADD COMMENT • link 9.0 years ago by Devon Ryan 104k

0

Entering edit mode

Hi Devon,

Thanks for the script. I have tried to run that. Uploaded library successfully. But then I got an error that

Error: could not find function "makeTxDbFromGFF"

Here is the first few lines of session info

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rtracklayer_1.26.3     GenomicFeatures_1.18.7 AnnotationDbi_1.28.2   Biobase_2.26.0         GenomicRanges_1.18.4

Kindly tell me how can I update package and run function makeTxDbFromGFF

Chirag

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 9.0 years ago by Chirag Parsania ★ 2.0k

0

Entering edit mode

Hi Devon,

It worked.! I used makeTranscriptDbFromGFF instead of makeTxDbFromGFF. Thank you so much.

Chirag

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 9.0 years ago by Chirag Parsania ★ 2.0k

0

Entering edit mode

Hi Devon

I just found this script of yours for extracting introns...the scripts is running fine but at the last step its stucked for longer and I am not getting any output..any idea why?

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 7.7 years ago by #### ▴ 220

0

Entering edit mode

Hi @Devon I was trying your above mentioned code and I am getting following errror:

> exons
GRangesList object of length 37989:
$ENSG00000000003 
GRanges object with 20 ranges and 2 metadata columns:
       seqnames                 ranges strand |   exon_id       exon_name
          <Rle>              <IRanges>  <Rle> | <integer>     <character>
   [1]        X [100627109, 100629986]      - |    706021 ENSE00003730948
   [2]        X [100628670, 100629986]      - |    706022 ENSE00001459322
   [3]        X [100630759, 100630866]      - |    706023 ENSE00000868868
   [4]        X [100632063, 100632068]      - |    706024 ENSE00003731560
   [5]        X [100632485, 100632568]      - |    706025 ENSE00000401072
   ...      ...                    ...    ... .       ...             ...
  [16]        X [100635558, 100635746]      - |    706036 ENSE00003733424
  [17]        X [100636191, 100636689]      - |    706037 ENSE00001886883
  [18]        X [100636608, 100636806]      - |    706038 ENSE00001855382
  [19]        X [100636793, 100637104]      - |    706039 ENSE00001863395
  [20]        X [100639945, 100639991]      - |    706040 ENSE00001828996

...
<37988 more elements>
-------
seqinfo: 270 sequences (1 circular) from an unspecified genome; no seqlengths
> 
> introns <- lapply(exons, function(x) {
+     #Make a "gene" GRange object
+     gr = GRanges(seqnames=seqnames(x)[1], ranges=IRanges(start=min(start(x)),
+                                                          end=max(end(x))), 
+                  strand=strand(x)[1])
+     db = disjoin(c(x, gr))
+     ints = db[countOverlaps(db, x) == 0]
+     #Add an ID
+     if(as.character(strand(ints)[1]) == "-") {
+         ints$exon_id = c(length(ints):1)
+     } else {
+         ints$exon_id = c(1:length(ints))
+     }
+     ints
+ })
Error in .Method(..., deparse.level = deparse.level) : 
  number of columns for arg 2 do not match those of first arg
Called from: .Method(..., deparse.level = deparse.level)
Browse[1]>

Any inputs why it is so? and how to overcome this?

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 7.4 years ago by #### ▴ 220