Does anyone know how can I find intronic co-ordinates from the gff file ? My .gff file contains following features with co-ordinates. Unfortunately there's no intron co-ordinate
Your introns are between each exons within a given gene. They're not precisely labelled, however, one can find their begin/end positions since exons are clearly defined.
I ended up needing to do something like this yesterday. Here is an RScript that should more or less work. Note that in my case I wanted introns by genes. If you want introns per transcript then just use intronsByTranscript(), which is much simpler.
I just found this script of yours for extracting introns...the scripts is running fine but at the last step its stucked for longer and I am not getting any output..any idea why?
ADD REPLY
• link
updated 4.9 years ago by
Ram
44k
•
written 7.7 years ago by
####
▴
220
0
Entering edit mode
Hi @Devon I was trying your above mentioned code and I am getting following errror:
> exons
GRangesList object of length 37989:
$ENSG00000000003
GRanges object with 20 ranges and 2 metadata columns:
seqnames ranges strand | exon_id exon_name
<Rle> <IRanges> <Rle> | <integer> <character>
[1] X [100627109, 100629986] - | 706021 ENSE00003730948
[2] X [100628670, 100629986] - | 706022 ENSE00001459322
[3] X [100630759, 100630866] - | 706023 ENSE00000868868
[4] X [100632063, 100632068] - | 706024 ENSE00003731560
[5] X [100632485, 100632568] - | 706025 ENSE00000401072
... ... ... ... . ... ...
[16] X [100635558, 100635746] - | 706036 ENSE00003733424
[17] X [100636191, 100636689] - | 706037 ENSE00001886883
[18] X [100636608, 100636806] - | 706038 ENSE00001855382
[19] X [100636793, 100637104] - | 706039 ENSE00001863395
[20] X [100639945, 100639991] - | 706040 ENSE00001828996
...
<37988 more elements>
-------
seqinfo: 270 sequences (1 circular) from an unspecified genome; no seqlengths
>
> introns <- lapply(exons, function(x) {
+ #Make a "gene" GRange object
+ gr = GRanges(seqnames=seqnames(x)[1], ranges=IRanges(start=min(start(x)),
+ end=max(end(x))),
+ strand=strand(x)[1])
+ db = disjoin(c(x, gr))
+ ints = db[countOverlaps(db, x) == 0]
+ #Add an ID
+ if(as.character(strand(ints)[1]) == "-") {
+ ints$exon_id = c(length(ints):1)
+ } else {
+ ints$exon_id = c(1:length(ints))
+ }
+ ints
+ })
Error in .Method(..., deparse.level = deparse.level) :
number of columns for arg 2 do not match those of first arg
Called from: .Method(..., deparse.level = deparse.level)
Browse[1]>
Any inputs why it is so? and how to overcome this?
ADD REPLY
• link
updated 4.9 years ago by
Ram
44k
•
written 7.4 years ago by
####
▴
220
Hi,
Your introns are between each exons within a given gene. They're not precisely labelled, however, one can find their begin/end positions since exons are clearly defined.
Isn't it annoying that to find introns one has to go through a relatively complex procedure, as Devon suggests?
Exactly @dariober..! I can do that in excel as well. But I wonder if somebody has already wrote some program for this. Thanks for your inputs.