Hello, I am having some troubles using genomicFeatures R's package. Basically I am trying to retrieve for a specific gene the longest isoform and, for that isoform, exon, 3'utr and 5'utr coordinates, anyway results are strange: it seems to return 3'utr on the first exon and 5'utr on the last exon when the opposite was expected.
Here the code to replicate the results:
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
# specify the gene ID
gid = '387700'
# select the longest transcript
cur_tx = data.frame(transcriptsBy(txdb, by="gene")[[as.character(gid)]])
max_tx_id = cur_tx$tx_name[which.max(cur_tx$width)]
# obtain exons for longest gene isoform
cur_exon = data.frame(exonsBy(txdb, by="tx", use.names=TRUE)[max_tx_id])
# obtain the longest 5'UTR for the selected transcript
fiveUTR = data.frame(fiveUTRsByTranscript(txdb,use.names=TRUE)[max_tx_id])
fiveUTR = fiveUTR[which.max(fiveUTR$end),]
# obtain 3'UTR for for the selected transcript
threeUTR = data.frame(threeUTRsByTranscript(txdb,use.names=TRUE)[max_tx_id])
threeUTR = threeUTR[which.max(threeUTR$end),]
Here the results:
These are the exon coordinates:
group group_name seqnames start end width strand exon_id exon_name exon_rank
1 1 uc001kgm.3 chr10 91295199 91295313 115 - 142243 <NA> 1
2 1 uc001kgm.3 chr10 91294258 91294397 140 - 142242 <NA> 2
3 1 uc001kgm.3 chr10 91222136 91222381 246 - 142241 <NA> 3
4 1 uc001kgm.3 chr10 91203513 91203616 104 - 142240 <NA> 4
5 1 uc001kgm.3 chr10 91200865 91201008 144 - 142239 <NA> 5
6 1 uc001kgm.3 chr10 91198361 91198940 580 - 142238 <NA> 6
7 1 uc001kgm.3 chr10 91195817 91196076 260 - 142236 <NA> 7
8 1 uc001kgm.3 chr10 91190051 91193083 3033 - 142235 <NA> 8
where the 8th is the first exon ( which.min(cur_exon$start) ), and the 1st is the last exon.
Anyway the 3'UTR which is expected to be found at the end of the gene is at the very beginning and vice versa for the 5'UTR (you can check easily by exon_id)
3'UTR coordinates:
group group_name seqnames start end width strand exon_id exon_name exon_rank
1 1 uc001kgm.3 chr10 91190051 91192820 2770 - 142235 <NA> 8
5'UTR coordinates:
group group_name seqnames start end width strand exon_id exon_name exon_rank
1 1 uc001kgm.3 chr10 91295199 91295313 115 - 142243 <NA> 1
Anyone know why this happen ? Am I missing something
so fast so helpful. Thx !
If this solved the problem please accept the answer (the green checkmark below the bookmark sign).