utr sequence retrieval from UCSC
0
0
Entering edit mode
7 months ago
kim ▴ 70

Hi, I'd like to retrieve 5'UTR sequence of my genes (total around 20,000) and make a text file. Here is my input file, "input.txt”

no  geneID  symbol
1   387700  SLC16A12
…   …   …
20000   3075    CFH

Basically I am trying to retrieve for a specific gene the longest isoform and, for that isoform, 5'utr coordinates. I want to make new result text  file (**"result.txt"**) like this. 

no  geneID  symbol  UCSCid  chr start   end width   strand  sequence
1   8813    DPM1    uc001kgm.3    chr10 91295199 91295313   115      - CTGAGCCACCGCGCCCTCTGCACAACCGTGTCAGGGAGAGGGAAGAGGGGGAGCCGAGGAAGCTTCTTGGCCAGGGCGCTGGGGATCCGGGAGCGGGGGCCGCCGGCGAGACTGA
387700  SLC16A12
.......
20000   3075    CFH .........

Currently, I don't know how to do it, so I am just doing one by one (1/20000....).. Looking forward to your help!

Here the code to the results:

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(GenomicFeatures)
library(BSgenome)

showMethods("getSeq")

library(BSgenome.Hsapiens.UCSC.hg19)

genome <- BSgenome.Hsapiens.UCSC.hg19
txdb = TxDb.Hsapiens.UCSC.hg19.knownGene

# specify the gene ID
gid ='387700'

# select the longest transcript
cur_tx = data.frame(transcriptsBy(txdb, by="gene")[[as.character(gid)]])
max_tx_id = cur_tx$tx_name[which.max(cur_tx$width)]

# obtain the longest 5'UTR for the selected transcript
fiveUTR = data.frame(fiveUTRsByTranscript(txdb,use.names=TRUE)[max_tx_id])
fiveUTR = fiveUTR[which.max(fiveUTR$end),]
fiveUTR
#group group_name seqnames    start      end width strand exon_id exon_name exon_rank
#  1 uc001kgm.3    chr10 91295199 91295313   115      -  142243      <NA>         1


getSeq(genome, fiveUTR$seqnames, start = fiveUTR$start, end = fiveUTR$end)
#115-letter DNAString object
#seq: CTGAGCCACCGCGCCCTCTGCACAACCGTGTCAGGGAGAGGGAAGAGGGGGAGCCGAGGAAGCTTCTTGGCCAGGGCGCTGGGGATCCGGGAGCGGGGGCCGCCGGCGAGACTGA
UTR SEQUENCING UCSC GENOME • 372 views
ADD COMMENT
0
Entering edit mode

Don't forget to follow up on your threads, that is bad etiquette. If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.

Upvote|Bookmark|Accept

ADD REPLY
0
Entering edit mode

Have you tried to see if you can use table browser using those regions of interest?

ADD REPLY

Login before adding your answer.

Traffic: 1773 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6