get genomic coordinates plus 5kb from Gene Symbols
2
0
Entering edit mode
10.2 years ago
jfertaj ▴ 110

Hi list,

I am trying to use RMySQL to get genomic coordinates from a bunch of gene symbols and add 5kb to the TSS afterwards. I know I can do it using Tables from UCSC browser or BiomaRt but I would like to learn how to query using RMySQL or mysql,

I have followed this post

#Install the package if you've never done so
install.packages("RMySQL")
#Load the package
library(RMySQL)
# Set up a connection to your database management system.
# I'm using the public MySQL server for the UCSC genome browser (no password)
mychannel <- dbConnect(MySQL(), user="genome", host="genome-mysql.cse.ucsc.edu")
# Function to make it easier to query
query <- function(...) dbGetQuery(mychannel, ...)
# Get the UCSC gene name, start and end sites for the first 10 genes on Chromosome 12
query("SELECT name, chrom, txStart, txEnd FROM mm9.knownGene WHERE chrom='chr12' LIMIT 10;")
# Results are returned as a data.frame:
# name chrom txStart txEnd
# 1 uc007mwj.2 chr12 3235525 3250374
# 2 uc007mwg.2 chr12 3235790 3239112
# 3 uc007mwh.2 chr12 3235790 3239288
# 4 uc007mwi.2 chr12 3235790 3250374
# 5 uc007mwk.1 chr12 3236610 3249997
# 6 uc011yjq.1 chr12 3237284 3241410
# 7 uc007mwl.2 chr12 3247427 3309969
# 8 uc007mwm.1 chr12 3365131 3406494
# 9 uc007mwn.1 chr12 3365131 3406494
# 10 uc007mwp.2 chr12 3403882 3426747
view raw rmysql.r hosted with ❤ by GitHub
but I cannot figure out how to do it with geneSymbols (HUGO names) because name in Knowngene table is different from kgXref table

Any help would it be appreciated

Thanks

genome mysql RMySQL • 2.3k views
ADD COMMENT
2
Entering edit mode
10.2 years ago
Ram 45k

Use a join with knownGene.name=kgXref.kgID

You can always use the Table Browser to check the schema of UCSC's MySQL tables to guess which identifiers might match. These tables are highly redundant to help querying, mapping and display.

ADD COMMENT
0
Entering edit mode
10.2 years ago
Chirag Nepal ★ 2.4k

If interested, another alternative way is to use one-liner:

Extract 5KB region around TSS

cat input.bed | awk 'BEGIN {OFS="\t"}  { if ($6 == "+") { print $1,$2-5000,$2+5000,$4,$5,$6 } else if ($6 == "-") { print $1,$3-5000,$3+5000,$4,$5,$6 } > Temp

If you want sequence of these: used bedtools

fastaFromBed -s -name -fi assembly.fa -bed Temp -fo Temp.fa
ADD COMMENT

Login before adding your answer.

Traffic: 2991 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6