From the Table Browser, select group=Genes, track=UCSC Genes, table=knownGene, and then click 'describe table schema'.
You'll see that knownGene is linked to kgXref through hg18.kgXref.kgID (via knownGene.name), and that kgXref contains a column named geneSymbol.
Putting it all together, you can get the positions of the transcripts for those genes, extended by 5,000 bp on each side:
mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select distinct X.geneSymbol,G.chrom,G.txStart-5000,G.txEnd+5000 from knownGene as G, kgXref as X where X.geneSymbol in ("APOB", "TTC39B", "ATF3", "RGS1", "LIPG") and X.kgId=G.name'
+------------+-------+----------------+--------------+
| geneSymbol | chrom | G.txStart-5000 | G.txEnd+5000 |
+------------+-------+----------------+--------------+
| APOB | chr2 | 21072805 | 21125450 |
| ATF3 | chr1 | 210843616 | 210865739 |
| ATF3 | chr1 | 210800319 | 210865739 |
| ATF3 | chr1 | 210843616 | 210865704 |
| ATF3 | chr1 | 210849982 | 210864212 |
| LIPG | chr18 | 45337424 | 45378276 |
| LIPG | chr18 | 45337424 | 45367217 |
| RGS1 | chr1 | 190806479 | 190820782 |
| TTC39B | chr9 | 15156560 | 15302244 |
| TTC39B | chr9 | 15156560 | 15227442 |
| TTC39B | chr9 | 15171584 | 15302244 |
| TTC39B | chr9 | 15172968 | 15268702 |
| TTC39B | chr9 | 15172968 | 15227442 |
+------------+-------+----------------+--------------+
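The query returns one row per transcript, so genes such as ATF3 and TTC39B appear several times. For a candidate-gene window you may want a single region per gene, i.e. the union span of all its transcripts. Below is a minimal Python sketch of that post-processing step, assuming the rows have already been fetched as (geneSymbol, chrom, start, end) tuples; `collapse_transcripts` is a hypothetical helper, not part of any UCSC tool.

```python
from typing import Dict, Iterable, Tuple

# One row of the query output: (geneSymbol, chrom, txStart-5000, txEnd+5000)
Row = Tuple[str, str, int, int]


def collapse_transcripts(rows: Iterable[Row]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Merge all transcript windows of a gene into one span (min start, max end).

    Rows are grouped by (gene, chrom) so that a symbol mapping to several
    chromosomes is never merged into one meaningless interval.
    """
    regions: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for gene, chrom, start, end in rows:
        start = max(0, start)  # txStart-5000 can go negative near a chromosome start
        # Note: end is NOT clamped to the chromosome length (that needs chromInfo).
        key = (gene, chrom)
        if key in regions:
            old_start, old_end = regions[key]
            regions[key] = (min(old_start, start), max(old_end, end))
        else:
            regions[key] = (start, end)
    return regions


# Rows copied from the query output above (hg18).
rows = [
    ("APOB", "chr2", 21072805, 21125450),
    ("ATF3", "chr1", 210843616, 210865739),
    ("ATF3", "chr1", 210800319, 210865739),
    ("ATF3", "chr1", 210843616, 210865704),
    ("ATF3", "chr1", 210849982, 210864212),
    ("LIPG", "chr18", 45337424, 45378276),
    ("LIPG", "chr18", 45337424, 45367217),
    ("RGS1", "chr1", 190806479, 190820782),
    ("TTC39B", "chr9", 15156560, 15302244),
    ("TTC39B", "chr9", 15156560, 15227442),
    ("TTC39B", "chr9", 15171584, 15302244),
    ("TTC39B", "chr9", 15172968, 15268702),
    ("TTC39B", "chr9", 15172968, 15227442),
]

if __name__ == "__main__":
    for (gene, chrom), (start, end) in sorted(collapse_transcripts(rows).items()):
        print(gene, chrom, start, end, sep="\t")
```

The same collapsing could also be done on the server by selecting MIN(G.txStart)-5000 and MAX(G.txEnd)+5000 and adding GROUP BY X.geneSymbol, G.chrom to the query.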
I agree that the information is hard to find: I only knew where to find it because I used to play regularly with those tables.
The UCSC mailing list is a good place to find this kind of information.
I also did a lot of reverse engineering by just grepping the flat files available from UCSC.
UCSC does not make calls on whether there is or is not a UTR on either side. It is based on whatever the source record provides. (You may know this, but it's something people ask us in trainings all the time too, so I thought I'd mention it...)
I don't think any gene table (refGene, knownGene, ensGene, ...) has a chromStart that differs from txStart, or adds 5'/3' regulatory regions to the transcript coordinates. You can check this in the Genome Browser.
Thanks a lot, Pierre. I need the genomic coordinates (i.e. chromStart and chromEnd rather than txStart/txEnd), but I can't find those fields in the knownGene / kgXref tables.
Why do you need chromStart instead of txStart? It is just a column name, after all.
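Both txStart/txEnd (knownGene) and chromStart/chromEnd (e.g. snp130) are genomic positions in the same UCSC convention: a 0-based start and a half-open end. Most genotype formats (e.g. VCF, PLINK .bim) use 1-based positions, so a conversion step is usually needed before matching. Below is a minimal Python sketch; `ucsc_to_one_based` and its `flank` parameter are hypothetical names chosen for illustration.

```python
from typing import Tuple


def ucsc_to_one_based(start: int, end: int, flank: int = 0) -> Tuple[int, int]:
    """Convert a UCSC interval (0-based start, half-open end), such as
    txStart/txEnd or chromStart/chromEnd, into 1-based inclusive coordinates,
    optionally widened by `flank` bp on each side."""
    if start > end:
        raise ValueError("start must not exceed end")
    one_based_start = max(1, start + 1 - flank)  # never go below position 1
    one_based_end = end + flank  # a half-open end equals the 1-based inclusive end
    return one_based_start, one_based_end
```

For the +/- 5 kb window discussed here, you would call it as `ucsc_to_one_based(txStart, txEnd, flank=5000)` on the raw (unflanked) knownGene values.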
Please correct me if I am wrong. In this URL, the description for txStart is given as "Transcription start position". My assumption is that a gene could have regions that are not transcribed, and that I will miss those regions if I use txStart/txEnd. I am looking at genomic coordinates for a candidate gene analysis using genotype data, and for this analysis I need the genomic coordinates of a given gene +/- 5 kb (in bp). In one of your earlier solutions using the UCSC MySQL server, you used the chromStart field of table snp130; I am looking for that particular field here.

OK, got it. Thanks a lot for clarifying this.
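Because snp130 chromStart/chromEnd and the flanked knownGene txStart/txEnd share the same coordinate system, SNPs can be assigned to a gene window by a plain interval-overlap test, with no conversion needed. Below is a minimal Python sketch, assuming the SNP records have already been fetched as (name, chrom, chromStart, chromEnd) tuples; the function names are hypothetical, and the zero-length case assumes insertions are stored with chromStart == chromEnd.

```python
from typing import Iterable, List, Tuple

# (name, chrom, chromStart, chromEnd), UCSC 0-based half-open coordinates
Snp = Tuple[str, str, int, int]


def overlaps(s: int, e: int, start: int, end: int) -> bool:
    """True if feature [s, e) overlaps window [start, end)."""
    if s == e:  # zero-length feature (assumed: an insertion point between bases)
        return start <= s <= end
    return s < end and e > start


def snps_in_region(snps: Iterable[Snp], chrom: str, start: int, end: int) -> List[str]:
    """Names of SNPs on `chrom` overlapping the window [start, end)."""
    return [name for name, c, s, e in snps if c == chrom and overlaps(s, e, start, end)]
```

Here `start` and `end` would be the flanked values returned by the query (txStart-5000, txEnd+5000).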
Mary, thanks for adding your thoughts. I was a bit confused by the field names txStart vs. chromStart.