I am trying to get the chromosomal position of a query gene in the hg19 genome using UCSC MySQL commands.
I cannot find any documentation or tutorial on the syntax to use.
Any ideas?
Example:
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -N -e "SELECT k.chrom, kg.txStart, kg.txEnd, x.geneSymbol FROM knownCanonical k, knownGene kg, kgXref x WHERE k.transcript = x.kgID AND k.transcript = kg.name AND x.geneSymbol LIKE 'CTCF';" > CTCF.bed
The chromosome and positions of CTCF in hg19 are in the first three columns of the resulting unsorted BED file.
A gene can have multiple transcripts, so you can get more than one record for a given HGNC gene name.
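If more than one record comes back, the unsorted output can be ordered by chromosome and start position before further use. The following is only a sketch: the helper name sort_bed and the output file name are made up for illustration, and it assumes the tab-separated columns produced by the command above.

```shell
# Hypothetical helper: sort a BED-like file by chromosome name
# (column 1, byte order) and then by start position (column 2, numeric).
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Usage (assumes CTCF.bed was written by the mysql command above):
#   sort_bed CTCF.bed > CTCF.sorted.bed
```

Note that byte-order sorting places chr10 before chr2, which is the ordering many BED tools expect, but not a "natural" chromosome order.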
This result relies on three tables in the UCSC Genome Browser database hg19: knownCanonical, knownGene and kgXref.
The schema of knownCanonical is located here: http://genome.ucsc.edu/goldenpath/gbdDescriptionsOld.html#KnownCanonical
The schema of knownGene is located here: http://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=genes&hgta_track=knownGene&hgta_table=knownGene&hgta_doSchema=describe+table+schema
Likewise, the schema of kgXref is located here: http://genome.ucsc.edu/goldenpath/gbdDescriptionsOld.html#KgXref
The rest is just a SQL query based on the schemas of the three tables, along with database and host parameters that are specific to UCSC.
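To make the pieces of that command easier to see, here is the same query broken out into a small shell sketch with each mysql client option commented. The function names build_query and run_query are hypothetical, and two things differ from the one-liner above: the tables are joined with explicit JOIN ... ON clauses instead of a comma-separated FROM list, and the gene symbol is matched with = rather than LIKE. Both should give the same rows for an exact symbol, but treat this as an illustration rather than the canonical form.

```shell
# Hypothetical helper: print the SQL statement for one gene symbol.
# The three tables are linked through the canonical transcript ID.
build_query() {
    gene="$1"
    printf "SELECT k.chrom, kg.txStart, kg.txEnd, x.geneSymbol FROM knownCanonical k JOIN knownGene kg ON k.transcript = kg.name JOIN kgXref x ON k.transcript = x.kgID WHERE x.geneSymbol = '%s';" "$gene"
}

# Hypothetical helper: run the statement against the UCSC server.
#   --user=genome   account name used in the command above
#   --host=...      server name used in the command above
#   -A              skip auto-rehash (no table/column name preloading)
#   -D hg19         database to use
#   -N              do not print column names in the output
#   -e "..."        execute the given statement, then exit
run_query() {
    mysql --user=genome --host=genome-mysql.cse.ucsc.edu \
        -A -D hg19 -N -e "$(build_query "$1")"
}

# Usage (needs the mysql client and network access):
#   run_query CTCF > CTCF.bed
```

The options themselves are described in the mysql client's own help (mysql --help); nothing about them is specific to UCSC except the user, host and database values.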
Part of the "magic" is knowing which tables and fields to use. This comes from experience with the Genome Browser and from exploring the links to schemas that are usually available from the table description pages on the Genome Browser site. When that information is difficult to find, or seems to be unavailable, it also helps to scour discussion threads and ask the UCSC mailing lists directly.
Thanks, Alex!
Do you know where I could find any documentation about the syntax of the command you used (especially for the '-e' argument)?
Please see the edit.
Thanks for the links, Alex!
But the command line doesn't work for me (it keeps running indefinitely).
Works for me. Maybe their server is slow at the moment? You might check with the UCSC Genome Browser mailing list.