Ucsc Mysql Query
2
0
Entering edit mode
13.4 years ago

Hi everyone,

I have to deal with task over UCSC genome browser but I want to do it via MySQL public server connection, I am using Ubuntu and I installed MySQL Client and already connected to UCSC, but once I showed the tables description I was a little bit lost, here is what I want to do

1. Click the "Tables" link on the main page genome.ucsc.edu) or in
    the blue bar at the top of the genome browser page.
2. Select the organism (Human/Chimp) and assembly date of interest.
3. Select "Genes and Gene Prediction Tracks" as the group and one
    of the genes tracks (e.g. Known Genes or RefSeq genes).
4. Select genome as the region.
5. Select "sequence" as the output format.
6. If desired -- enter a filename where the data should be saved
    locally and compress with gzip.
7. Click the "Get Output" button.
8. Select genomic and click "Submit".
9. Select only 5' UTR (or 3' UTR) Exons and click "Get Sequence".

How can I convert this small process into an SQL query ?

Rad

ucsc • 8.6k views
ADD COMMENT
1
Entering edit mode

@Rahouane: you've already asked more than 20 questions on this site but you didn't bother to validate most of the people's answers. Please, take a few minutes to validate the answers, thanks.

ADD REPLY
5
Entering edit mode
13.4 years ago

you cannot retrieve a genomic sequence of DNA only with a SQL query. You can retrieve all the the 5' and 3' UTR using the following command:

mysql -h genome-mysql.cse.ucsc.edu -u genome -D hg19 -N -A -e 'select chrom,strand, txStart,cdsStart from knownGene where txStart< cdsStart union select chrom,strand,cdsEnd,txEnd from knownGene where cdsEnd&lt; txEnd ' > utrs.txt

and then loop over those regions to get the genomic sequences: see How To Get The Sequence Of A Genomic Region From Ucsc?

ADD COMMENT
0
Entering edit mode

Pierre thank you for your answer : I want only the 3' UTR how to modify the query, can u explain a little bit by splitting it and commenting it ? I appreciate Rad

ADD REPLY
0
Entering edit mode

Got it pierre I understood the query, thank you

ADD REPLY
0
Entering edit mode

you can now validate this answer.... :-)

ADD REPLY
3
Entering edit mode
12.0 years ago
brentp 24k

You can do this with cruzdb. The following will print out a fasta file with the 3' utr sequences:

from cruzdb import Genome, sequence
database = 'hg19'

g = Genome(database)
refGene = g.refGene

tbl = refGene.table()
query = refGene.filter(tbl.c.cdsEnd < tbl.c.txEnd, tbl.c.cdsStart != tbl.c.cdsEnd)

for transcript in query:
    start, end = transcript.utr3
    if None in (start, end): continue
    print ">%s\n%s" % (transcript.name2, sequence.sequence(database,
        transcript.chrom, start, end))
ADD COMMENT
0
Entering edit mode

Now it gave me error:

In[8]: tbl = refGene.table()
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2820, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
  File "<ipython-input-8-17e93216ae17>", line 1, in <module>
    table = refGene.table()
  File "/usr/local/lib/python2.7/dist-packages/cruzdb/sqlsoup.py", line 78, in __getattr__
    return getattr(cls._query, attr)
AttributeError: 'Query' object has no attribute 'table'
ADD REPLY

Login before adding your answer.

Traffic: 2524 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6