Getting Ucsc Headers For Tables Through Ftp Or Via Sql?
1
0
Entering edit mode
11.3 years ago
user ▴ 950

How can I fetch the headers of UCSC gene tables programmatically from ftp? different genomes have different headers. Example: kgXref.txt from hg18 (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/kgXref.sql) has a different schema from the goldenPath equivalent to kgXref for hg19. Since http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/kgXref.txt has no header it's difficult to know what the headers are - is there an easy way to get that? does one have to parse the .sql file to get that info? if so, what are some tools to get the schema/headers out of the .sql file, which is otherwise cumbersome to parse? thank you

the solution provided by pierre is the answer.

ucsc genome-browser annotation genes • 2.8k views
ADD COMMENT
1
Entering edit mode
11.3 years ago

you can use the public mysql server of the UCSC and the DESC statement:

 mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'desc knownGene'

to get a diff of the column names:

$ sdiff <(mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg18 -e 'desc kgXref' -N | cut -d '       ' -f 1) <(mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'desc kgXref' -N | cut -d '   ' -f 1)
kgID                                kgID
mRNA                                mRNA
spID                                spID
spDisplayID                            spDisplayID
geneSymbol                            geneSymbol
refseq                                refseq
protAcc                                protAcc
description                            description
                                  >    rfamAcc
                                  >    tRnaName
ADD COMMENT
0
Entering edit mode

great solution! minor question - is there a way to get the desc output in a format more standard, like csv/tsb, rather than this pretty-printed table which is hard to parse? scratch figured it out, the solution is to add -N -B to the query.

ADD REPLY
1
Entering edit mode

try : mysqldump --user=genome --host=genome-mysql.cse.ucsc.edu -X --skip-lock-tables -d hg19 kgXref

ADD REPLY

Login before adding your answer.

Traffic: 1514 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6