Download all hg19 coding sequences from UCSC
How can I download all human coding sequences from the UCSC Table Browser? The format we want to send to Galaxy is "gene ID, CDS in FASTA".
human-genome
galaxy
cds
written 10.6 years ago by Daniel, updated 2.7 years ago by Ram
On the bash command line (assuming you have a MySQL client installed):
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 \
-N -e 'SELECT * FROM knownGeneMrna' | sed -e 's/^/>/' -e 's/\s/\n/' > myFastaFile.fa
This will take a while to run. To check first that the output is what you want, add a LIMIT 10 to the SQL query:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 \
-N -e 'SELECT * FROM knownGeneMrna LIMIT 10' | sed -e 's/^/>/' -e 's/\s/\n/' > myFastaTestFile.fa
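If your sed is not GNU sed, the `\s` and `\n` escapes above may not behave as expected. Here is a minimal sketch of the same tab-to-FASTA conversion using awk instead, assuming each row of the mysql output is "name, tab, sequence". The function name `tsv_to_fasta` and the two sample rows are made up for illustration and are not real UCSC data.

```shell
# Hypothetical helper: turn "name<TAB>sequence" rows into FASTA records.
tsv_to_fasta() {
    awk -F'\t' '{ print ">" $1; print $2 }'
}

# Two invented rows standing in for the mysql output (not real UCSC data).
sample_fasta=$(printf 'id1\tacgt\nid2\tttga\n' | tsv_to_fasta)
printf '%s\n' "$sample_fasta"

# Sanity check: the number of FASTA headers should match the number of input rows.
record_count=$(printf '%s\n' "$sample_fasta" | grep -c '^>')
echo "records: $record_count"
```

To use it on the real query, pipe the mysql command into `tsv_to_fasta` in place of the sed call, and run the same `grep -c '^>'` count on the resulting file.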
written 10.6 years ago by Dan D, updated 4.9 years ago by Ram
Thanks. Any way to do it through the table browser web interface?
If you're willing to use the Ensembl annotation then you can just use Biomart.
Doesn't this just download knownGeneMrna? These aren't just CDSs but include UTRs. Is there a way to get the CDS?
There's a "coding sequence" option.