Question

Where Can I Download A File That Has All Ensembl Gene Ids, Transcript Ids, And Most Importantly Gene Symbols

10.8 years ago

hicsuntdrac0nis ▴ 250

I'm kind of tired of always using online conversion tools that limit how long the input list can be.

Is there anywhere I can download a file (e.g. through the UCSC Table Browser) that lists every transcript ID, gene ID, and gene symbol in mm10?

In this format:

ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Upf1
ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Upf2
ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Upf3a
ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Upf3b
ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Smg1

ensembl gene id conversion transcript database • 16k views

score 8 · Answer 1 · 2014-02-11

Yes, this is quite easy using UCSC Table Browser or the UCSC public MySQL server.

Using the Table Browser, fill in the fields so that they look like this (you may want to enter a file name):

[screenshot: Table Browser form]

Then click "get output" and link to the ensemblToGeneName table, so that the fields look like this:

[screenshot: field selection with ensemblToGeneName linked]

Click "get output" again; here are the first few lines of output:

#mm10.ensGene.name    mm10.ensGene.name2    mm10.ensemblToGeneName.value
ENSMUST00000086465    ENSMUSG00000042429    Adora1
ENSMUST00000038191    ENSMUSG00000042429    Adora1
ENSMUST00000169927    ENSMUSG00000042429    Adora1
ENSMUST00000132064    ENSMUSG00000025909    Sntg1
ENSMUST00000140295    ENSMUSG00000025909    Sntg1
ENSMUST00000140302    ENSMUSG00000025909    Sntg1
ENSMUST00000115484    ENSMUSG00000025909    Sntg1
ENSMUST00000135046    ENSMUSG00000025909    Sntg1
ENSMUST00000115488    ENSMUSG00000025909    Sntg1
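
Once a table like this is saved locally, ID lists of any length can be converted without a web tool. A minimal sketch, assuming the output was saved as a tab-separated file and the IDs to convert sit in a second file, one per line (the function name and both file names are made up for illustration):

```shell
# lookup_symbols TABLE IDS
# TABLE: tab-separated transcript ID, gene ID, gene symbol (header lines start with '#').
# IDS:   one Ensembl transcript ID or gene ID per line.
# Prints every table row whose transcript ID or gene ID appears in IDS.
lookup_symbols() {
    awk -F '\t' '
        NR == FNR { wanted[$1] = 1; next }   # first file (IDS): remember requested IDs
        /^#/      { next }                   # skip the Table Browser header line
        ($1 in wanted) || ($2 in wanted)     # default action: print the matching row
    ' "$2" "$1"
}

# Usage (placeholder file names):
# lookup_symbols mm10_ensembl.txt ids.txt > converted.tsv
```

Asking for a gene ID returns one row per transcript of that gene, which matches the layout of the table itself.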

Neilfws · Answer 2 · 2014-02-11

10.8 years ago

Ashutosh Pandey 12k

If you are comfortable with the command line, you can run Neilfws's solution there.

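# Transcript ID -> gene ID
mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D mm10  -e "select name,name2 from ensGene" > Gene1_table
# Transcript ID -> gene symbol
mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D mm10  -e "select name,value from mm10.ensemblToGeneName" > Gene2_table
# Join on transcript ID (paste would only line up if both tables came back in the same row order)
LC_ALL=C join -t $'\t' <(LC_ALL=C sort Gene1_table) <(LC_ALL=C sort Gene2_table) > mm10_ensembl.txt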

Can also do a single SQL query on the 2 tables, e.g.

select ensGene.name, name2, value from ensGene, ensemblToGeneName where ensGene.name = ensemblToGeneName.name

Reply by Neilfws 49k, 10.8 years ago
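
For anyone who wants to try the single-query route, here is one way it might be wrapped into a complete command. This is a sketch, not a tested recipe against the live server: the function name and output file name are made up, the connection flags are copied from the answer above, and the host is the one used there (UCSC's current documentation lists genome-mysql.soe.ucsc.edu, so swap it in if the older name does not resolve).

```shell
# One query joining the two UCSC tables on transcript ID.
QUERY='SELECT e.name, e.name2, g.value
FROM ensGene AS e
JOIN ensemblToGeneName AS g ON e.name = g.name'

# Wrapped in a function so nothing is sent to the server until it is called.
# -N drops the column-name row; output is tab-separated when redirected to a file.
fetch_mm10_ids() {
    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D mm10 -e "$QUERY"
}

# Usage (placeholder file name):
# fetch_mm10_ids > mm10_ensembl.txt
```

Table aliases keep the column references unambiguous, since both tables have a column called name.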

I tried that but somehow couldn't get it to work. Thanks a lot.

Reply by Ashutosh Pandey 12k, 10.8 years ago

score 2 · Answer 3 · 2014-02-11

10.8 years ago

Devon Ryan 104k

It's probably easiest to just use BioMart. I set up an example query here. Just click on "Results" in the upper left to see the first 10 rows (there's an option to export everything to a text file).

There's also an R interface to BioMart (the biomaRt package), which can be handy.
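
BioMart can also be queried from the shell instead of the web form. A rough sketch, assuming the public martservice endpoint, the mmusculus_gene_ensembl dataset, and the attribute names ensembl_transcript_id, ensembl_gene_id and external_gene_name. BIOMART_HOST, the function name and the output file name are placeholders. Note that mm10 corresponds to GRCm38, while the main Ensembl site now serves a newer mouse assembly, so an Ensembl archive host that still carries GRCm38 would be needed to match mm10.

```shell
# Placeholder host: the default serves the current mouse assembly, not mm10/GRCm38.
# Point BIOMART_HOST at an Ensembl archive site to get GRCm38 annotation.
BIOMART_HOST="${BIOMART_HOST:-https://www.ensembl.org}"

# BioMart XML query: three attributes from the mouse gene dataset, TSV output, no header row.
BIOMART_QUERY='<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1" datasetConfigVersion="0.6">
  <Dataset name="mmusculus_gene_ensembl" interface="default">
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="external_gene_name"/>
  </Dataset>
</Query>'

# Wrapped in a function so no request is made until it is called.
# --get with --data-urlencode sends the XML as a URL-encoded "query" parameter.
fetch_biomart_ids() {
    curl -sS --get --data-urlencode "query=${BIOMART_QUERY}" "${BIOMART_HOST}/biomart/martservice"
}

# Usage (placeholder file name):
# fetch_biomart_ids > mouse_biomart.tsv
```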

Ram · Answer 4 · 2015-07-02

Check out the AnnotationHub package in R/Bioconductor. It lets you download and access all sorts of annotation from within R in just a few lines of code. See the presentation below from the recent CSAMA 15 workshop for some more detail:

http://bioconductor.org/help/course-materials/2015/CSAMA2015/lect/L15-annotation-rsrcs-morgan-demo.html

These two short YouTube clips are also a good place to start:

Cheers,
Phil

score 1 · Answer 5 · 2020-02-01

You can do that directly from the Ensembl FASTA files, e.g. from here. After downloading, run:

# Keep only the FASTA header lines and pull out transcript ID, gene ID and gene symbol (fields 1, 4 and 7)
gzip -dc Homo_sapiens.GRCh38.cdna.all.fa.gz \
| awk 'BEGIN {OFS="\t"} /^>/ {print $1, $4, $7}' \
| awk '{gsub(">",""); gsub("gene:",""); gsub("gene_symbol:",""); print}' > output.tsv
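
The commands above pick the gene ID and symbol by position, so a header that happens to lack a gene_symbol field would put something else in the third column. A slightly more defensive sketch looks the fields up by their prefix instead. The function name is made up, and printing "NA" for a missing value is an arbitrary choice:

```shell
# parse_headers: read FASTA on stdin and print transcript ID, gene ID and
# gene symbol (tab-separated), one row per header line.
# Fields are found by their "gene:" / "gene_symbol:" prefix, not by position;
# a missing field is printed as "NA".
parse_headers() {
    awk 'BEGIN { OFS = "\t" }
    /^>/ {
        tx = substr($1, 2); gene = "NA"; symbol = "NA"
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^gene:/)        gene   = substr($i, 6)    # strip "gene:"
            if ($i ~ /^gene_symbol:/) symbol = substr($i, 13)   # strip "gene_symbol:"
            if ($i ~ /^description:/) break                     # free text follows; stop scanning
        }
        print tx, gene, symbol
    }'
}

# Usage (file name from the commands above):
# gzip -dc Homo_sapiens.GRCh38.cdna.all.fa.gz | parse_headers > output.tsv
```

Stopping at the description field avoids matching words inside the free-text description by accident.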