Where Can I Download A File That Has All Ensembl Gene Ids, Transcript Ids, And Most Importantly Gene Symbols
10.8 years ago

So I'm kind of tired of always using online conversion tools that limit how long the input list can be.

Is there anywhere I can download a file (through the UCSC Table Browser or something similar) listing every single transcript, gene, and gene symbol in mm10?

In this format:

ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Upf1
ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Upf2
ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Upf3a
ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Upf3b
ENSMUSTxxxxx    [tab]    ENSMUSGxxxxx    [tab]    Smg1
ensembl gene id conversion transcript database • 16k views
10.8 years ago
Neilfws 49k

Yes, this is quite easy using UCSC Table Browser or the UCSC public MySQL server.

Using Table Browser, fill in the fields so that they look like this (you may want to enter an output file name):

[Screenshot not preserved: Table Browser form for the mm10 assembly with the ensGene table selected.]

Then click "get output" and link to the ensemblToGeneName table, so that the fields look like this:

[Screenshot not preserved: field selection with name and name2 from ensGene, and value from ensemblToGeneName.]

Click "get output" again; here are the first few lines of output:

#mm10.ensGene.name    mm10.ensGene.name2    mm10.ensemblToGeneName.value
ENSMUST00000086465    ENSMUSG00000042429    Adora1
ENSMUST00000038191    ENSMUSG00000042429    Adora1
ENSMUST00000169927    ENSMUSG00000042429    Adora1
ENSMUST00000132064    ENSMUSG00000025909    Sntg1
ENSMUST00000140295    ENSMUSG00000025909    Sntg1
ENSMUST00000140302    ENSMUSG00000025909    Sntg1
ENSMUST00000115484    ENSMUSG00000025909    Sntg1
ENSMUST00000135046    ENSMUSG00000025909    Sntg1
ENSMUST00000115488    ENSMUSG00000025909    Sntg1
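
Once a table like this is saved locally, converting an arbitrary list of IDs no longer depends on an online tool. The following is a minimal sketch, assuming a tab-separated mapping file in the three-column layout above; the file names, the `lookup_symbols` function, and the second query ID are hypothetical and only there for illustration.

```shell
# Toy mapping file in the transcript / gene / symbol layout shown above.
demo_dir=$(mktemp -d)
printf 'ENSMUST00000086465\tENSMUSG00000042429\tAdora1\n' >  "$demo_dir/mm10_map.tsv"
printf 'ENSMUST00000132064\tENSMUSG00000025909\tSntg1\n'  >> "$demo_dir/mm10_map.tsv"

# Hypothetical query list; the second ID is a placeholder that is not in the mapping.
printf 'ENSMUST00000132064\nENSMUST99999999999\n' > "$demo_dir/query_ids.txt"

# lookup_symbols MAPPING QUERY_LIST
# First file: load transcript -> gene and transcript -> symbol into arrays.
# Second file: print the annotation for each query ID, or NA when it is unknown.
lookup_symbols() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    NR == FNR { gene[$1] = $2; symbol[$1] = $3; next }
    { if ($1 in gene) print $1, gene[$1], symbol[$1]; else print $1, "NA", "NA" }
  ' "$1" "$2"
}

lookup_symbols "$demo_dir/mm10_map.tsv" "$demo_dir/query_ids.txt"
```

Printing NA for unmatched IDs, rather than dropping them, keeps the output the same length as the query list, which makes missing annotations easy to spot.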

you're a BOSS! thanks this is exactly what i was looking for

10.8 years ago

If you are comfortable with the command line, you can run Neilfws's solution against the UCSC public MySQL server directly.

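# Export both tables sorted by transcript ID, then join on that ID (pasting side by side assumes identical row order)
mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D mm10 -e "select name,name2 from ensGene" | LC_ALL=C sort -k1,1 > Gene1_table
mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D mm10 -e "select name,value from ensemblToGeneName" | LC_ALL=C sort -k1,1 > Gene2_table
LC_ALL=C join -t "$(printf '\t')" Gene1_table Gene2_table > mm10_ensembl.txt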

Can also do a single SQL query on the 2 tables, e.g.

select ensGene.name, name2, value from ensGene, ensemblToGeneName where ensGene.name = ensemblToGeneName.name
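
For anyone who wants to run that single query from the shell, here is a rough sketch that wraps it in a function. The host, user, database, and table names are copied from the posts above; the `UCSC_HOST` and `QUERY` variables and the `fetch_mm10_mapping` function are hypothetical names. The host is an assumption worth checking, since UCSC's current documentation lists genome-mysql.soe.ucsc.edu.

```shell
# Host as given in the thread; may need updating (assumption, see note above).
UCSC_HOST="genome-mysql.cse.ucsc.edu"

# Same join as the query above, written with an explicit JOIN and table aliases.
QUERY="SELECT e.name, e.name2, g.value
FROM ensGene AS e
JOIN ensemblToGeneName AS g ON e.name = g.name"

# Runs the query and writes tab-separated rows (no header, because of -N) to stdout.
fetch_mm10_mapping() {
  mysql --user=genome --host="$UCSC_HOST" -N -A -D mm10 -e "$QUERY"
}

# Usage (needs network access and the mysql client):
#   fetch_mm10_mapping > mm10_ensembl.txt
```

Doing the join on the server avoids any dependence on the row order of the two tables.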

I tried that but somehow couldn't get it to work. Thanks a lot.

10.8 years ago

It's probably easiest to just use BioMart. I set up an example query here. Just click on "results" in the upper left to see the first 10 rows (there's an option to export everything to a text file).

There's also an R interface to BioMart, which can be handy.

9.4 years ago
phil.chapman ▴ 100

Check out the AnnotationHub package in R/Bioconductor. It lets you download and access all sorts of annotation from within R in just a few lines of code. See the presentation below, from the recent CSAMA 15 workshop, for more detail:

http://bioconductor.org/help/course-materials/2015/CSAMA2015/lect/L15-annotation-rsrcs-morgan-demo.html

These two short YouTube clips are also a good place to start:

Cheers,
Phil

4.8 years ago
ATpoint 85k

You can do that directly from the Ensembl FASTA files, e.g. from here. After downloading, run:

# Keep header lines only; print transcript ID ($1), gene ID ($4) and gene symbol ($7) with their prefixes stripped
gzip -dc Homo_sapiens.GRCh38.cdna.all.fa.gz \
| awk 'BEGIN {OFS="\t"} /^>/ {sub(/^>/, "", $1); sub(/^gene:/, "", $4); sub(/^gene_symbol:/, "", $7); print $1, $4, $7}' > output.tsv
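
The command above picks the gene ID and symbol by column position, which assumes every header carries the same fields in the same order. A slightly more defensive sketch, assuming the `gene:` and `gene_symbol:` prefixes used in the command above, looks fields up by prefix and prints NA when one is missing. The demo records, their IDs, and the `headers_to_table` function are made up for illustration.

```shell
# Two made-up cDNA records; the second one has no gene_symbol field.
demo_fa=$(mktemp)
cat > "$demo_fa" <<'EOF'
>ENSMUST00000000001.1 cdna chromosome:GRCm38:1:100:200:1 gene:ENSMUSG00000000001.1 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:Upf1 description:demo record
ACGTACGT
>ENSMUST00000000002.1 cdna chromosome:GRCm38:1:300:400:1 gene:ENSMUSG00000000002.1 gene_biotype:lncRNA transcript_biotype:lncRNA
ACGT
EOF

# headers_to_table FASTA
# Scans each header for fields starting with "gene:" or "gene_symbol:" and
# stops at "description:", since free text after it could contain anything.
headers_to_table() {
  awk 'BEGIN { OFS = "\t" }
    /^>/ {
      tx = substr($1, 2); gene = "NA"; sym = "NA"
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^description:/) break
        else if ($i ~ /^gene:/) gene = substr($i, 6)
        else if ($i ~ /^gene_symbol:/) sym = substr($i, 13)
      }
      print tx, gene, sym
    }' "$1"
}

headers_to_table "$demo_fa"
```

For a real download, the same function can read from a decompressed stream, for example `gzip -dc file.fa.gz > headers.fa` followed by `headers_to_table headers.fa`.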