Get A File To Cross Reference Ensembl Gene Ids And Alias
14.7 years ago
D W ▴ 150

I have a list of Ensembl gene IDs. I would like a file to cross reference them to aliases from kgAlias.

Preferably it would be a MySQL statement with some kind of join, starting from something like:

mysql -N -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select name,name2,exonStarts,exonEnds from ensGene'

UPDATE: Pierre answered this question:

mysql -N -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e '
 select QUERY.name, QUERY.name2, QUERY.geneSymbol from
 (select X.*, G.*
  from ensGene as G, knownToEnsembl as KE, kgXref as X
  where G.name=KE.value and KE.name=X.kgID)
 as QUERY'
conversion mysql ensembl • 4.4k views
14.7 years ago

From the Table Browser ("describe table schema"), you can see how the tables link to each other.

ensGene links to:

hg19.knownToEnsembl.value (via ensGene.name)

knownToEnsembl links to:

hg19.kgXref.kgID (via knownToEnsembl.name)

Putting it all together, your query would be:

mysql -N -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e 'select X.*,G.* from ensGene as G,knownToEnsembl as KE, kgXref as X where G.name=KE.value and KE.name=X.kgID limit 10\G'
*************************** 1. row ***************************
kgID: uc004fqd.1
mRNA: Z48511
spID:
spDisplayID:
geneSymbol: AYP1p1
refseq:
protAcc:
description: Homo sapiens AYP1 pseudogene 1 (AYP1p1), non-coding RNA.
bin: 605
name: ENST00000402072
chrom: chrY
strand: +
txStart: 2717867
txEnd: 2718369
cdsStart: 2718369
cdsEnd: 2718369
exonCount: 1
exonStarts: 2717867,
exonEnds: 2718369,
score: 0
name2: ENSG00000220324
cdsStartStat: none
cdsEndStat: none
exonFrames: -1,
*************************** 2. row ***************************
kgID: uc004fqe.1
mRNA: Z48515
spID:
spDisplayID:
geneSymbol: RPS4Y1
refseq:
protAcc:
description: Homo sapiens PRO2646 mRNA, complete cds.
bin: 606
name: ENST00000250784
chrom: chrY
strand: +
txStart: 2769622
txEnd: 2794995
cdsStart: 2769665
cdsEnd: 2794935
exonCount: 7
exonStarts: 2769622,2770205,2772117,2773686,2782640,2793128,2794833,
exonEnds: 2769668,2770283,2772298,2773784,2782812,2793286,2794995,
score: 0
name2: ENSG00000129824
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: 0,0,0,1,0,1,0,
*************************** 3. row ***************************
kgID: uc004fqf.1
mRNA: Z48516
spID:
spDisplayID:
geneSymbol: RPS4Y1
refseq:
protAcc:
description: Homo sapiens PRO2646 mRNA, complete cds.
bin: 606
name: ENST00000250784
chrom: chrY
strand: +
txStart: 2769622
txEnd: 2794995
cdsStart: 2769665
cdsEnd: 2794935
exonCount: 7
exonStarts: 2769622,2770205,2772117,2773686,2782640,2793128,2794833,
exonEnds: 2769668,2770283,2772298,2773784,2782812,2793286,2794995,
score: 0
name2: ENSG00000129824
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: 0,0,0,1,0,1,0,
*************************** 4. row ***************************
kgID: uc004fqh.1
mRNA: NR_002820
spID:
spDisplayID:
geneSymbol: AYP1p1
refseq: NR_002820
protAcc:
description: Homo sapiens AYP1 pseudogene 1 (AYP1p1), non-coding RNA.
bin: 605
name: ENST00000402072
chrom: chrY
strand: +
txStart: 2717867
txEnd: 2718369
cdsStart: 2718369
cdsEnd: 2718369
exonCount: 1
exonStarts: 2717867,
exonEnds: 2718369,
score: 0
name2: ENSG00000220324
cdsStartStat: none
cdsEndStat: none
exonFrames: -1,
*************************** 5. row ***************************
kgID: uc004fqi.1
mRNA: NM_001008
spID: P22090
spDisplayID: RS4Y1_HUMAN
geneSymbol: RPS4Y1
refseq: NM_001008
protAcc: NP_000999
description: ribosomal protein S4, Y-linked 1 Y isoform
bin: 606
name: ENST00000250784
chrom: chrY
strand: +
txStart: 2769622
txEnd: 2794995
cdsStart: 2769665
cdsEnd: 2794935
exonCount: 7
exonStarts: 2769622,2770205,2772117,2773686,2782640,2793128,2794833,
exonEnds: 2769668,2770283,2772298,2773784,2782812,2793286,2794995,
score: 0
name2: ENSG00000129824
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: 0,0,0,1,0,1,0,
*************************** 6. row ***************************
kgID: uc010nwd.1
mRNA: AK310553
spID:
spDisplayID:
geneSymbol: ZFY
refseq: NM_003411
protAcc: NP_003402
description: zinc finger protein, Y-linked isoform 1
bin: 75
name: ENST00000383052
chrom: chrY
strand: +
txStart: 2863321
txEnd: 2910547
cdsStart: 2881977
cdsEnd: 2908034
exonCount: 8
exonStarts: 2863321,2881949,2889114,2903135,2903551,2904710,2905980,2906850,
exonEnds: 2863487,2882038,2889687,2903285,2903695,2904863,2906121,2910547,
score: 0
name2: ENSG00000067646
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: -1,0,1,1,1,1,1,1,
*************************** 7. row ***************************
kgID: uc004fqj.1
mRNA: NM_003411
spID: P08048
spDisplayID: ZFY_HUMAN
geneSymbol: ZFY
refseq: NM_003411
protAcc: NP_003402
description: zinc finger protein, Y-linked isoform 1
bin: 75
name: ENST00000155093
chrom: chrY
strand: +
txStart: 2863545
txEnd: 2909891
cdsStart: 2881977
cdsEnd: 2908034
exonCount: 8
exonStarts: 2863545,2881949,2889114,2903135,2903551,2904710,2905980,2906850,
exonEnds: 2863810,2882038,2889687,2903285,2903695,2904863,2906121,2909891,
score: 0
name2: ENSG00000067646
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: -1,0,1,1,1,1,1,1,
*************************** 8. row ***************************
kgID: uc010nwe.1
mRNA: BC114526
spID: Q24JR0
spDisplayID: Q24JR0_HUMAN
geneSymbol: ZFY
refseq: NM_003411
protAcc: NP_003402
description: zinc finger protein, Y-linked isoform 1
bin: 75
name: ENST00000155093
chrom: chrY
strand: +
txStart: 2863545
txEnd: 2909891
cdsStart: 2881977
cdsEnd: 2908034
exonCount: 8
exonStarts: 2863545,2881949,2889114,2903135,2903551,2904710,2905980,2906850,
exonEnds: 2863810,2882038,2889687,2903285,2903695,2904863,2906121,2909891,
score: 0
name2: ENSG00000067646
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: -1,0,1,1,1,1,1,1,
*************************** 9. row ***************************
kgID: uc004fqk.1
mRNA: NM_139214
spID: Q8IUE0
spDisplayID: TF2LY_HUMAN
geneSymbol: TGIF2LY
refseq: NM_139214
protAcc: NP_631960
description: TGFB-induced factor homeobox 2-like, Y-linked
bin: 611
name: ENST00000321217
chrom: chrY
strand: +
txStart: 3507125
txEnd: 3508080
cdsStart: 3507285
cdsEnd: 3507843
exonCount: 2
exonStarts: 3507125,3507264,
exonEnds: 3507168,3508080,
score: 0
name2: ENSG00000176679
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: -1,0,
*************************** 10. row ***************************
kgID: uc010nwf.1
mRNA: CCDS14775
spID: Q8IUE0
spDisplayID: TF2LY_HUMAN
geneSymbol: TGIF2LY
refseq: NM_139214
protAcc: NP_631960
description: TGFB-induced factor homeobox 2-like, Y-linked
bin: 611
name: ENST00000321217
chrom: chrY
strand: +
txStart: 3507125
txEnd: 3508080
cdsStart: 3507285
cdsEnd: 3507843
exonCount: 2
exonStarts: 3507125,3507264,
exonEnds: 3507168,3508080,
score: 0
name2: ENSG00000176679
cdsStartStat: cmpl
cdsEndStat: cmpl
exonFrames: -1,0,
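
Since the question asks for a file, here is a minimal sketch of the same three-table join written with explicit JOIN ... ON clauses and redirected to a file. It assumes the public UCSC server and the hg18 tables are still reachable as in the command above. The output file name `ens2symbol.tsv` and the `RUN` switch are hypothetical and only there for illustration. DISTINCT is used because, as the rows above show, several kgIDs can map to the same Ensembl transcript.

```shell
#!/bin/sh
# Same join as above (ensGene -> knownToEnsembl -> kgXref), written with
# explicit JOIN ... ON clauses. DISTINCT collapses rows that differ only
# in kgXref columns we do not select (e.g. kgID, mRNA).
QUERY='
select distinct G.name, G.name2, X.geneSymbol
from ensGene as G
join knownToEnsembl as KE on G.name = KE.value
join kgXref as X on KE.name = X.kgID
'

# Hypothetical output file name.
OUT=ens2symbol.tsv

# Hypothetical switch: set RUN=1 to actually contact the UCSC server
# (needs network access); otherwise the query is only printed.
if [ "${RUN:-0}" = "1" ]; then
    # When stdout is redirected, mysql writes tab-separated rows.
    mysql -N -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e "$QUERY" > "$OUT"
else
    printf '%s\n' "$QUERY"
fi
```

Run with `RUN=1 sh script.sh` to write the transcript ID, gene ID and gene symbol to the file, one mapping per line.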


Thanks very much. May I ask how I can restrict the results to output only the name (e.g. ENST00000250784), name2 (e.g. ENSG00000129824), and geneSymbol (e.g. RPS4Y1)?


Add ... ' and G.name="ENST00000250784" and X.geneSymbol="RPS4Y1" ' to the end of the where clause (before the limit). See http://dev.mysql.com/doc/refman/5.0/en/select.html
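
A condition like the one above restricts which rows come back. To restrict which columns are printed, list them in the select clause instead of `X.*,G.*`. The following is a minimal sketch that does both, keeping only rows whose Ensembl gene ID (`name2`) is in a given list. The helper `sql_in_list` and the `RUN` switch are hypothetical names, and the IDs are assumed to be plain Ensembl identifiers with no quotes or other special characters.

```shell
#!/bin/sh
# Hypothetical helper: turn one ID per line on stdin into a quoted,
# comma-separated list suitable for an SQL "in (...)" clause.
sql_in_list() {
    sed -e "s/.*/'&'/" | paste -sd, -
}

# Two gene IDs taken from the output above; in practice this could be
# something like: IN_LIST=$(sql_in_list < my_ids.txt)
IN_LIST=$(printf '%s\n' ENSG00000129824 ENSG00000067646 | sql_in_list)

# Column restriction: only name, name2 and geneSymbol are selected.
# Row restriction: only genes whose name2 is in the list.
QUERY="select G.name, G.name2, X.geneSymbol
from ensGene as G, knownToEnsembl as KE, kgXref as X
where G.name = KE.value and KE.name = X.kgID
and G.name2 in ($IN_LIST)"

# Hypothetical switch: set RUN=1 to query the UCSC server (needs network).
if [ "${RUN:-0}" = "1" ]; then
    mysql -N -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 -e "$QUERY"
else
    printf '%s\n' "$QUERY"
fi
```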


thanks for the link

14.7 years ago
D W ▴ 150

Pierre's solution was what I was looking for. But just for completeness you can get this from the web as well.

  • Go to: http://www.biomart.org/biomart/martview/
  • Select database: Ensembl Genes
  • Select dataset: Homo Sapiens
  • Click: Attributes
  • Expand: External References
  • Check: HGNC automatic gene name
  • Click top button: Results
  • Select the desired output

http://www.biomart.org/biomart/martview/6a219783b4cc3e15820e9ea9f654e555
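
The same selection can probably be scripted rather than clicked through. This is a rough sketch under several assumptions that are not part of the steps above: it uses the BioMart `martservice` endpoint hosted by Ensembl, the dataset name `hsapiens_gene_ensembl`, and the attribute names `ensembl_gene_id` and `hgnc_symbol`. The output file name and the `RUN` switch are hypothetical. Note that the current Ensembl release is not based on hg18, so the IDs returned may differ from those in the UCSC hg18 tables.

```shell
#!/bin/sh
# Assumption: Ensembl's BioMart web service endpoint (not from the post above).
MART_URL='https://www.ensembl.org/biomart/martservice'

# BioMart XML query: Ensembl gene ID plus HGNC symbol for the human gene
# dataset, as tab-separated text without a header, unique rows only.
# Dataset and attribute names are assumptions; check them in the web interface.
XML='<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1" datasetConfigVersion="0.6">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="hgnc_symbol"/>
  </Dataset>
</Query>'

# Hypothetical switch: set RUN=1 to send the query (needs network access);
# otherwise the XML is only printed for inspection.
if [ "${RUN:-0}" = "1" ]; then
    # -G sends the URL-encoded XML as a "query" parameter in a GET request.
    curl -s -G --data-urlencode "query=$XML" "$MART_URL" > ens2hgnc.tsv
else
    printf '%s\n' "$XML"
fi
```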
