How To Get A List Of All The Genes On The Human Chromosome Y
14.0 years ago

A colleague of mine was asking for a list of all the genes of the human chromosome Y.

I did it using our internal gene annotation database, but I was wondering what your favorite or easiest way would be to perform this task with public resources such as UCSC, Ensembl or BioMart (either coding solutions or clicking through web interfaces).

The listing needs to include the gene symbol and the Entrez Gene ID.

The accepted response will be the one with the highest ranking on Friday at noon (French time).

gene sequence retrieval identifiers • 12k views
9
14.0 years ago
Nathan Harmston ★ 1.1k

I'd use BioMart. It's incredibly easy: since you only need a very specific set of data, you can get it with just point and click, no coding required.

Took me less than 2 minutes to get the data.

2

Yes, I was about to post the same thing: use BioMart on the Ensembl web page, choose Ensembl Genes and Homo sapiens, apply a filter for Chromosome Y and click on Results. You're done.

2

The other very cool thing about BioMart is that you can use the URL button to store the query and give it to the requestor. They can then load it and re-run it themselves, look at other choices, try again with other regions, etc. It is somewhat like a "session" at UCSC.

1

From Attributes, remember to check EntrezGene ID and HGNC symbol in the External box. If only all data were this easy to access.
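
The point-and-click steps described in this thread (Ensembl Genes, Homo sapiens, a Chromosome Y filter, HGNC symbol and Entrez Gene ID attributes) can also be written as a BioMart XML query and sent to the martservice endpoint. The following is a minimal Python sketch rather than a verified recipe: the endpoint URL, the `default` virtual schema name and the attribute name `entrezgene_id` are assumptions based on recent Ensembl releases (older releases used `entrezgene`), and the function names are made up for illustration.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumed endpoint of the public Ensembl BioMart service.
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_query_xml(chromosome="Y",
                    attributes=("hgnc_symbol", "entrezgene_id"),
                    dataset="hsapiens_gene_ensembl"):
    """Return a BioMart XML query for all genes on one chromosome."""
    # One <Attribute> tag per requested output column.
    attribute_tags = "".join(
        f"<Attribute name={quoteattr(name)} />" for name in attributes
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="1" '
        'uniqueRows="1" datasetConfigVersion="0.6">'
        f'<Dataset name={quoteattr(dataset)} interface="default">'
        f'<Filter name="chromosome_name" value={quoteattr(chromosome)} />'
        f"{attribute_tags}"
        "</Dataset>"
        "</Query>"
    )


def build_query_url(xml):
    """Return a URL that carries the whole query in its 'query' parameter."""
    return MARTSERVICE_URL + "?" + urlencode({"query": xml})
```

Opening the URL returned by `build_query_url(build_query_xml())` in a browser, or fetching it with `urllib.request.urlopen`, should return tab-separated text if the assumptions above hold for the current Ensembl release.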

1

well in that case ....

BioMart query

0

didn't realise that...........

BioMart query

0

I don't know if you checked the result, but BioMart displays TSPY1 and TSPY2 with the same Gene ID (64591), and this is not the only such case?!
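
One way to see how widespread this is would be to group the exported rows by Entrez Gene ID and report every ID that carries more than one symbol. A minimal Python sketch, assuming the BioMart result has already been read into (symbol, gene ID) pairs; the function name is hypothetical, and the sample rows are limited to the TSPY1/TSPY2 case reported here plus SRY for contrast.

```python
from collections import defaultdict


def ids_with_multiple_symbols(rows):
    """Map each Entrez Gene ID to its symbols, keeping only shared IDs."""
    symbols_by_id = defaultdict(set)
    for symbol, gene_id in rows:
        if gene_id:  # skip rows with no Entrez Gene ID
            symbols_by_id[gene_id].add(symbol)
    return {
        gene_id: sorted(symbols)
        for gene_id, symbols in symbols_by_id.items()
        if len(symbols) > 1
    }


rows = [("TSPY1", "64591"), ("TSPY2", "64591"), ("SRY", "6736")]
print(ids_with_multiple_symbols(rows))
# {'64591': ['TSPY1', 'TSPY2']}
```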

0

Fair enough, I didn't check the data. I don't know why that's the case. I love bioinformatics databases ^^

7
14.0 years ago

Using the UCSC MySQL server and the tables knownGene, knownToLocusLink and refLink:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e '
select distinct
    L.name,
    L.locusLinkId
from
    knownGene as G,
    knownToLocusLink as K2L,
    refLink as L
where
    G.name = K2L.name and
    K2L.value = L.locusLinkId and
    G.chrom = "chrY" '

+-------------+-------------+
| name        | locusLinkId |
+-------------+-------------+
| PLCXD1      |       55344 | 
| GTPBP6      |        8225 | 
| NCRNA00107  |      283981 | 
| PPP2R3B     |       28227 | 
| SHOX        |        6473 | 
| CRLF2       |       64109 | 
| CSF2RA      |        1438 | 
| IL3RA       |        3563 | 
| SLC25A6     |         293 | 
| ASMTL-AS    |       80161 | 
| ASMTL       |        8623 | 
| P2RY8       |      286530 | 
| AKAP17A     |        8227 | 
| ASMT        |         438 | 
| ZBED1       |        9189 | 
| DHRSX       |      207063 | 
| CD99        |        4267 | 
| XGPY2       |   100132596 | 
| SRY         |        6736 | 
| RPS4Y1      |        6192 | 
| ZFY         |        7544 | 
| TGIF2LY     |       90655 | 
| PCDH11Y     |       83259 | 
| TTTY23      |      252955 | 
| TSPY2       |       64591 | 
| TTTY1B      |   100101116 | 
| TTTY2       |       60439 | 
| TTTY21      |      252953 | 
| TTTY7       |      246122 | 
| TTTY8       |       84673 | 
| AMELY       |         266 | 
| TBL1Y       |       90665 | 
| PRKY        |        5616 | 
| TTTY16      |      252948 | 
| TTTY12      |       83867 | 
| TTTY18      |      252950 | 
| TTTY19      |      252952 | 
| TTTY11      |       83866 | 
| RBMY1A3P    |      286557 | 
| TTTY20      |      252951 | 
| TSPY4       |      728395 | 
| FAM197Y2P   |      252946 | 
| TSPY3       |      728137 | 
| TSPY1       |        7258 | 
| RBMY3AP     |       64593 | 
| TTTY22      |      252954 | 
| TTTY15      |       64595 | 
| USP9Y       |        8287 | 
| DDX3Y       |        8653 | 
| UTY         |        7404 | 
| TMSB4Y      |        9087 | 
| VCY         |        9084 | 
| NLGN4Y      |       22829 | 
| FAM41AY1    |      340618 | 
| NCRNA00230B |      401629 | 
| XKRY2       |      353515 | 
| CDY2B       |      203611 | 
| XKRY        |        9082 | 
| HSFY2       |      159119 | 
| TTTY9B      |      425057 | 
| NCRNA00185  |       55410 | 
| CD24        |   100133941 | 
| TTTY14      |       83869 | 
| BCORP1      |      286554 | 
| CYorf15A    |      246126 | 
| CYorf15B    |       84663 | 
| KDM5D       |        8284 | 
| TTTY10      |      246119 | 
| EIF1AY      |        9086 | 
| RPS4Y2      |      140032 | 
| RBMY2EP     |      159125 | 
| RBMY1A1     |        5940 | 
| TTTY13      |       83868 | 
| RBMY1B      |      378948 | 
| PRY2        |      442862 | 
| TTTY6       |       84672 | 
| TTTY6B      |      441543 | 
| RBMY1E      |      378950 | 
| RBMY1J      |      378951 | 
| TTTY5       |       83863 | 
| RBMY2FP     |      159162 | 
| RBMY1F      |      159163 | 
| TTTY17B     |      474151 | 
| TTTY4C      |      474150 | 
| BPY2        |        9083 | 
| DAZ1        |        1617 | 
| DAZ2        |       57055 | 
| DAZ3        |       57054 | 
| TTTY3B      |      474148 | 
| CDY1        |        9085 | 
| CDY1B       |      253175 | 
| CSPG4P2Y    |       84664 | 
| GOLGA2P3Y   |      401634 | 
| TTTY17A     |      252949 | 
| DAZ4        |       57135 | 
| SPRY3       |       10251 | 
| VAMP7       |        6845 | 
| IL9R        |        3581 | 
+-------------+-------------+
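
The same three-table join can also be written with explicit JOIN ... ON clauses, which keeps the join conditions separate from the chromosome filter. The sketch below runs that form against a tiny in-memory SQLite database so the logic can be tried without connecting to UCSC. The table layouts are cut down to the columns the query uses, the `kg1` to `kg4` transcript names are placeholders rather than real UCSC IDs, and the chrX row (`GENE_X`, ID 999999999) is invented purely to show the filter working; only the SRY and ZFY symbol/ID pairs come from the output above.

```python
import sqlite3

# Explicit-join form of the query; CAST is used because the toy
# knownToLocusLink.value column is stored as text.
QUERY = """
SELECT DISTINCT L.name, L.locusLinkId
FROM knownGene AS G
JOIN knownToLocusLink AS K2L ON G.name = K2L.name
JOIN refLink AS L ON CAST(K2L.value AS INTEGER) = L.locusLinkId
WHERE G.chrom = ?
ORDER BY L.name
"""


def genes_on_chromosome(conn, chrom):
    """Return (symbol, Entrez Gene ID) pairs for one chromosome."""
    return conn.execute(QUERY, (chrom,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knownGene (name TEXT, chrom TEXT);
CREATE TABLE knownToLocusLink (name TEXT, value TEXT);
CREATE TABLE refLink (name TEXT, locusLinkId INTEGER);
""")
# kg1 and kg2 stand for two transcripts of one gene, which is
# why the query needs DISTINCT.
conn.executemany(
    "INSERT INTO knownGene VALUES (?, ?)",
    [("kg1", "chrY"), ("kg2", "chrY"), ("kg3", "chrY"), ("kg4", "chrX")],
)
conn.executemany(
    "INSERT INTO knownToLocusLink VALUES (?, ?)",
    [("kg1", "6736"), ("kg2", "6736"), ("kg3", "7544"), ("kg4", "999999999")],
)
conn.executemany(
    "INSERT INTO refLink VALUES (?, ?)",
    [("SRY", 6736), ("ZFY", 7544), ("GENE_X", 999999999)],
)

print(genes_on_chromosome(conn, "chrY"))
# [('SRY', 6736), ('ZFY', 7544)]
```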

0

I chose Pierre's response because I am more confident in the result we get with his query. Indeed, BioMart displays TSPY1 and TSPY2 with the same Gene ID (64591), and this is not the only such case.

7
14.0 years ago
Mary 11k

The UCSC Table Browser would be my query tool of choice. In fact, at ASHG and in our workshops I usually describe the Table Browser as the way to get "lists of things": all the SNPs in your gene of interest, all the genes in a region, etc.

Note: I'll use the previous assembly for this query (hg18) because I haven't moved over to the new one for most things yet. I find a lot of what I need is still not there.

Choices by row:

  • mammal/human/Mar06
  • Genes+Predictions/UCSC Genes
  • knownGene table
  • Region radio button on position, enter chrY. Click "lookup" and it will paste the range in.
  • Output format = selected fields from primary and related tables. Click the "get output" button.

On the next page, make the choices for the IDs you want. I've picked chrom, txStart and txEnd from the knownGene table. I added kgID, swissprotID, genesymbol, refseqID, and description (because they always want a description even though they don't say so...) from the hg18.kgXref fields. I added acc and gi from the hg18.gbCdnaInfo fields to get GenBank/EMBL accession IDs.

I'm sure I've overdone the IDs, but I like to use them as a sort of internal QC check for myself. They are easily killed in the Excel doc that will come next (yes, I know y'all hate Excel, but it's what they want).
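
For colleagues who only want a couple of columns, the extra ID fields can also be dropped before the file ever reaches Excel. A minimal Python sketch, assuming the Table Browser output was saved as tab-separated text whose header line starts with `#` and names each column as `database.table.field` (the exact header strings may differ in your export); the function name and the two sample rows are placeholders, not real annotations.

```python
import csv


def select_columns(tsv_text, wanted):
    """Keep only the wanted columns from tab-separated text with a '#' header."""
    # Strip the leading '#' so the first line can serve as the header row.
    lines = tsv_text.lstrip("#").splitlines()
    reader = csv.DictReader(lines, delimiter="\t")
    return [[row[name] for name in wanted] for row in reader]


# Placeholder rows: coordinates and symbols are made up for illustration.
sample = (
    "#hg18.knownGene.chrom\thg18.knownGene.txStart\t"
    "hg18.knownGene.txEnd\thg18.kgXref.geneSymbol\n"
    "chrY\t100\t200\tGENE_A\n"
    "chrY\t300\t400\tGENE_B\n"
)

print(select_columns(sample, ["hg18.kgXref.geneSymbol", "hg18.knownGene.chrom"]))
# [['GENE_A', 'chrY'], ['GENE_B', 'chrY']]
```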

I've saved it as a session. I think it will store the choices. You can try to load this session and see:

From here you can click the navigation for table browser, and just move to "get output" to see my choices.

4
14.0 years ago
Neilfws 49k

As others have answered, both UCSC tables and BioMart make this very simple. Since there are no coding solutions yet, here is one that uses BioMart via the R/Bioconductor biomaRt library:

library(biomaRt)
# connect to the Ensembl human gene dataset
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# note: the attribute was called "entrezgene" when this was written;
# recent Ensembl releases appear to name it "entrezgene_id"
results <- getBM(attributes = c("chromosome_name", "entrezgene", "hgnc_symbol"),
                 filters = "chromosome_name", values = "Y", mart = mart)
# count genes
dim(results)
# [1] 120   3
# list first few rows
head(results)
#   chromosome_name entrezgene hgnc_symbol
# 1               Y       6736         SRY
# 2               Y       6192      RPS4Y1
# 3               Y         NA      RPS4Y1
# 4               Y       7544         ZFY
# 5               Y         NA         ZFY
# 6               Y      90655     TGIF2LY
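
In the rows listed above, RPS4Y1 and ZFY each appear twice, once with an Entrez Gene ID and once with `NA`. If the goal is a plain symbol-to-ID listing, one option is to drop the rows lacking an ID and de-duplicate what remains. A minimal Python sketch, assuming the result has been exported as (Entrez ID, symbol) pairs with `None` standing in for R's `NA`; the function name is hypothetical, and the sample rows are the six shown by `head(results)`.

```python
def clean_pairs(rows):
    """Drop rows without an Entrez Gene ID and remove duplicates, keeping order."""
    seen = set()
    cleaned = []
    for entrez_id, symbol in rows:
        if entrez_id is None:
            continue  # no Entrez Gene ID for this row
        pair = (symbol, entrez_id)
        if pair not in seen:
            seen.add(pair)
            cleaned.append(pair)
    return cleaned


rows = [
    (6736, "SRY"),
    (6192, "RPS4Y1"),
    (None, "RPS4Y1"),
    (7544, "ZFY"),
    (None, "ZFY"),
    (90655, "TGIF2LY"),
]
print(clean_pairs(rows))
# [('SRY', 6736), ('RPS4Y1', 6192), ('ZFY', 7544), ('TGIF2LY', 90655)]
```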

0

Thanks for this coding solution using BioMart. I will try to write one using a pure SQL query, like Pierre did for UCSC.
