5.2 years ago
yliueagle
I retrieved gene coordinates from TxDb.Hsapiens.UCSC.hg19.knownGene using:
gene_info = genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
But I found that the coordinates of some genes, for example PTPN20 (id: 26095), are hugely different from those in NCBI Gene. Is this an error in the annotation, or am I doing something wrong?
subset(gene_info, gene_id=='26095')
GRanges object with 1 range and 1 metadata column:
        seqnames            ranges strand |     gene_id
           <Rle>         <IRanges>  <Rle> | <character>
  26095    chr10 46550123-48827924      - |       26095
The gene coordinates I got from NCBI Gene: https://www.ncbi.nlm.nih.gov/gene/?term=26095
It looks like this gene was annotated in two places in the hg19 build (see http://uswest.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000204179;r=10:46911396-47002488 and look in other assemblies). This seems to have been resolved in hg38, where only one copy is annotated.
https://www.ncbi.nlm.nih.gov/gene/26095
https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:23423
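One way to check this from R is to list the individual transcripts that the TxDb assigns to gene 26095, rather than the single spanning range that genes() reports. The sketch below is only an illustration, assuming GenomicFeatures and the hg19 knownGene package are installed. The variable names (txdb, tx, clusters, span) are made up for this example, and no output is shown because it depends on the installed package version.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# All transcripts that the TxDb assigns to Entrez gene 26095 (PTPN20)
tx <- transcripts(txdb, filter = list(gene_id = "26095"))
tx

# Merge overlapping transcripts into non-overlapping clusters;
# more than one distant cluster would point to multiple annotated copies
clusters <- reduce(tx)
clusters

# Single range spanning every transcript, to compare with the genes() result
span <- range(tx)
span
```

If the transcripts fall into clusters that are far apart on chr10, the very wide range returned by genes() would simply be the span across all of them. The same calls could be run against TxDb.Hsapiens.UCSC.hg38.knownGene, if installed, to see whether the gene range there is narrower.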
Adding on this, see hg19: