ensGene.txt (UCSC/hg19) and Homo_sapiens.GRCh38.76.gtf (GRCh38) positions do not match
10.2 years ago
pwg46

I understand that UCSC tables such as ensGene.txt store start positions 0-based, whereas GTF files such as the GRCh38 one are 1-based. However, when I compare feature positions in the hg19 ensGene.txt file with the same features in Homo_sapiens.GRCh38.76.gtf, the positions are completely off. For example, if you pick any protein_coding transcript from the GRCh38 GTF file and compare its start/end positions, exon start/end positions, CDS positions, etc. with its positions in ensGene.txt, they are often off by a few thousand bases, far more than the 1-base offset the indexing difference would explain. I have also checked the GTF file for GRCh37 (which should be identical to hg19), but the positions were again way off. Can anyone explain why this is?
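The indexing difference only ever shifts the start of an interval by one base, so it cannot explain offsets of thousands. A minimal sketch, assuming ensGene.txt follows the standard UCSC genePred table layout (a leading `bin` column, then `name`, `chrom`, `strand`, `txStart`, `txEnd`, `cdsStart`, `cdsEnd`, `exonCount`, `exonStarts`, `exonEnds`, ...), of converting one row into GTF-style 1-based closed coordinates. The helper names `ucsc_to_gtf` and `parse_ensgene_line` are hypothetical, and the row in the usage example is made up.

```python
from typing import Dict, List, Optional, Tuple


def ucsc_to_gtf(start0: int, end0: int) -> Tuple[int, int]:
    """Convert a UCSC 0-based, half-open [start, end) interval to 1-based, closed."""
    # Only the start shifts; the end value is numerically the same in both systems.
    return start0 + 1, end0


def _parse_coords(field: str) -> List[int]:
    # exonStarts/exonEnds are comma-separated with a trailing comma, e.g. "99,150,"
    return [int(x) for x in field.rstrip(",").split(",") if x]


def parse_ensgene_line(line: str) -> Dict[str, object]:
    """Parse one ensGene.txt row (assumed genePred layout with a leading bin column)."""
    f = line.rstrip("\n").split("\t")
    name, chrom, strand = f[1], f[2], f[3]
    tx_start, tx_end, cds_start, cds_end = (int(x) for x in f[4:8])
    exons = [ucsc_to_gtf(s, e)
             for s, e in zip(_parse_coords(f[9]), _parse_coords(f[10]))]
    # Non-coding transcripts have cdsStart == cdsEnd in genePred tables.
    cds: Optional[Tuple[int, int]] = (
        ucsc_to_gtf(cds_start, cds_end) if cds_start < cds_end else None
    )
    start, end = ucsc_to_gtf(tx_start, tx_end)
    return {"transcript_id": name, "chrom": chrom, "strand": strand,
            "start": start, "end": end, "cds": cds, "exons": exons}


if __name__ == "__main__":
    # Made-up row for illustration only.
    row = "\t".join(["0", "ENST_HYPOTHETICAL", "chr1", "+", "99", "200", "120", "180",
                     "2", "99,150,", "130,200,", "0", "ENSG_HYPOTHETICAL",
                     "cmpl", "cmpl", "0,0,"])
    print(parse_ensgene_line(row)["exons"])  # [(100, 130), (151, 200)]
```

After this conversion, a transcript's coordinates can only be expected to line up with a GTF built on the same assembly; against a different assembly the values will still disagree.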

ucsc gtf match grch38 position

hg19 == GRCh37

hg19 != GRCh38

Given that you knew that hg19 is GRCh37 and not GRCh38, I'm confused why you're confused.

In the second part, are you comparing GRCh37 to hg19 or GRCh37 to GRCh38? If the former, I'm not sure why they would be off; if the latter, it is for the same reason as GRCh38 vs hg19. GRCh38 is a completely different assembly of the human reference genome: its size is different, the chromosome sequences are different, etc., so the same feature generally sits at different coordinates. You can only compare coordinates within an assembly, not between assemblies.
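To make the "within an assembly only" rule concrete, here is a minimal sketch of a comparison that refuses to compare coordinates across assemblies. It assumes features have already been converted to 1-based closed coordinates. The alias table, the `Feature` class and the `same_position` helper are hypothetical, and chromosome-name normalization only strips UCSC's `chr` prefix.

```python
from dataclasses import dataclass

# Hypothetical alias table: UCSC and GRC names for the same assemblies.
ASSEMBLY_ALIASES = {
    "hg19": "GRCh37", "GRCh37": "GRCh37",
    "hg38": "GRCh38", "GRCh38": "GRCh38",
}


@dataclass(frozen=True)
class Feature:
    assembly: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def canonical_assembly(name: str) -> str:
    if name not in ASSEMBLY_ALIASES:
        raise ValueError(f"unknown assembly: {name!r}")
    return ASSEMBLY_ALIASES[name]


def canonical_chrom(chrom: str) -> str:
    # UCSC names chromosomes "chr1", Ensembl names them "1".
    # Mitochondrial names (chrM vs MT) are left unmapped in this sketch.
    return chrom[3:] if chrom.startswith("chr") else chrom


def same_position(a: Feature, b: Feature) -> bool:
    """Compare two features, refusing to compare across assemblies."""
    if canonical_assembly(a.assembly) != canonical_assembly(b.assembly):
        raise ValueError(
            f"{a.assembly} and {b.assembly} are different assemblies; "
            "lift the coordinates over before comparing them"
        )
    return (canonical_chrom(a.chrom) == canonical_chrom(b.chrom)
            and a.start == b.start and a.end == b.end)
```

With this guard, an hg19 feature can be checked against a GRCh37 one, while an hg19-versus-GRCh38 comparison fails loudly instead of silently reporting a few-thousand-base mismatch.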
