Difference between various GRCh37 releases?
10.0 years ago

I have some genomic coordinates based on hg19, which is equivalent to GRCh37, and I would like to analyze these coordinates using the actual genome sequence. In Ensembl there are several GRCh37 releases, such as GRCh37.70 and GRCh37.75. What is the actual difference between all these GRCh37 releases? Are the coordinates the same for all of them?

genome • 9.2k views
10.0 years ago
Ying W ★ 4.3k

This is the website of the Genome Reference Consortium; it might be informative for you to poke around there.

hg19 is typically thought of as a subset of GRCh37, with GRCh37.p# denoting the different minor/patch releases (these changes do not affect coordinates).
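If you want to check for your own files that two releases share the same coordinates, one quick sanity check is to compare sequence lengths in their FASTA indexes (`.fai`, as produced by `samtools faidx`). The sketch below assumes the standard tab-separated `.fai` layout (name, length, offset, line bases, line width); the function names are hypothetical. UCSC names chromosomes `chr17` while Ensembl uses `17`, so names must be harmonised before comparing, and matching lengths are a necessary but not sufficient sign of identical sequence.

```python
import csv
from typing import Dict, Iterable, List


def parse_fai(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} from the lines of a samtools-style .fai index."""
    lengths = {}
    for row in csv.reader(lines, delimiter="\t"):
        if row:  # skip blank lines
            lengths[row[0]] = int(row[1])  # column 1: name, column 2: length
    return lengths


def read_fai(path: str) -> Dict[str, int]:
    """Read a .fai file from disk."""
    with open(path) as handle:
        return parse_fai(handle)


def changed_lengths(a: Dict[str, int], b: Dict[str, int]) -> List[str]:
    """Names present in both indexes whose lengths differ.

    Sequences that exist in only one index (e.g. newly added patches) are ignored.
    """
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])
```

Usage would look like `changed_lengths(read_fai("release_a.fa.fai"), read_fai("release_b.fa.fai"))` with placeholder paths; an empty list means no shared sequence changed length.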

Ensembl uses a different versioning system; you can find their list of changes here: http://www.ensembl.org/Help/ArchiveList. It seems the latest release is mostly schema changes with some changes to annotations.
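The Ensembl release number shows up in annotation file names, which makes the split between assembly (coordinates) and release (annotation) easy to see. A minimal sketch, assuming Ensembl-style GTF names such as `Homo_sapiens.GRCh37.75.gtf.gz`; the regular expression and helper names are illustrative, not an Ensembl tool, and matching assembly names only says the primary coordinates agree.

```python
import re
from typing import Tuple

# Illustrative pattern for names like "Homo_sapiens.GRCh37.75.gtf.gz":
# species, assembly name, Ensembl release number, optional gzip suffix.
ENSEMBL_GTF = re.compile(
    r"^(?P<species>[A-Za-z_]+)\.(?P<assembly>GRCh\d+)\.(?P<release>\d+)\.gtf(?:\.gz)?$"
)


def split_ensembl_name(filename: str) -> Tuple[str, int]:
    """Return (assembly, Ensembl release) parsed from a GTF file name."""
    m = ENSEMBL_GTF.match(filename)
    if m is None:
        raise ValueError(f"unrecognised file name: {filename}")
    return m["assembly"], int(m["release"])


def same_assembly(file_a: str, file_b: str) -> bool:
    """True if both annotation files sit on the same assembly (same primary coordinates),
    regardless of which Ensembl release produced the annotation."""
    return split_ensembl_name(file_a)[0] == split_ensembl_name(file_b)[0]
```

For example, GRCh37.70 and GRCh37.75 annotations share an assembly and differ only in release number.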

Just to clarify further: the release number in Ensembl refers only to the annotation of the genes themselves. The primary assembly (i.e. the coordinate system) is the same for anything that's GRCh37. A .p# release represents patches on top of the genome: sequences are added, but the existing primary assembly is not altered in any way.
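This point can be modelled by treating a release as a collection of named sequences, where a patch release only adds entries. A toy sketch: the class, method and patch names are hypothetical, and real releases are distributed as sequence files, not Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AssemblyRelease:
    """Toy model of an assembly release as a collection of named sequences."""
    name: str
    primary: Dict[str, str]  # chromosome name -> sequence
    patches: Dict[str, str] = field(default_factory=dict)  # patch name -> sequence

    def add_patch(self, new_name: str, patch_name: str, patch_seq: str) -> "AssemblyRelease":
        """Return the next patch release: primary sequences copied unchanged,
        plus one extra, separately named patch sequence."""
        patches = dict(self.patches)
        patches[patch_name] = patch_seq
        return AssemblyRelease(new_name, dict(self.primary), patches)
```

Because `add_patch` never touches `primary`, every chromosome keeps its length and therefore its coordinates from one patch release to the next.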

Hi Emily, a question regarding your last sentence: how can the primary assembly coordinates remain unaltered if you ADD something? Let's say that in the middle of chromosome 17 there is a sequence which contains 4 repeats and not 3, as previously thought. So I publish a patch, and now there are 4 repeats instead of only 3: I have added one more repeat. Surely that changes every coordinate after it?

The three repeats are still there in the primary assembly. There's just a different version of that locus, with four repeats, that you can (and should) use instead.

I don't understand this. Let's say that in p6 there are 3 repeats on chromosome 17, and in p7 there are 4 repeats. In the FASTA file the sequence ">chr17" would then be one repeat longer, so every nucleotide after that additional repeat would have a changed coordinate? You say that the three repeats are "still there". Of course they are still there, but there's a fourth one now, right? So the sequence becomes longer? What am I missing?

Hello Marvin,

Have a look at my tutorial "Which human reference genome should I use?", where I tried to explain what patches are. In short: if there is a fix/update for an existing sequence, it is not incorporated into the original sequence. Instead it gets its own name/accession number and is added to the collection of sequences that make up the reference genome.

fin swimmer
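Applied to the repeat example in this thread, the four-repeat version lives on its own patch sequence, and only the flanks of that sequence correspond to positions on chr17. The sketch below is a toy: the sequences, the region coordinates, the name `PATCH_17` and the function name are all hypothetical, and it assumes the patch sequence starts at chr17 position 1 and aligns gap-free over both flanks. In real releases the placement of each patch on the primary assembly is described in the assembly's own metadata; here it is hard-coded.

```python
from typing import Optional

REPEAT, LEFT, RIGHT = "CAG", "ATTGCA", "GGTACC"

# Primary chr17 (toy): flank + 3 repeats + flank. No patch release modifies it.
chr17 = LEFT + REPEAT * 3 + RIGHT
# Patch sequence (toy, hypothetical name PATCH_17): the same locus with 4 repeats.
patch_17 = LEFT + REPEAT * 4 + RIGHT

# Altered region on each sequence, 1-based inclusive (start, end).
PRIMARY_REGION = (len(LEFT) + 1, len(LEFT) + 3 * len(REPEAT))
PATCH_REGION = (len(LEFT) + 1, len(LEFT) + 4 * len(REPEAT))


def patch_to_primary(pos: int) -> Optional[int]:
    """Map a 1-based position on patch_17 to chr17; None inside the altered region."""
    if pos < PATCH_REGION[0]:
        # Left flank: same distance before the start of the altered region.
        return PRIMARY_REGION[0] - (PATCH_REGION[0] - pos)
    if pos > PATCH_REGION[1]:
        # Right flank: same distance past the end of the altered region.
        return PRIMARY_REGION[1] + (pos - PATCH_REGION[1])
    return None  # inside the repeats there is no one-to-one correspondence
```

Here chr17 stays 21 bases long while `patch_17` is 24; position 19 on the patch maps to position 16 on chr17, so everything downstream of the repeats keeps its original chr17 coordinate.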