This is what I have found so far. Please correct me if I am wrong.
GRCh37 w/o patches includes the primary assembly (22 autosomes, X, Y, and non-chromosomal supercontigs) and alternate scaffolds, but not a reference mitogenome. The non-chromosomal supercontigs are the unlocalized and unplaced scaffolds.
The rCRS reference mitogenome was added to GRCh37 only with patch release 2 (GRCh37.p2), which also included some fix and novel patches.
UCSC hg19 = GRCh37 w/o patches + African Yoruba mitogenome (not rCRS).
Also UCSC hg19 has:
Different naming conventions (e.g. chromosome X: chrX in UCSC vs. X in GRC).
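Different coordinate conventions in its annotation tables and BED files (0-based, half-open starts in UCSC vs. 1-based in Ensembl/GRC). The sequences themselves carry no coordinate system, and the UCSC Genome Browser web interface displays 1-based positions.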
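To make the naming difference concrete, here is a minimal sketch of translating hg19-style contig names to b37-style names. The function name `ucsc_to_b37` is made up for illustration, and the assumption that every GRCh37 unlocalized/unplaced accession carries version `.1` should be checked against your own FASTA index. Note that mapping `chrM` to `MT` only translates the name: the underlying sequences differ (Yoruba vs. rCRS), so mitochondrial coordinates are not interchangeable.

```python
import re

# hg19 names unlocalized/unplaced scaffolds like "chr1_gl000191_random"
# or "chrUn_gl000211"; b37 uses the accession, e.g. "GL000191.1".
_GL_PATTERN = re.compile(r"^chr(?:[0-9XY]+|Un)_gl(\d{6})(?:_random)?$")
_CHROM_PATTERN = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y)$")


def ucsc_to_b37(name):
    """Translate an hg19 contig name to b37 style.

    Returns None when there is no b37 counterpart (e.g. the alternate
    haplotypes such as "chr6_apd_hap1", which b37 does not include).
    """
    if name == "chrM":
        # Name translation only: hg19 chrM is the Yoruba sequence,
        # b37 MT is rCRS, so positions do not carry over.
        return "MT"
    match = _CHROM_PATTERN.match(name)
    if match:
        return match.group(1)
    match = _GL_PATTERN.match(name)
    if match:
        # Assumption: all GRCh37 GL accessions are version 1.
        return f"GL{match.group(1)}.1"
    return None
```

For example, `ucsc_to_b37("chrX")` gives `"X"`, and `ucsc_to_b37("chr1_gl000191_random")` gives `"GL000191.1"`.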
Note also that Ion Torrent uses an hg19 with a replaced mitogenome (rCRS instead of the Yoruba sequence).
b37 is the same as hs37-1kg. It is not just the "25 longest sequences from GRCh37 (1-22,X,Y,MT)"; it is a 1000 Genomes convention that includes:
-The 24 "relatively complete" chromosomal sequences (named "1" to "22", "X" and "Y") downloaded individually from Ensembl.
-The GRCh37.p2 (rCRS) mitochondrial sequence (named "MT") downloaded from MITOMAP or NCBI.
-The unlocalized sequences, which were named after their accession numbers, such as "GL000191.1", "GL000194.1", etc.
-The unplaced sequences, which were named after their accession numbers, such as "GL000211.1", "GL000241.1", etc.
Only the alternate loci were not included in the b37 dataset.
hs37d5 (known also as b37 + decoy) was released by The 1000 Genomes Project (Phase II), which introduced additional sequence (BAC/fosmid clones, HuRef contigs, Epstein-Barr Virus genome) to the b37 reference to help reduce false positives for mapping. Note that this one uses the primary assembly of GRCh37.p4 (not the one of GRCh37 w/o patches).
As for hs37 (without -1kg), I think it is generated only by bwakit (distributed with BWA); according to its manual, it corresponds to b37 + EBV (Epstein-Barr virus genome). The EBV genome is also present in hs37d5 and GRCh38. It is included because EBV is used in molecular biology to transform cells and because it naturally infects B cells in ~90% of the world population.
There is no hg37.
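If you are handed a FASTA and are not sure which of these flavors it is, the contig names and lengths in its `.fai` index are usually enough to tell. Below is a rough heuristic sketch, not a definitive classifier. It relies on a few things not spelled out above that you should verify against your own files: chromosome 1 of GRCh37 is 249,250,621 bp, the Yoruba mitogenome in hg19 is 16,571 bp while rCRS is 16,569 bp, and in the distributed files the EBV contig is named `NC_007605` and the decoy contig `hs37d5`. The function names are made up for illustration.

```python
GRCH37_CHR1_LENGTH = 249_250_621  # chromosome 1 in any GRCh37-based build
YORUBA_MT_LENGTH = 16_571         # hg19 chrM
RCRS_MT_LENGTH = 16_569           # rCRS (b37 "MT", Ion Torrent hg19 "chrM")


def parse_fai(text):
    """Parse the text of a .fai index into {contig name: length}."""
    contigs = {}
    for line in text.splitlines():
        if line.strip():
            fields = line.split("\t")
            contigs[fields[0]] = int(fields[1])
    return contigs


def guess_flavor(contigs):
    """Guess the GRCh37 flavor from contig names and lengths (heuristic)."""
    if contigs.get("chr1") == GRCH37_CHR1_LENGTH:
        mt_length = contigs.get("chrM")
        if mt_length == YORUBA_MT_LENGTH:
            return "UCSC hg19 (Yoruba chrM)"
        if mt_length == RCRS_MT_LENGTH:
            return "hg19 with rCRS chrM"
        return "hg19-like, mitogenome unclear"
    if contigs.get("1") == GRCH37_CHR1_LENGTH:
        # Check the most specific markers first: hs37d5 also contains EBV.
        if "hs37d5" in contigs:
            return "hs37d5"
        if "NC_007605" in contigs:
            return "hs37 (b37 + EBV)"
        if contigs.get("MT") == RCRS_MT_LENGTH:
            return "b37 (hs37-1kg)"
        return "GRCh37-like, mitogenome unclear"
    return "not GRCh37-based, or chromosome 1 is missing"
```

Usage would be along the lines of `guess_flavor(parse_fai(open("ref.fa.fai").read()))`, where `ref.fa.fai` is a placeholder for an index created with `samtools faidx`. Checking the length of chromosome 1 first keeps a chr-prefixed GRCh38 file from being mistaken for hg19.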
juanfdelahoz, I'm not looking for grammar corrections, but can you change "hg37" to "hs37" in the title and tags?
The title originally had hs37, which I changed to hg37. I've changed it back now.
This is also an insightful piece from Heng Li:
http://lh3.github.io/2017/11/13/which-human-reference-genome-to-use