Question

GiaB SV Calls for NIST:HG001/NA12878: What's the reference build?



8.0 years ago

QVINTVS_FABIVS_MAXIMVS ★ 2.6k

You would think NIST/GiaB would explicitly state the reference build for their putative gold-standard SV calls, but I can't find it anywhere.

I'm assuming it's GRCh37?

ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/technical/svclassify_Manuscript/Supplementary_Information/Personalis_1000_Genomes_deduplicated_deletions.bed

Does anyone know? I can't find any information in the READMEs or in the published svclassify paper.

CNV giab nist deletion

updated 6.0 years ago by wtwhite ▴ 10 • written 8.0 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.6k

score 0 · Answer 1 · 2019-03-25

If you are referring to this paper (svclassify: a method to establish benchmark structural variant calls), then yes, they aligned to NCBI's GRCh37 (build 37) reference genome:

...raw reads were mapped to the National Center for Biotechnology Information (NCBI) build 37 using the Burrows-Wheeler Aligner (BWA) “bwa mem” v.0.7.5a with default parameters

Variants were mapped to human reference coordinates (NCBI build 37) by walking the read overlap graph in both directions until an “anchor” read, where a continuous 65 bps matches the reference, denoted the beginning and end of each variant.

If you are referring to the original published work (Extensive sequencing of seven human genomes to characterize benchmark reference materials), then the answer is the same:

The sequencing data were aligned by bwa mem [6] against b37 human decoy reference genome.

score 0 · Answer 2 · 2019-03-25

It looks like it's GRCh37. On p. 11 of Parikh et al. (2014), in the first paragraph of "Methods", they say that they mapped the Platinum Genomes 2x100bp HiSeq data to NCBI "build 37" using bwa mem v.0.7.5a with default parameters, and that aligned BAM files (presumably against the same reference) were publicly available for the Illumina 250bp, PacBio and Moleculo data. Near the top of the next page, they also write that the Spiral Genetics variants in category C were mapped to NCBI build 37, though it's not yet clear to me how the subheading this falls under corresponds to the 11 rows of Table 2.
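The passages quoted above describe the alignments, not the BED file itself, so a quick independent sanity check is to compare the file's coordinates against chromosome lengths that differ between builds. The sketch below is a heuristic under stated assumptions, not anything NIST/GiaB provides: `compatible_builds` and `CONTIG_LENGTHS` are hypothetical names, only a handful of chromosomes are included, and the check can only rule a build out. An interval ending past a GRCh38 chromosome end excludes GRCh38 (and vice versa); if every interval fits both builds, the result is inconclusive. Chromosome naming ("1" as in b37 versus "chr1") is at most a weak extra hint, since conventions vary between files.

```python
# Hypothetical heuristic: which reference builds could a BED file's
# coordinates belong to? It can only rule builds OUT, never prove one.

# Primary-assembly lengths (bp) for a few chromosomes whose lengths
# differ between GRCh37 and GRCh38 (not an exhaustive table).
CONTIG_LENGTHS = {
    "GRCh37": {"1": 249_250_621, "2": 243_199_373, "20": 63_025_520,
               "21": 48_129_895, "X": 155_270_560},
    "GRCh38": {"1": 248_956_422, "2": 242_193_529, "20": 64_444_167,
               "21": 46_709_983, "X": 156_040_895},
}


def compatible_builds(bed_lines):
    """Return the set of builds whose contig lengths fit every interval."""
    candidates = set(CONTIG_LENGTHS)
    for line in bed_lines:
        # Skip blank, comment, and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        # Normalise "chr1" (UCSC style) to "1" (b37 style).
        chrom = fields[0][3:] if fields[0].startswith("chr") else fields[0]
        # BED end is 0-based exclusive, so it equals the 1-based last base.
        end = int(fields[2])
        for build in list(candidates):
            length = CONTIG_LENGTHS[build].get(chrom)
            if length is not None and end > length:
                candidates.discard(build)
    return candidates


# Usage: an interval ending at 249,100,000 on chromosome 1 is past the
# GRCh38 chr1 end (248,956,422) but within GRCh37 chr1 (249,250,621).
print(compatible_builds(["1\t249000000\t249100000", "X\t100\t200"]))
```

Applied to a downloaded copy of the file, this would be something like `with open("Personalis_1000_Genomes_deduplicated_deletions.bed") as fh: print(compatible_builds(fh))`. A result of both builds means the coordinates alone don't settle the question.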