Gold standard CNV Calls for NA12878
5.6 years ago
rse ▴ 100

Hi,

Are there any gold standard CNV calls for the sample NA12878 (WES data)?

Thank you

next-gen • 6.0k views

Like you, I also want this kind of data.


Thanks a lot, everyone, for your valuable responses.

5.5 years ago
Garan ▴ 690

Have you tried some of the Genome In A Bottle resources?

https://jimb.stanford.edu/giab-resources

EDIT: Recent paper by the GIAB group: https://www.biorxiv.org/content/10.1101/664623v1

Mt. Sinai School of Medicine has uploaded ~44x PacBio data for NA12878, including raw reads, error-corrected reads, and a merged SV vcf to: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NA12878_PacBio_MtSinai

NA12878.sorted.vcf.gz - VCF file of structural variants.

This file was produced using three different variant detection methods: PBHoney, a custom pipeline, and (on assembled sequences) the Chaisson et al. methodology from "Resolving the complexity of the human genome using single-molecule sequencing" (Nature, 2015). Structural variations were classified simply as either a deletion or an insertion. Each entry in this file is a member of the superset of all events called by all variant callers. Because calls were accepted based on the number of different variant callers that supported the event in question, the QUAL field is currently not populated. If an event is predicted by at least 3 different variant callers, the FILTER field entry is simply PASS; otherwise it is listed as lt3. While each of the methods used is imprecise on boundary calling by design, the most precise boundary definitions for each event are indicated in the CIPOS field (and CIEND for deletion events). The NS field is a bit vector describing which caller configurations supported the event. Each position represents the conclusion of one caller, in this order:

 1. PBHoney/rawReads/blasr1.3.1 
 2. custom/rawReads/blasr1.3.1
 3. PBHoney+ECReads+blasr1.3.1 
 4. custom+ECReads+blasr1.3.1
 5. assembly
 6. custom+ECReads+blasr1.3.2
 7. custom+rawReads+blasr1.3.2
  

A bit vector of 0101111, for example, means that the event was supported by the second, fourth, fifth, sixth and seventh combinations of mappers and event callers in that list. Note that if an assembly-based call (position 5) is not present, the SV pos (and end) reflect a buffer (50bp for the custom pipeline and 100bp for PBHoney); this buffer is also added (times 2) to the SVLEN field.
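A minimal sketch of decoding the NS bit vector described above. It assumes the NS value is available as a 7-character string of 0s and 1s (if a parser reads it as an integer, leading zeros would need to be restored first). The names `NS_CALLERS`, `decode_ns` and `has_assembly_call` are made up for this illustration and are not part of the GIAB release.

```python
# Caller configurations, in the order given in the list above.
NS_CALLERS = [
    "PBHoney/rawReads/blasr1.3.1",
    "custom/rawReads/blasr1.3.1",
    "PBHoney+ECReads+blasr1.3.1",
    "custom+ECReads+blasr1.3.1",
    "assembly",
    "custom+ECReads+blasr1.3.2",
    "custom+rawReads+blasr1.3.2",
]


def decode_ns(bits):
    """Return the caller configurations whose bit is set in an NS string."""
    if len(bits) != len(NS_CALLERS) or set(bits) - {"0", "1"}:
        raise ValueError(
            f"expected {len(NS_CALLERS)} characters of 0/1, got {bits!r}"
        )
    return [name for bit, name in zip(bits, NS_CALLERS) if bit == "1"]


def has_assembly_call(bits):
    """True if the assembly-based caller (position 5) supported the event."""
    return "assembly" in decode_ns(bits)


# The example from the text: positions 2, 4, 5, 6 and 7 are set.
supporting = decode_ns("0101111")
for name in supporting:
    print(name)
```

Checking `has_assembly_call` first is one way to tell whether the reported coordinates and SVLEN include the buffer mentioned above.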

There are two additional flags included in the INFO field:

  BL - An event which was not called by a method using blasr
      v1.3.1 but which WAS subsequently called using the same
      variant caller paired with blasr 1.3.2.

  ZU - An event for which phasing was not possible or for
      which zygosity could not be reliably determined. The
      genotype for these events in NA12878 is listed as 0/1 by
      default. Note that because the phasing approach relied on
      raw-read information, variants called from assembly were
      NOT included in the phasing analysis.
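A short sketch of checking for these flags in a record's INFO column. It assumes BL and ZU appear as bare, value-less flags in the semicolon-separated INFO string, as is usual for VCF flags; the helper name `parse_info` and the example INFO string are invented for illustration and are not taken from the real file.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; bare flags map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Hypothetical INFO string, for illustration only.
example = parse_info("SVLEN=312;NS=0101111;ZU")

zygosity_uncertain = "ZU" in example      # phasing or zygosity not determined
rescued_by_blasr_132 = "BL" in example    # missed with blasr 1.3.1, called with 1.3.2
print(zygosity_uncertain, rescued_by_blasr_132)
```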
  

The genotype field is populated based on spanning raw reads. When no reads support the reference, the event is marked as homozygous (1/1); when raw reads spanning an SV intersect a proximal phased SNP, the event is indicated as either paternal (1|0) or maternal (0|1). There are two circumstances in which the genotype 0/1 is used. The first is when the variant is heterozygous but the phase could not be determined. The second is when the zygosity was not assessed (assembly). In both cases, the genotype is accompanied by a ZU flag in the entry's INFO field.
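The genotype rules above can be summarised as a small lookup. This is only a sketch of the description in this post, not code from the GIAB pipeline; the function name `describe_genotype` and the wording of the returned labels are my own.

```python
def describe_genotype(gt, zu=False):
    """Map a GT string from this VCF to a short description.

    `zu` says whether the record's INFO field carries the ZU flag.
    """
    if gt == "1/1":
        return "homozygous (no spanning reads support the reference)"
    if gt == "1|0":
        return "heterozygous, phased paternal"
    if gt == "0|1":
        return "heterozygous, phased maternal"
    if gt == "0/1":
        if zu:
            return "unphased heterozygous, or zygosity not assessed (ZU)"
        # The description says 0/1 always comes with ZU, so flag this case.
        return "0/1 without ZU (not expected from the description above)"
    raise ValueError(f"unexpected genotype: {gt!r}")


print(describe_genotype("1|0"))
print(describe_genotype("0/1", zu=True))
```

Because 0/1 covers both "heterozygous but unphased" and "zygosity not assessed", it is probably safer to treat ZU-flagged records as unknown zygosity when benchmarking.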

5.6 years ago

I mean, most cell lines have had karyotyping done, which may not catch all focal CNVs but does capture all of the big ones.

The DGV database is also pretty useful for gold standard variants, but not necessarily for GM12878.
