Where can I get the BAM files used in the CNVkit examples at https://github.com/etal/cnvkit-examples?
I'd like to compare another CNV caller's performance against CNVkit. -Lax
The test samples are from Shain et al. 2015, Nature Genetics. Since these sequences are protected patient information, the BAM files were submitted to dbGaP; there is a delay of a few months before they appear online. However, those weren't ideal samples for testing copy number calling anyway: desmoplastic melanoma genomes are dominated by somatic SNVs, not large-scale copy number alterations.
A better dataset for testing variant callers, both SNV and CNV, has become available recently: "An open access pilot freely sharing cancer genomic data from participants in Texas". I recommend running your benchmarks with these samples instead so that you can freely share your complete analysis. CNVkit has changed significantly since the version I benchmarked in the paper, so in any case you'll need to re-run the latest version of each caller (including CNVkit) to get representative results.
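Once both callers have been re-run on the same samples, one simple way to compare them is to ask what fraction of one caller's segments is matched by the other. The sketch below is only an illustration, not the method used in the paper: the `Call` record, the function names, and the 50% reciprocal-overlap cutoff are all assumptions, and each caller's output would first need converting into (chromosome, start, end, gain/loss) tuples.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """One CNV call; a hypothetical record, not any caller's native format."""
    chrom: str
    start: int  # 0-based start
    end: int    # exclusive end
    kind: str   # "gain" or "loss"


def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions, or 0.0 if no match."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def concordance(calls_a, calls_b, min_ro=0.5):
    """Fraction of calls_a matched by at least one call in calls_b.

    min_ro is an assumed cutoff; pick whatever suits the benchmark.
    """
    if not calls_a:
        return 0.0
    matched = sum(
        1 for a in calls_a
        if any(reciprocal_overlap(a, b) >= min_ro for b in calls_b)
    )
    return matched / len(calls_a)
```

Running `concordance` in both directions (A against B, then B against A) shows whether one caller is reporting segments the other misses entirely.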
The Texas site is perfect. However, it is WES. Does anyone know of a good Illumina amplicon CNV dataset?
I don't know of any that are publicly available, but try SRA or dbGaP. Targeted amplicon sequencing (TAS) seems to be concentrated in smaller clinics, where making the sequencing data widely available (which requires e.g. IRB approval) is not the primary concern; bigger studies that are conducted with this intent usually use WES, WGS or at least hybrid capture with a broader panel. But if you find a good public TAS dataset, please let me know!
Can CNVkit be used to detect germline copy number variants?
Yes, but the resolution is fairly coarse, especially on targeted panels, so detection of smaller CNVs (e.g. below 1 Mb) is less accurate. This is less of a concern in cancer cases, where somatic copy number alterations tend to affect entire genes or larger chromosomal regions.
For germline cases, if you only have targeted/exome sequencing data then CNVkit is worth running to get some copy number information, but a clinic with access to the original sample or extracted DNA should consider running another assay (e.g. SNP array, FISH, qPCR) in parallel if possible.
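One way to act on the resolution caveat is to set aside the smaller calls as candidates for confirmation by a second assay. The sketch below is a hypothetical helper, not part of CNVkit. It assumes a tab-separated segment table with `start` and `end` columns (similar in spirit to a segment file, but check your actual column names), and it reuses the 1 Mb figure above purely as an example cutoff.

```python
import csv
import io

MIN_SIZE = 1_000_000  # 1 Mb example cutoff; an assumption, tune per assay


def split_by_size(table_text, min_size=MIN_SIZE):
    """Split segment rows into (large, small) lists by genomic length.

    table_text is the content of a tab-separated table whose header
    includes 'start' and 'end' columns (assumed layout).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    large, small = [], []
    for row in reader:
        size = int(row["end"]) - int(row["start"])
        (large if size >= min_size else small).append(row)
    return large, small
```

For a file on disk, something like `split_by_size(open(path).read())` would do; the rows in `small` are the ones where a SNP array, FISH or qPCR check is most worthwhile.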