Question

Copy Number Analysis on single file

0

Entering edit mode

5.7 years ago

user31888 ▴ 150

Hi,

Is there a tool to detect copy number variants on a single target sequenced sample file?

All the tools I know (GATK, ControlFREEC, VarScan) require a mate file.

Thanks !

CNV • 3.7k views

ADD COMMENT • link updated 5.7 years ago by Kevin Blighe 89k • written 5.7 years ago by user31888 ▴ 150

0

Entering edit mode

What have you got? - a single FASTQ file relating to single-end next generation sequencing?

ADD REPLY • link 5.7 years ago by Kevin Blighe 89k

0

Entering edit mode

I have a BAM file from which I obtained 2 vcf files after calling germline and somatic variants using GATK.

ADD REPLY • link 5.7 years ago by user31888 ▴ 150

0

Entering edit mode

It seems that bcftools cnv could sort me out. I need to add to my vcf, the B-allele frequency (I can compute it from the AD field), and the LRR (don't know what it is yet),

ADD REPLY • link 5.7 years ago by user31888 ▴ 150

0

Entering edit mode

bcftools cnv does not seem to work since we need to have 2 samples to compute LRR (Log R Ratio).

ADD REPLY • link 5.7 years ago by user31888 ▴ 150

0

Entering edit mode

ControlFREEC can calculate copy number from a single sample, but what is your sample? - whole genome sequencing, exome sequencing, or target sequencing?

ADD REPLY • link 5.7 years ago by Kevin Blighe 89k

0

Entering edit mode

It is target sequencing.

ADD REPLY • link 5.7 years ago by user31888 ▴ 150

0

Entering edit mode

ControlFREEC can call copy number for this data. Here is a sample config file that you'll need for this type of data:

[general]
BedGraphOutput=TRUE
bedtools=/Programs/bedtools2/bin/bedtools
breakPointThreshold=0.8
breakPointType=2
chrFiles=FREEC/mm10/chromosomes/
chrLenFile=FREEC/mm10/mm10.fasta.fai
SNPfile=FREEC/mm10/mm10_dbSNP137.ucsc.freec.txt
coefficientOfVariation=0.06
contamination=0
contaminationAdjustment=TRUE
degree=3
forceGCcontentNormalization=1
#gemMappabilityFile=FREEC/mm10/GRCm38_68_mm10.gem
#minMappabilityPerWindow=0.85
intercept=1
minCNAlength=3
minExpectedGC=0.35
maxExpectedGC=0.55
minimalSubclonePresence=0.3
maxThreads=3
numberOfProcesses=2
noisyData=TRUE
outputDir=Out/BM25/FREEC/
ploidy=2
printNA=TRUE
readCountThreshold=10
samtools=/Programs/samtools-1.9/samtools
sex=XY
step=10000
telocentromeric=50000
#uniqueMatch=TRUE
window=50000

[sample]
inputFormat=BAM
mateFile=Out/BM25/BM25_Aligned_Sorted_PCRDuped_FiltMAPQ.bam
mateOrientation=FR

ADD REPLY • link 5.7 years ago by Kevin Blighe 89k

0

Entering edit mode

Thanks Kevin ! The program generates the sample.cnp and GC_profile.cnp files, but then stops with the following error:

Error: zero reads in windows with the GC-content around 0.35 with interval 0.01, will try again with 0.04
Error: zero reads in windows with the GC-content around 0.35 with interval 0.04
Unable to proceed
Try to rerun the program with higher number of reads

I tried to decrease the window size to 500 but I got the same error. Note, I am only interested in one chromosome (my BAM contains only reads mapped to one chromosome). I supplied a captureRegions path though. Do you have an idea what parameters should I adjust?

ADD REPLY • link 5.7 years ago by user31888 ▴ 150

0

Entering edit mode

How large is your target region?

ADD REPLY • link 5.7 years ago by Kevin Blighe 89k

0

Entering edit mode

It is about 10e6 bp.

ADD REPLY • link 5.7 years ago by user31888 ▴ 150

0

Entering edit mode

You may need to add a new chunk of code to your config file, like this:

[target]
captureRegions=TargetRegion.bed

Also, with that, set window=0 in the [general] chunk.

Be aware of the other option, readCountThreshold=10, which you may need to toggle. 10 is already low, though.

ADD REPLY • link 5.7 years ago by Kevin Blighe 89k

0

Entering edit mode

I also decreased the following parameters and still get the same error:

minExpectedGC=0
minimalSubclonePresence=0
readCountThreshold=0
window=1                    #(window=0 does not work)

ADD REPLY • link 5.7 years ago by user31888 ▴ 150

0

Entering edit mode

Alternatively, I need to calculate the approximative copy number (and exon involved) of a specific gene. Would it be possible to do it "manually" from a BAM or VCF file?

ADD REPLY • link 5.7 years ago by user31888 ▴ 150

0

Entering edit mode

Realistically, I am not confident that it is possible to accurately determine copy number for just one gene - how would you know the level of coverage that is reflective of normal, deleted, or amplified DNA, when read coverage, even in a 'normal' piece of DNA, fluctuates a lot based on GC content, sequence similarity elsewhere in the genome, etc? Even when we have genome-wide data, copy number callers disagree a lot.

Why can't you just pick a few exons, design some primers, obtain a normal reference hgDNA, and then do qPCR (determine copy number via the delta delta Ct method).