Hi everyone,
I have recently been doing CNV analysis and found something strange:
1. When using the whole dataset (after duplicate removal, for the tumor and its matched control), the copy number values are accurate;
2. But when I downsample different numbers of reads from the whole FASTQ for a given sample (I only downsample the tumor's FASTQ, not its control's), the copy number differs from the whole-data result, and the fewer reads I keep, the larger the difference between the whole-data CNV result and the downsampled result.
Has anyone encountered this problem? I am very confused by this downsampling CNV analysis. Thanks for your help.
Why are you downsampling? Decreasing read depth is going to increase noise and make things less accurate. What did you expect to see?
We are doing plasma cfDNA sequencing at very high depth (about 10,000X-30,000X). We want to determine what sequencing depth is sufficient to call CNVs accurately, so we sequenced some cfDNA samples at ~30,000X and then downsampled to 10,000X, 15,000X, and 20,000X to see which depth is adequate for CNV calling.
Which copy number program are you using? There are probably >100 programs, the majority of which do not address all types of scenarios / biases. Edit: I see that German has already asked this. Can you please answer?
I do not use any third-party tools; I just use a very simple method: mean-depth normalization to adjust for library size. I calculate the normalized value (depth / mean_depth) for the tumor and normal samples separately. For the normal sample I do not downsample reads, so its normalized values are fixed. The problem is that the tumor's normalized values change dramatically when downsampling to different read numbers, especially when the downsampled read number is very small relative to the full FASTQ.
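In code, my method is roughly this (a minimal sketch: the gene names and depths are made up, the per-gene mean depths are assumed to be precomputed, and the log2 copy ratio at the end is just how I compare tumor against normal):

```python
# Minimal sketch of mean-depth normalization and copy ratio.
# Gene names and depth values below are illustrative only.
import math

tumor_depth  = {"EGFR": 42000.0, "MET": 31000.0, "TP53": 28000.0}
normal_depth = {"EGFR": 29000.0, "MET": 30000.0, "TP53": 27500.0}

def normalize(depths):
    """Divide each gene's depth by the sample-wide mean depth."""
    mean_depth = sum(depths.values()) / len(depths)
    return {gene: d / mean_depth for gene, d in depths.items()}

tumor_norm  = normalize(tumor_depth)
normal_norm = normalize(normal_depth)

for gene in tumor_depth:
    ratio = tumor_norm[gene] / normal_norm[gene]
    print(f"{gene}: copy ratio = {ratio:.3f}, log2 = {math.log2(ratio):+.3f}")
```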
That explains your finding, in that case.
Hmm, I don't quite understand. Can you explain in more detail? Thank you.
Sorry, just to help me understand better: can you perhaps show some example calculations?
Which tool do you use?
I just use the mean-depth normalization method to adjust for sequencing library size: normalized_depth = target_depth / mean(target_depth). I find that for some genes the normalized_depth becomes smaller with fewer downsampled reads, while for other genes it becomes bigger.
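Here is a toy example of the kind of calculation (invented counts, not my real data). Downsampling keeps each read at random, so the per-gene kept counts are one random draw scattered around the expected fraction of the originals:

```python
# Toy example of the calculation (invented counts, not real data).
import numpy as np

rng = np.random.default_rng(1)

# Per-gene read counts at full depth (invented).
full = {"geneA": 30_000, "geneB": 25_000, "geneC": 20_000}

def normalized(counts):
    mean = sum(counts.values()) / len(counts)
    return {g: c / mean for g, c in counts.items()}

# Keep ~1% of reads: per-gene kept counts are binomial draws,
# so they scatter randomly around 1% of the original counts.
sub = {g: int(rng.binomial(c, 0.01)) for g, c in full.items()}

for g in full:
    print(f"{g}: full={normalized(full)[g]:.3f}  downsampled={normalized(sub)[g]:.3f}")
```

With small kept counts, the normalized values typically land a few percent away from the full-data values, in different directions for different genes, which is exactly the drift I see.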
I am not sure I understand, but maybe the problem you describe is called "random sampling".
Yes, randomly sampling different numbers of reads from the full FASTQ data to see which depth is enough to call CNVs.
Which program are you using for the random sampling?
I use the seqtk sample command.
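Roughly like this (a sketch driving seqtk from Python; the file names and fractions are placeholders, and with paired-end data the same -s seed must be used for R1 and R2 so that mates stay matched):

```python
# Sketch: downsample FASTQs with seqtk (must be on PATH); names are placeholders.
import subprocess

fractions = [0.33, 0.50, 0.66]  # e.g. ~10000X / 15000X / 20000X from ~30000X

for frac in fractions:
    for mate in ("R1", "R2"):
        out_path = f"tumor_{mate}.{frac}.fq"
        with open(out_path, "w") as out:
            # Fixed seed (-s) keeps the R1/R2 subsets in sync.
            subprocess.run(
                ["seqtk", "sample", "-s100", f"tumor_{mate}.fq.gz", str(frac)],
                stdout=out,
                check=True,
            )
```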
I want to know if anyone has done this kind of downsampling CNV analysis and encountered the same problem.
I did this, found the same thing, and was not surprised or confused: that is how statistics works, and most NGS CNV-detection tools are based on statistical methods.
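To put a rough number on it: per-region read counts behave approximately like Poisson counts, so the relative noise on a region's depth is about 1/sqrt(N) for N reads, and downsampling to a fraction f of the data inflates that noise by 1/sqrt(f). A quick back-of-the-envelope check (the full-depth count is illustrative; plug in your own per-gene read counts):

```python
# Poisson counting noise: coefficient of variation ~ 1/sqrt(N) for N reads.
import math

n_full = 100_000  # reads over a region at full depth (illustrative)
for frac in (1.0, 0.66, 0.50, 0.33, 0.10, 0.01):
    n = n_full * frac
    print(f"fraction kept = {frac:<4}  reads = {int(n):>6}  CV = {1 / math.sqrt(n):.4f}")
```

Halving the reads multiplies the relative noise by about 1.4; keeping 1% multiplies it by 10, which matches your observation that the smallest downsamples drift the most.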