Copy number becomes inaccurate when downsampling different numbers of FASTQ reads from a whole FASTQ file
4.8 years ago
lffu_0032 ▴ 90

Hi, everyone:

Recently I have been doing CNV analysis, and I have found a strange thing:

1. When using the whole dataset (after duplicate removal, for both the tumor and its matched control), the copy number values are accurate.

2. But when I downsample different numbers of reads from the whole FASTQ for a given sample (I downsample only the tumor's FASTQ, not its control's), the copy number differs from the whole-data result, and the fewer reads I keep, the larger the difference between the whole-data CNV result and the downsampled one.

Has anyone encountered this problem? I am very confused by this CNV downsampling analysis. Thanks for your help.

CNV DownSample fastq copy number not accurate • 1.9k views
ADD COMMENT

Why are you downsampling? Decreasing read depth is going to increase noise and make things less accurate. What did you expect to see?

ADD REPLY

We are doing plasma cfDNA sequencing, and the sequencing depth is very high (about 10,000X-30,000X). We want to determine which sequencing depth is sufficient to call CNVs accurately, so we sequenced some cfDNA samples at ~30,000X and then downsampled to 10,000X, 15,000X, and 20,000X to see which depth is adequate for CNV calling.
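
For example, the sampling fraction for each target depth can be derived like this (a minimal sketch; the ~30,000X full depth is from above, everything else is placeholder):

    # Derive the sampling fraction to pass to seqtk for each target depth.
    full_depth = 30000                  # measured mean depth of the full tumor data
    targets = [10000, 15000, 20000]     # target depths to evaluate

    for t in targets:
        frac = t / full_depth           # fraction of reads to keep
        print(f"target {t}X -> sample fraction {frac:.3f}")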

ADD REPLY

Which copy number program are you using? There are probably >100 programs, the majority of which do not address all types of scenarios / biases. Edit: I see that German has already asked this. Can you please answer?

ADD REPLY

I do not use any third-party tools. I just use a very simple method: mean-depth normalization to adjust for the library-size effect, calculating the normalized value (depth / mean_depth) for the tumor and normal samples separately. For the normal sample I do not downsample reads, so its normalized value is fixed. The problem is that the normalized value changes dramatically when downsampling different numbers of reads, especially when the downsampled read count is very small relative to the full FASTQ.
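
Concretely, the calculation looks like this (a minimal sketch; the gene names and depths are invented, only the depth / mean_depth formula is the method described above):

    import statistics

    # Hypothetical per-gene mean depths for one tumor sample.
    tumor_depth = {"geneA": 28000, "geneB": 31000, "geneC": 62000}

    mean_depth = statistics.mean(tumor_depth.values())

    # Normalized value per gene: depth / mean_depth.
    for gene, depth in tumor_depth.items():
        print(gene, round(depth / mean_depth, 3))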

ADD REPLY

That explains your finding, in that case.

ADD REPLY

Hmm, I do not quite understand. Can you explain in more detail? Thank you a lot.

ADD REPLY

Sorry, just to help understand better, can you perhaps show some example calculations?

ADD REPLY

Which tool do you use?

ADD REPLY

I just use the mean-depth normalization method to adjust for sequencing library size: normalized_depth = target_depth / mean(target_depth). I find that for some genes the normalized_depth becomes smaller with fewer downsampled reads, while for other genes it becomes bigger.
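
For illustration, a rough simulation of this (all counts invented; binomial thinning stands in for randomly sampling reads): with fewer reads kept, the recomputed depth / mean_depth drifts further from its true value, some genes up and some down:

    import random
    import statistics

    random.seed(1)

    # Hypothetical true read counts per gene in the full data (all equal,
    # so every normalized value should ideally be 1.0).
    full_counts = {f"gene{i}": 30000 for i in range(1, 6)}

    for frac in (1.0, 0.5, 0.1, 0.01):
        # Binomial thinning approximates keeping a fraction of the reads.
        sub = {g: sum(random.random() < frac for _ in range(c))
               for g, c in full_counts.items()}
        mean_depth = statistics.mean(sub.values())
        norm = {g: round(c / mean_depth, 3) for g, c in sub.items()}
        print(frac, norm)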

ADD REPLY

I am not sure I understand, but maybe the problem you describe is called "random sampling".

ADD REPLY

Yes, randomly sampling different numbers of reads from the full FASTQ data to see which depth is sufficient to call CNVs.

ADD REPLY

Which program are you using for the random sampling?

ADD REPLY

I use the seqtk sample command.
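
Roughly like this (a sketch via Python; the file names, seed, and fraction are placeholders; for paired-end data the same -s seed must be used for R1 and R2 so that mates stay in sync):

    import subprocess

    # Equivalent to: seqtk sample -s100 tumor_R1.fastq 0.5 > tumor_R1.sub.fastq
    seed, frac = "100", "0.5"      # placeholder seed and sampling fraction
    for mate in ("R1", "R2"):
        with open(f"tumor_{mate}.sub.fastq", "w") as out:
            # seqtk sample writes the subsampled FASTQ to stdout; reusing
            # the same seed for both mates keeps read pairs synchronized.
            subprocess.run(
                ["seqtk", "sample", f"-s{seed}", f"tumor_{mate}.fastq", frac],
                stdout=out, check=True)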

ADD REPLY

I want to know if anyone has done this downsampling CNV analysis and encountered this problem.

ADD REPLY

I did this and found the same thing, and I was not surprised or confused - that is how statistics works, and most CNV-detection tools for NGS are based on statistical methods.
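
To put a rough number on it: for count data the relative noise scales as roughly 1/sqrt(depth), so going from 30,000X down to 100X means about 17x more relative noise. A quick sanity check (assuming roughly Poisson per-target counts; at these depths Normal(depth, sqrt(depth)) is a good stand-in):

    import math
    import random
    import statistics

    random.seed(0)

    # The coefficient of variation of the simulated coverage should
    # track 1/sqrt(depth): lower depth, proportionally more noise.
    for depth in (30000, 10000, 1000, 100):
        draws = [random.gauss(depth, math.sqrt(depth)) for _ in range(10000)]
        cv = statistics.stdev(draws) / statistics.mean(draws)
        print(f"depth {depth:>6}X  CV = {cv:.4f}  (1/sqrt(depth) = {1/math.sqrt(depth):.4f})")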

ADD REPLY
