Question

Relation between copy number log ratio and copy number


3.1 years ago

jennyp0706 ▴ 10

I am currently trying to obtain the minor and major copy numbers of tumor samples so that I can use them in PyClone. So far I have tried CNV callers such as PureCN, Sequenza and TitanCNA.

The caller picks the most likely purity/ploidy combination by maximum likelihood, and for one of my tumor samples the result was purity = 0.7, ploidy = 5.254.

When I try to interpret the result, I keep running into a problem. My understanding is that a copy number log ratio of 0 means the normalized tumor depth equals the normalized normal depth, and since the normal sample is diploid, a log ratio of 0 should correspond to copy number 2.

However, in the maximum likelihood solution a log ratio of 0 points to an integer copy number other than 2: 5 in the case above.

[Image: copy number log ratio plot for the maximum likelihood solution (not preserved)]

Am I misunderstanding something? Should I pick an alternative solution, or is it okay to go with the current optimal result?

I would really appreciate any replies; I am currently stuck and could really use some help.

From Jenny


Just a hypothesis: could the graph be plotting the log2 ratio of the tumor copy number at each position versus the average tumor ploidy? In that case you would expect 0 when the copy number is 5.254 (i.e. the tumor ploidy). Knowing the exact software and command that generated the plot would help.

3.1 years ago by Fabio Marroni ★ 3.0k

Thank you for the quick reply. I looked up the description of the plot, and it says "tumor vs. normal copy number log2-ratios for the maximum likelihood solution".

3.1 years ago by jennyp0706 ▴ 10

Accepted Answer · 2021-10-28

3.1 years ago

markus.riester ▴ 550

This sample is on the noisy side. Noise often leads to an overestimated ploidy, because complex high-ploidy solutions can easily explain artifacts.

If you post the log file here or in a GitHub issue, I’ll take a look. Maybe there is something you can do to fix it.

The expected log ratio of a 2-copy segment depends on the ploidy. Samples with ploidy > 2 have more DNA to sequence than normals, but you don’t account for this when you normalize BAM files for coverage, because both samples are scaled to the same total depth. That results in the shift you observed.
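A worked version of this shift, assuming the standard purity/ploidy mixture model (as in the ABSOLUTE paper; the notation is mine, not taken from PureCN's code). For purity p > 0, tumor ploidy P and integer segment copy number C, coverage normalization puts the genome-wide average copy number of the tumor sample, not 2, in the denominator:

```latex
\begin{aligned}
R(C) &= \log_2 \frac{p\,C + 2(1-p)}{p\,P + 2(1-p)}, \qquad R(C) = 0 \iff C = P \\
p\,P + 2(1-p) &= 0.7 \cdot 5.254 + 0.6 \approx 4.278 \qquad (p = 0.7,\ P = 5.254) \\
R(2) &= \log_2 \tfrac{2.0}{4.278} \approx -1.10 \\
R(5) &= \log_2 \tfrac{4.1}{4.278} \approx -0.06 \\
R(6) &= \log_2 \tfrac{4.8}{4.278} \approx +0.17
\end{aligned}
```

So a log ratio of 0 marks the tumor ploidy. Only when P = 2 does it coincide with 2 copies, whatever the purity; for this sample it falls between 5 and 6 copies, closest to 5.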


Thank you for your reply. Here is the PureCN log file for the sample above. Could you explain a bit more what you mean by "you don’t account for this when you normalize BAM files for coverage"? Do you mean it is wrong to interpret the result using the yellow box?

Thank you in advance.

From Jen

[Image: PureCN log file for the sample (not preserved)]

3.1 years ago by jennyp0706 ▴ 10

Unfortunately, I haven’t given the CNVkit workflow the attention it deserves recently; I focused on our internal normalization and full GATK4 support. So that’s what I recommend for now, at least initially.

First, make sure to run Mutect with 50 bp interval padding (it looks like you don’t); this should roughly double the number of heterozygous SNPs. Second, use the baits BED file, not an exon BED; the baits produce a more even coverage profile. Third, make sure that the segmentation function uses the SNPs. Fourth, while not critical, generating a mapping bias file in PureCN can remove some additional artifacts.
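To illustrate what 50 bp padding does to a target BED, here is a minimal Python sketch that extends each interval on both sides and merges intervals that then overlap, so heterozygous SNPs just outside the baits fall inside the calling regions. It is only an illustration: in practice you would use Mutect's own interval-padding option, and `pad_intervals` is a hypothetical helper, not part of PureCN or GATK.

```python
from typing import List, Tuple

# A BED interval: (chromosome, 0-based start, half-open end)
Interval = Tuple[str, int, int]


def pad_intervals(intervals: List[Interval], padding: int = 50) -> List[Interval]:
    """Extend each interval by `padding` bp on both sides, then merge overlaps.

    Hypothetical helper for illustration. Starts are clamped at 0; ends are
    not clamped to chromosome lengths because no genome file is assumed.
    """
    padded = sorted(
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        last = merged[-1] if merged else None
        # Overlapping or book-ended intervals on the same chromosome are merged.
        if last is not None and last[0] == chrom and start <= last[2]:
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    baits = [("chr1", 1000, 1100), ("chr1", 1200, 1300), ("chr2", 10, 20)]
    for chrom, start, end in pad_intervals(baits):
        print(f"{chrom}\t{start}\t{end}")
```

With 50 bp of padding, the two chr1 baits that were 100 bp apart become a single region.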

PureCN 2.0 was released yesterday. No dramatic changes, but lots of polishing.

The histogram in the PureCN output PDF shows the expected log ratios of the integer copy numbers for the given purity and ploidy.
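The expected values marked in that histogram can be sketched in Python, assuming the standard purity/ploidy mixture model (an assumption here, not checked against PureCN's source): expected log ratio = log2((p·C + 2(1 - p)) / (p·P + 2(1 - p))) for purity p, ploidy P and integer copy number C. The function names are hypothetical. The sketch uses the purity and ploidy from the question and also prints the gap to the next copy state.

```python
import math


def expected_log_ratio(copy_number: int, purity: float, ploidy: float) -> float:
    """Expected tumor/normal log2 ratio for a segment with integer copy number.

    Assumes the standard purity/ploidy mixture model: the numerator is the
    segment's average copy number in the impure tumor sample, the denominator
    is the genome-wide average copy number of that sample.
    """
    segment = purity * copy_number + 2 * (1 - purity)
    genome = purity * ploidy + 2 * (1 - purity)
    if segment == 0:
        # Homozygous deletion in a 100% pure sample: no reads expected.
        return float("-inf")
    return math.log2(segment / genome)


def closest_copy_number_to_zero(purity: float, ploidy: float, max_cn: int = 8) -> int:
    """Integer copy number (0..max_cn) whose expected log ratio is closest to 0."""
    return min(
        range(max_cn + 1),
        key=lambda c: abs(expected_log_ratio(c, purity, ploidy)),
    )


if __name__ == "__main__":
    purity, ploidy = 0.7, 5.254  # maximum likelihood solution from the question
    for c in range(9):
        ratio = expected_log_ratio(c, purity, ploidy)
        gap = expected_log_ratio(c + 1, purity, ploidy) - ratio
        print(f"C={c}: expected log2 ratio {ratio:+.2f}, gap to C={c + 1}: {gap:.2f}")
    print("closest to log ratio 0:", closest_copy_number_to_zero(purity, ploidy))
```

With these numbers, the copy states around a log ratio of 0 are only roughly 0.2 to 0.3 log2 units apart, which helps explain why noisy log ratios make high-ploidy solutions hard to pin down.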

Have a look at the ABSOLUTE paper for details; it’s a great read.

3.1 years ago by markus.riester ▴ 550

I’ll definitely rerun PureCN with the advice you gave. Thank you so much. One final question: since PureCN reports multiple purity/ploidy solutions, which one should I pick to get the best solution even with noisy data? Should I choose the first-ranked result, or, regardless of rank, the one in which a log ratio of 0 corresponds to copy number 2 (diploid)?

Your answer really helped me move one step forward. Thanks!

3.1 years ago by jennyp0706 ▴ 10

That takes a bit of experience and an understanding of the algorithm, and it can be tricky in cancer types with many subclonal alterations. Again, see the ABSOLUTE paper for a nice explanation. Look at the histogram first. Does it have a single major peak and a few minor ones? Then the sample is clearly diploid, and the major peak should be 2 copies. If the maximum likelihood solution got that wrong, it is probably a setup issue, unless there is a lot of heterogeneity. In that case, pick the correct solution.

Multiple major peaks of more even height? Then it is clearly a high-ploidy solution. There I look first at balanced segments (allelic fraction 0.5); these should have even copy numbers.
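Why balanced segments must have even copy numbers can be checked with the expected allelic fraction of germline heterozygous SNPs. Assuming the same mixture model as above, with normal cells contributing one copy of each allele, a segment with total copy number C and M copies of one allele is expected at (p·M + (1 - p)) / (p·C + 2(1 - p)), which equals 0.5 only when M = C - M. A minimal sketch; the helper names are hypothetical.

```python
from typing import List


def expected_allelic_fraction(minor_cn: int, total_cn: int, purity: float) -> float:
    """Expected allelic fraction of a germline het SNP in a tumor segment.

    Assumes normal contamination contributes one copy of each allele.
    Returns NaN when no reads are expected (C = 0 in a 100% pure sample).
    """
    numerator = purity * minor_cn + (1 - purity)
    denominator = purity * total_cn + 2 * (1 - purity)
    if denominator == 0:
        return float("nan")
    return numerator / denominator


def balanced_minor_copy_numbers(total_cn: int, purity: float, tol: float = 1e-9) -> List[int]:
    """Minor copy numbers M <= C/2 whose expected allelic fraction is 0.5."""
    return [
        m
        for m in range(total_cn // 2 + 1)
        if abs(expected_allelic_fraction(m, total_cn, purity) - 0.5) < tol
    ]


if __name__ == "__main__":
    for total in range(1, 9):
        states = balanced_minor_copy_numbers(total, purity=0.7)
        print(f"C={total}: balanced minor copy numbers {states}")
```

Odd total copy numbers never give a balanced allelic fraction, whatever the purity, which is what makes balanced segments a useful sanity check for a high-ploidy solution.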

The cleaner the data, the easier it is for the algorithm; I no longer do any manual curation with our high-coverage data. Hopefully my suggestions get your log ratio standard deviation closer to 0.2, which is what very clean WES data should look like. Old FFPE samples would be closer to what you currently have.

3.1 years ago by markus.riester ▴ 550

By following your advice, I was able to lower the log ratio standard deviation to 0.19 for the same sample, which I believe is a pretty good score. Thanks. Apart from the variant copy numbers (major = ML.C - ML.M.SEGMENT and minor = ML.M.SEGMENT), is there a way to get the segment copy number rather than the variant copy number?

3.1 years ago by jennyp0706 ▴ 10

Great, feel free to post the new file and I can double-check that it looks good. The segment copy number is exactly what you wrote for the variant file, and it is also the “C” column in the amplification and LOH CSV files.

The variant copy number (called multiplicity in the ABSOLUTE paper) is ML.M. It’s confusing, I admit, but it’s too late to change it now, I guess.
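For the PyClone use case, a minimal sketch of pulling segment-level major and minor copy numbers, plus the variant multiplicity, out of a PureCN variant CSV, following the convention described in this thread: total = ML.C, minor = ML.M.SEGMENT, major = ML.C - ML.M.SEGMENT, multiplicity = ML.M. The `chr` and `start` column names and the "NA" handling are assumptions about the file layout, so check them against your own output; `copy_numbers_from_variants` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, TextIO

# Copy number columns as named in this thread.
COLUMNS = ("ML.C", "ML.M", "ML.M.SEGMENT")


def copy_numbers_from_variants(handle: TextIO) -> List[Dict[str, object]]:
    """Parse a PureCN variant CSV into per-variant segment copy numbers.

    Assumes columns chr, start, ML.C, ML.M and ML.M.SEGMENT; rows where any
    copy number column is missing ("NA" or empty) are skipped.
    """
    records = []
    for row in csv.DictReader(handle):
        values = [(row.get(col) or "").strip() for col in COLUMNS]
        if any(v in ("", "NA") for v in values):
            continue
        total, multiplicity, minor = (int(float(v)) for v in values)
        records.append({
            "chr": row["chr"],
            "start": int(row["start"]),
            "total_cn": total,
            "major_cn": total - minor,
            "minor_cn": minor,
            "multiplicity": multiplicity,
        })
    return records


if __name__ == "__main__":
    # Made-up rows, only to show the expected layout.
    example = io.StringIO(
        "chr,start,ML.C,ML.M,ML.M.SEGMENT\n"
        "chr1,12345,5,2,2\n"
        "chr2,67890,NA,NA,NA\n"
    )
    for record in copy_numbers_from_variants(example):
        print(record)
```

The `minor_cn` and `major_cn` values are the ones to carry over into the PyClone input; `multiplicity` is kept separate because, as noted above, ML.M is the variant copy number rather than a segment copy number.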

3.1 years ago by markus.riester ▴ 550