ctDNA fraction calculation
7.2 years ago
Richard ▴ 10

Hi, all,

I am currently doing a ctDNA copy number variation analysis (whole-genome sequencing data). The problem I have encountered is that I don't know how to calculate the fraction of ctDNA in plasma. I looked into some papers, but they didn't make sense to me. So I am wondering if someone can help me with this problem. Much appreciated.

Richard.

next-gen • 4.3k views

I did my PhD on circulating free DNA (cfDNA) in breast cancer.

Assuming that you know what your somatic variant genotypes are, you could just calculate, for each somatic variant, the ratio of reads supporting the variant allele to reads supporting the reference allele, and then check the profile of frequencies that you find. My own data show that somatic variants can be detected in blood plasma down to an allele frequency of about 1%.

If the patient has a viable tumour and is shedding cellular debris into the bloodstream, you'll be detecting both healthy cfDNA in the blood and ctDNA from various tumour clones. Thus, the number of reads supporting each somatic variant will be roughly proportional to the number of reads mapping to the reference allele at the same position in the same sample, and that ratio reflects the proportion of tumour-derived DNA.
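As a rough illustration of the variant-based approach, here is a minimal Python sketch. It uses the variant allele fraction, alt / (alt + ref), which is closely related to the alt:ref ratio described above. The factor of 2 is an assumption: it only holds for a clonal, heterozygous variant in a copy-neutral (diploid) region, where half of the tumour-derived fragments at that site carry the variant. The read counts and function names are hypothetical.

```python
from statistics import median


def variant_allele_fraction(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads at a site that carry the somatic (alternate) allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def tumour_fraction_from_vaf(vaf: float) -> float:
    """Rough ctDNA fraction from one variant.

    ASSUMPTION: the variant is clonal and heterozygous in a copy-neutral
    (diploid) region, so only half of the tumour-derived fragments at the
    site carry it and F is approximately 2 * VAF (capped at 1).
    """
    return min(1.0, 2.0 * vaf)


# Hypothetical (alt, ref) read counts for three somatic variants in one plasma sample.
sites = {"variant_A": (12, 388), "variant_B": (9, 291), "variant_C": (20, 480)}

vafs = {name: variant_allele_fraction(alt, ref) for name, (alt, ref) in sites.items()}
estimates = {name: tumour_fraction_from_vaf(v) for name, v in vafs.items()}
# The median across variants is less sensitive to single outlying variants
# (e.g. subclonal ones, or ones sitting in copy-number-altered regions).
ctdna_fraction = median(estimates.values())

for name in sites:
    print(f"{name}: VAF={vafs[name]:.3f}  F~{estimates[name]:.3f}")
print(f"median ctDNA fraction estimate: {ctdna_fraction:.3f}")
```

Subclonal variants will give lower per-variant estimates than clonal ones, which is why looking at the whole profile of frequencies, rather than a single variant, is informative.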

There are other, more advanced and specific ways of determining exact ctDNA fractions, but it seems that you only have NGS DNA-seq data.


Thanks so much for your quick reply, Kevin.

What you have come up with sounds reasonable. But can we calculate the ctDNA fraction directly from the CNV analysis? In fact, I have seen a group do this: Materials and Methods. In their Methods section, they give the equation. However, I don't understand it very well. Can you explain it to me? Thanks.

Richard


Hi Richard,

Interesting manuscript - they identified chr1pq and chr8pq alterations in both tumour and matched plasma. Both of these large-scale alterations are common across cancers.

The methods that you need appear to start with the paragraph:

CAZA for CNA. The entire human genome was divided into 100-kb bins. The GC-corrected read count was ...

...which then leads into the important paragraph:

The fractional concentration of tumor-derived DNA in the plasma (F) can be calculated as follows:

The key part appears to be initially dividing your data into 100-kbp bins and computing GC-corrected read counts in each bin. They give a citation (51) where this was previously done. One program that I know does the binning for you, given an aligned BAM file, is bamCoverage (http://deeptools.readthedocs.io/en/latest/content/tools/bamCoverage.html) with a bin size of 100,000 bp; GC bias can be corrected beforehand with the deepTools programs computeGCBias and correctGCBias. After that, it appears to just be a matter of converting your binned read counts into Z-scores and then following the formulae that they use to calculate tumour fraction.
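To make the binning and GC-correction step concrete, here is a minimal Python sketch. It assumes you already have read start positions for one chromosome and a per-bin GC fraction computed from the reference genome; both are hypothetical inputs here. The GC correction is a simple median normalisation within GC strata, swapped in for illustration; it is not necessarily the correction used by the authors or by ref. (51).

```python
import numpy as np

BIN_SIZE = 100_000  # 100-kb bins, as in the Methods paragraph quoted above


def bin_read_counts(read_starts: np.ndarray, chrom_length: int) -> np.ndarray:
    """Count reads whose start position falls in each 100-kb bin of one chromosome."""
    n_bins = -(-chrom_length // BIN_SIZE)  # ceiling division
    bin_index = np.asarray(read_starts, dtype=np.int64) // BIN_SIZE
    return np.bincount(bin_index, minlength=n_bins).astype(float)


def gc_correct(counts: np.ndarray, gc: np.ndarray, decimals: int = 2) -> np.ndarray:
    """Rescale bins so that each GC stratum has the same median as the whole chromosome.

    Simple GC-stratum median normalisation (an illustrative stand-in, not the
    paper's method): bins are grouped by GC fraction rounded to `decimals`.
    """
    corrected = counts.astype(float)  # astype returns a copy
    global_median = np.median(counts)
    gc_strata = np.round(gc, decimals)
    for stratum in np.unique(gc_strata):
        in_stratum = gc_strata == stratum
        stratum_median = np.median(counts[in_stratum])
        if stratum_median > 0:
            corrected[in_stratum] = counts[in_stratum] * global_median / stratum_median
    return corrected


# Toy 1-Mb chromosome (10 bins) in which GC-rich bins are over-sequenced by 50%.
chrom_length = 1_000_000
reads_per_bin = [100] * 5 + [150] * 5
read_starts = np.repeat(np.arange(10) * BIN_SIZE + 500, reads_per_bin)
gc = np.array([0.40] * 5 + [0.60] * 5)  # hypothetical per-bin GC fractions

raw = bin_read_counts(read_starts, chrom_length)
corrected = gc_correct(raw, gc)
print(raw)
print(corrected)
```

In this toy case the GC-driven difference between the two halves of the chromosome is removed; with real data, a smoother fit of counts against GC (e.g. LOESS) is the more common choice.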


Hi Kevin,

Thanks for the detailed explanation of the analysis workflow. Can you explain in more detail the formula they presented for calculating the fraction of ctDNA? I am confused about why the ctDNA fraction can be calculated like this... This question may be a piece of cake for you, but I can't figure it out.

Richard


Hi Richard,

Well, you have to consider the nature of the experiment in order to better interpret the formulae.

When one takes a blood biopsy from a patient with a viable tumour and then extracts DNA from the blood plasma, one sequences both tumour-derived DNA and DNA from normal 'healthy' cells of the same patient. Unfortunately, we cannot readily distinguish which is which (although new research suggests that we may soon be able to, via the study of epigenetic modifications on the cfDNA). So, one typically has to establish what the normal 'healthy' cfDNA read counts look like in a matched set of healthy controls and then use this information as a sort of 'background correction', leaving just the tumour-derived fraction. This is why, in their formulae, the authors subtract the values from the 32 healthy subjects from those of each patient.

As far as I can tell from what they've written, P.test refers, for each patient, to the total of all binned mapped reads for the chromosome arm of interest, whereas P.normal refers to the mean of the corresponding totals in the 32 healthy controls, and SD.normal to their standard deviation. Therefore, you should be able to calculate a Z-score per chromosome arm per patient. The authors call Z-scores > 3 gains and Z-scores < -3 losses (Z = 3 means 3 standard deviations from the mean, i.e. 3 sigma).
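A minimal Python sketch of the per-arm Z-score and the ±3 calls described above. One assumption to flag: here P is expressed as the proportion of all mapped reads that fall in the arm's bins, so that samples sequenced to different depths are comparable; if you use raw totals instead, the controls and the test sample need similar depth. The control values and function names are hypothetical, and only five controls are used instead of the paper's 32.

```python
import numpy as np


def arm_proportion(arm_reads: int, total_reads: int) -> float:
    """P for one sample: share of all mapped reads that fall in the arm's bins (assumption)."""
    return arm_reads / total_reads


def arm_z_score(p_test: float, p_controls) -> float:
    """Z = (P_test - mean of P_normal) / SD of P_normal across healthy controls."""
    p_controls = np.asarray(p_controls, dtype=float)
    return (p_test - p_controls.mean()) / p_controls.std(ddof=1)


def call_arm(z: float, threshold: float = 3.0) -> str:
    """Gain if Z > 3, loss if Z < -3, otherwise no change."""
    if z > threshold:
        return "gain"
    if z < -threshold:
        return "loss"
    return "no change"


# Hypothetical healthy-control proportions for one chromosome arm.
p_controls = [0.0400, 0.0402, 0.0398, 0.0401, 0.0399]
p_patient = arm_proportion(arm_reads=41_000, total_reads=1_000_000)

z = arm_z_score(p_patient, p_controls)
print(f"P_test={p_patient:.4f}  Z={z:.2f}  call={call_arm(z)}")
```

The sample standard deviation (`ddof=1`) is used because the controls are a sample of the healthy population; with 32 controls the choice makes little difference.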

For the second formula, you have most of the information already. I'm not 100% sure what ΔN refers to, but I can only assume that it's either related to the Z-score calculated from the previous formula or a copy number value that they already knew from analysing the tumour samples.
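For what it's worth, a formula of this type can be derived from a simple two-component mixing model. The derivation below rests on my own assumptions, not statements from the paper: the normal genome is diploid, the tumour carries a clonal copy number change ΔN over the arm, read counts scale with the number of DNA copies, and the alteration changes the genome-wide read total only negligibly. It may not match the authors' printed equation term for term.

$$
\frac{P_{\text{test}}}{P_{\text{normal}}}
= \frac{2(1-F) + (2+\Delta N)\,F}{2}
= 1 + \frac{F\,\Delta N}{2}
\qquad\Longrightarrow\qquad
F = \frac{2\,\lvert P_{\text{test}} - P_{\text{normal}}\rvert}{P_{\text{normal}}\,\lvert \Delta N \rvert}
$$

Worked example: for a single-copy gain (ΔN = 1) with P_normal = 0.0400 and P_test = 0.0410, F = 2 × 0.0010 / 0.0400 = 0.05, i.e. about 5% tumour-derived DNA.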

As to why it's important to take copy number alterations into account when determining ctDNA fractions, consider the following: the number of reads mapping to a genomic locus depends on the copy number of that locus. Provided there are enough reagents in the sequencing run and enough sequencing cycles, if a region is duplicated, one would expect roughly double the mapped reads. In HER2-amplified breast cancer, reads really pile up over the HER2 locus. The logic is somewhat similar to that of homologous sequences, which can 'rob' reads from a particular gene of interest. Similarly, if one copy of a locus is deleted, one would expect roughly half the reads.
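The read-count reasoning above can be made concrete with the same mixing model. A minimal sketch, assuming reads scale linearly with copy number and a diploid normal genome (the function names are hypothetical, written for illustration): it predicts the relative depth at a locus for a given ctDNA fraction and tumour copy number, and inverts that to estimate F from an observed depth ratio and an assumed ΔN.

```python
def expected_depth_ratio(f: float, tumour_cn: float, normal_cn: float = 2.0) -> float:
    """Expected plasma read depth at a locus relative to a copy-neutral locus.

    ASSUMPTIONS: reads scale with DNA copies and normal cells are diploid.
    f is the ctDNA fraction; tumour_cn is the tumour's copy number at the locus.
    """
    mixed_copies = (1.0 - f) * normal_cn + f * tumour_cn
    return mixed_copies / normal_cn


def tumour_fraction_from_ratio(depth_ratio: float, delta_n: float, normal_cn: float = 2.0) -> float:
    """Invert the model: F = normal_cn * (ratio - 1) / delta_n (gains and losses)."""
    if delta_n == 0:
        raise ValueError("a copy-neutral region carries no information about F")
    return normal_cn * (depth_ratio - 1.0) / delta_n


print(expected_depth_ratio(f=1.0, tumour_cn=4))   # pure tumour, both copies duplicated
print(expected_depth_ratio(f=1.0, tumour_cn=1))   # pure tumour, one copy deleted
print(expected_depth_ratio(f=0.1, tumour_cn=20))  # 10% ctDNA, HER2-like high-level amplification

ratio = expected_depth_ratio(f=0.1, tumour_cn=4)   # delta_n = +2 in the tumour
print(tumour_fraction_from_ratio(ratio, delta_n=2))  # correct delta_n recovers F = 0.1
print(tumour_fraction_from_ratio(ratio, delta_n=1))  # wrong delta_n doubles the estimate
```

The last two lines show why ignoring the actual size of the copy number change biases the estimate: the same observed depth ratio gives an F that scales inversely with the assumed ΔN.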

All of this is important because we calculate copy number specifically from the number of reads that map. If we ignored the fact that genomic regions may undergo large scale alterations, then we'd be under- or over-estimating the fraction of ctDNA.

Maybe that's a bit clearer (or maybe it's not!).


Hi, Kevin

I completely understand what you are saying, and I think I have worked it out. ΔN refers to the copy number change in that paper :)

Thanks again.

Richard

4.9 years ago
amjad ▴ 100

If you are interested in testing whether there is any ctDNA in the plasma at all, rather than quantifying the amount, you may want to check out the ctDNAtools R package:

https://github.com/alkodsi/ctDNAtools

4.0 years ago

You may want to take a look at the ichorCNA tool. It was developed specifically to estimate tumor fraction from ultra-low-pass whole-genome sequencing data, mostly based on copy number alterations.

https://github.com/broadinstitute/ichorCNA

I hope this helps. All the best,

Felipe
