I would like to know which methods are most commonly used to infer LOH events from variant allele frequencies (VAFs) at SNPs. I have targeted sequencing data from FFPE tumors, so that's the only approach I can use. My current problems are:
a) Finding a script (R?) that takes tumor cell content (%) into account.
b) Normalizing my data (quantile?), as quite a few of my frequencies don't cluster at VAFs of 0, 0.5 and 1.
Note that my samples include a normal site and a primary tumor for each case. Initially I tried to use Bioconductor's quantsmooth but couldn't figure out the right way to format my input.
Hi,
I haven't understood some parts of your question, such as what you mean by normalizing allele frequencies, but I will try to share what I know about inferring LOH.
First off, some variant callers, such as VarScan, give you LOH calls alongside the somatic calls. So if you have paired samples, it's quite easy.
The part I didn't understand: "...few freq. dont aggregate in 0, 0.5 & 1 VAF". As far as I know, unless you have a 100% pure tumor sample AND no clonal heterogeneity AND no copy-number variation, VAFs will not fall into those three classes. Instead, they form a continuous spectrum ranging from 0 to 100%.
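To make the purity point concrete, here is a minimal sketch of the VAF you would expect at a germline heterozygous SNP for a given tumor cell content. It assumes a simple two-population mixture (diploid normal cells carrying one copy of the variant allele, plus a single tumor clone), and the function name `expected_het_vaf` and its parameters are made up for illustration, not taken from any package.

```python
def expected_het_vaf(purity, tumor_total_cn, tumor_alt_cn):
    """Expected VAF at a germline heterozygous SNP in an impure tumor sample.

    Assumptions (illustrative only): normal cells are diploid with one copy of
    the variant allele; all tumor cells share the same copy-number state.
    purity          -- tumor cell fraction, between 0 and 1
    tumor_total_cn  -- total copies of the locus in tumor cells
    tumor_alt_cn    -- copies carrying the variant allele in tumor cells
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= tumor_alt_cn <= tumor_total_cn:
        raise ValueError("tumor_alt_cn must be between 0 and tumor_total_cn")
    alt_copies = purity * tumor_alt_cn + (1.0 - purity) * 1
    total_copies = purity * tumor_total_cn + (1.0 - purity) * 2
    if total_copies == 0:
        raise ValueError("no copies of the locus in the sample")
    return alt_copies / total_copies


if __name__ == "__main__":
    # Copy-neutral LOH at 60% tumor cell content, either allele retained
    print(expected_het_vaf(0.6, 2, 2))
    print(expected_het_vaf(0.6, 2, 0))
    # LOH by deletion (one copy left) at 50% tumor cell content
    print(expected_het_vaf(0.5, 1, 0))
```

Under these assumptions, copy-neutral LOH at 60% tumor cell content shifts a heterozygous SNP from 0.5 to about 0.8 or 0.2 depending on which allele is kept, so clusters away from 0, 0.5 and 1 are what the mixture model predicts rather than something that needs normalizing away.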
If you are not using software like VarScan, another way of looking into LOH is to write custom scripts that go through the VCF files from single-sample variant calling. Most VCFs report allele depths (ref. & var.) for each variant. One simplistic way is to look for positions with a variant call in the normal sample but no call (all reads supporting the ref. allele instead) in the tumor sample.
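A rough sketch of such a custom script is below. It is not VarScan's algorithm: it takes the ref./var. allele depths you would pull out of the two VCFs, keeps sites that look heterozygous in the normal, and uses a Fisher's exact test to ask whether the allele balance changed in the tumor, which relaxes the "all reads support ref." rule for impure samples. The function names, the 0.3-0.7 heterozygosity window, the minimum depth of 20 and the 0.01 cutoff are all arbitrary assumptions for illustration.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def classify_site(normal_ref, normal_alt, tumor_ref, tumor_alt,
                  het_range=(0.3, 0.7), min_depth=20, alpha=0.01):
    """Label one SNP from its ref./var. read depths (all thresholds are assumptions)."""
    n_depth = normal_ref + normal_alt
    t_depth = tumor_ref + tumor_alt
    if n_depth < min_depth or t_depth < min_depth:
        return "low_depth"
    n_vaf = normal_alt / n_depth
    if not het_range[0] <= n_vaf <= het_range[1]:
        return "not_het_in_normal"  # LOH can only be seen at heterozygous sites
    p_value = fisher_two_sided(normal_ref, normal_alt, tumor_ref, tumor_alt)
    return "candidate_LOH" if p_value < alpha else "retained_het"


if __name__ == "__main__":
    print(classify_site(50, 50, 95, 5))   # balanced in normal, skewed in tumor
    print(classify_site(50, 50, 48, 52))  # balanced in both
```

With FFPE material it is probably wise to treat single-site calls with caution and look for runs of neighbouring SNPs that shift in the same region before calling an LOH event.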
Thanks for the reply. My problem is that a considerable number of the recorded VAFs fall between 0.2-0.3 and 0.70-0.85, which gives a rather unfamiliar scatter plot. That's why I thought about trying quantile normalization (only for better visualization).
I haven't heard of normalizing VAFs (pardon me if I have missed something very obvious). They are supposed to form a continuous spread. One consideration is whether you performed whole-exome/genome sequencing or targeted gene panel sequencing instead. If the number of variants is small to start with, the spread will not appear smooth. I am attaching a scatterplot comparing the VAFs of two related tumor samples.
The pink dot was a hallmark mutation and hence highlighted. The grey blobs are private mutations.