I genotyped 100 samples for CNVs with TaqMan assays and found 6 samples with a heterozygous duplication. I used two independent assays that target the beginning and the end of the CNV, according to DGV. So I'm "quite" sure that these duplications are real gains.
We also have SNP data (LRR and BAF values) from a genome-wide SNP array for this region. The region is covered by only 17 SNPs, which is not many but is sufficient for PennCNV to call CNVs. However, PennCNV reproduced the qPCR result in only 4 of the 6 samples.
Therefore I plotted the BAF and LRR values against SNP position and noticed several things:
The 4 samples in which both methods (qPCR & PennCNV) called a gain have many heterozygous SNPs (8 to 11 of 17). Thus the BAF values cluster around 4 points (AAA, AAB, ABB, BBB) and the LRR values increase.
In the 2 samples in which qPCR also showed a duplication but PennCNV did not, nearly all SNPs are homozygous (15 to 16 of 17). This is why all SNPs have BAF values around 0 or 1 (AAA or BBB). The surprising effect is that the LRR does not increase, although it should increase in the case of a duplication.
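To make the BAF observation concrete, here is a minimal sketch of the idealized BAF cluster positions for 2 and 3 copies. The function name is my own and the values are theoretical (fraction of B alleles); real array data is noisier and normalized, so measured clusters will only be near these numbers.

```python
from itertools import combinations_with_replacement


def expected_baf_clusters(copy_number):
    """Idealized BAF for every possible genotype at a given copy number.

    Assumption: BAF is simply (number of B alleles) / (total copies).
    """
    clusters = {}
    for genotype in combinations_with_replacement("AB", copy_number):
        label = "".join(genotype)
        clusters[label] = label.count("B") / copy_number
    return clusters


# Normal diploid state: three clusters (AA, AB, BB).
print(expected_baf_clusters(2))

# Single-copy gain: four clusters (AAA, AAB, ABB, BBB).
print(expected_baf_clusters(3))
```

Note that a homozygous SNP sits at BAF 0 or 1 for both 2 and 3 copies, so in a sample with almost no heterozygous SNPs the BAF carries no information about the gain and the call has to rest on the LRR alone.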
Now my question: could the non-increasing LRR values be caused by the homozygous SNPs? The data was exported from BeadStudio, so I could imagine that the internal LRR calculation [LRR = log2(Robserved/Rexpected)] failed because of threshold mistakes or interpolation failures (more than 60% of the 100 samples carry mainly homozygous SNPs in this region).
(Rexpected is computed by linear interpolation between the canonical genotype clusters (Peiffer et al. 2006).)
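For reference, this is how I understand the calculation, written as a rough sketch and not as BeadStudio's actual code. The cluster centres below are made-up numbers for a single hypothetical SNP, and clamping theta outside the outer clusters is my own assumption.

```python
import math

# Hypothetical canonical cluster centres (theta, R) for one SNP: AA, AB, BB.
CLUSTERS = [(0.05, 1.0), (0.50, 1.2), (0.95, 1.0)]


def expected_r(theta, clusters=CLUSTERS):
    """Linearly interpolate Rexpected between neighbouring cluster centres."""
    pts = sorted(clusters)
    # Assumption: outside the outer clusters, use the nearest cluster's R.
    if theta <= pts[0][0]:
        return pts[0][1]
    if theta >= pts[-1][0]:
        return pts[-1][1]
    for (t0, r0), (t1, r1) in zip(pts, pts[1:]):
        if t0 <= theta <= t1:
            return r0 + (r1 - r0) * (theta - t0) / (t1 - t0)


def log_r_ratio(r_observed, theta):
    """LRR = log2(Robserved / Rexpected)."""
    return math.log2(r_observed / expected_r(theta))


# A homozygous AAA SNP with 1.5x the intensity of the AA cluster.
print(log_r_ratio(1.5, 0.02))
```

In this idealized picture a homozygous SNP with three copies gives log2(3/2), roughly 0.58, just like a heterozygous one, so homozygosity by itself should not suppress the LRR. Measured values on real arrays are usually smaller than this theoretical figure.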
Thanks
Sorry, I'm a complete rookie in this field. But if I understand you correctly ("entire array for cluster generation"), I think I've done that. I used the BAF and LRR values exported from BeadStudio as final results, and they were computed for the complete array, not only for the 17 SNPs.