OK, I am fairly new to both SNV calling and chip arrays. Most of my experience is with RNA-seq, alternative splicing, etc.
Now I have some data from Illumina's Human CytoSNP12 bead array that I want to analyze. I used GenomeStudio and got all the tables, and most of what I see makes sense. My only issue is this: when you analyze samples from cancer patients and try to find rare SNPs in them, how confident can you be that a BAF of, say, 1% is real and not background noise or some other artifact of the technology or the software?
Is there a minimum BAF percentage that we can use with confidence? 5%? 10%? I know that in NGS, depending on the sequencing depth and on whether you use molecular barcodes to distinguish true copies from PCR duplicates, you can go down to 5% and call it with confidence.
Does anybody know what people have been using for Illumina's arrays and the GenomeStudio software? I'm getting some SNVs in the 1-5% BAF range, but I don't know whether I should use them.
Thank you
To me it's unclear what you're trying to achieve.
Hi Jan,
Thank you for helping me out here, and I apologize for not being clear.
I did not design the experiment; the lab that designed it asked me to analyse the data. They have 12 cancer patients, and they used this chip as a genotyping tool. Their goal is to find, for each patient, whether there are low-allele-frequency SNPs (i.e. SNPs from cancer cells) in his/her sample, and then to use this information for protein prediction, to see whether a mutation could have a deleterious effect on the protein isoforms that carry it. This is done per patient: they look at each patient independently.
Not the best approach, but that's what the lab did.
So, knowing that this is not the best approach, I'm just trying to figure out what I should use as my cut-off for B allele frequencies. For example, is a 0.01% BAF most likely bogus, i.e. at the array's detection limit? Or can I use it to call a very rare allele coming from a few cancer cells in that patient's sample? Something like that.
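To make the question concrete, this is roughly the filter I have in mind. It is only a minimal Python sketch: the column names (`SNP Name`, `B Allele Freq`), the example rows, the helper `low_baf_candidates`, and the 0.25 upper bound are all assumptions for illustration, and the cut-offs it loops over are exactly the values I am unsure about, not recommendations.

```python
import csv
import io

# Hypothetical excerpt of a tab-separated per-sample export; the column
# names "SNP Name" and "B Allele Freq" are assumptions about the table layout,
# and the rows are made-up values, not real data.
EXAMPLE = (
    "SNP Name\tB Allele Freq\n"
    "rs0001\t0.004\n"
    "rs0002\t0.030\n"
    "rs0003\t0.510\n"
    "rs0004\t0.985\n"
    "rs0005\t0.120\n"
)

def low_baf_candidates(handle, min_baf, max_baf):
    """Return (snp, baf) pairs whose minor fraction is in [min_baf, max_baf)."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        baf = float(row["B Allele Freq"])
        # Fold around 0.5 so that a BAF of 0.985 is treated as a 1.5% minor
        # fraction (a design choice of this sketch, not a GenomeStudio rule).
        minor = min(baf, 1.0 - baf)
        if min_baf <= minor < max_baf:
            hits.append((row["SNP Name"], baf))
    return hits

# Compare how many candidates survive at each cut-off under discussion.
for cutoff in (0.01, 0.05, 0.10):
    hits = low_baf_candidates(io.StringIO(EXAMPLE), cutoff, 0.25)
    print(cutoff, [name for name, _ in hits])
```

The filter itself is trivial; what I don't know is which `min_baf` keeps real low-frequency alleles while excluding array noise.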