Correct. The 'logR', more commonly known as the Log R Ratio (LRR), is just the log (base 2) of the probe intensity in, e.g., tumour divided by the intensity in matched normal. It is a crude measure of copy number. When this log2 ratio is 0, there is no difference between tumour and normal.
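As a minimal sketch of that definition (the function name and the intensity values are made up for illustration, and real pipelines normalise the intensities first):

```python
import math

def log_r_ratio(tumour_intensity, normal_intensity):
    """Return log2(tumour / normal) for a single probe.

    Both intensities must be positive; this is the plain ratio described
    above, with no normalisation or bias correction applied.
    """
    if tumour_intensity <= 0 or normal_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(tumour_intensity / normal_intensity)

# Illustrative (made-up) intensities, not real data
print(log_r_ratio(1000.0, 1000.0))  # 0.0: no difference between tumour and normal
print(log_r_ratio(500.0, 1000.0))   # -1.0: tumour signal is half of normal
print(log_r_ratio(1500.0, 1000.0))  # positive: tumour signal is higher than normal
```

Values above 0 suggest a gain and values below 0 suggest a loss, but the mapping to an actual copy number is only approximate.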
The definition of B-allele frequency (BAF) is never clear; however, it can generally be regarded as the frequency of the allele under study, which may be the minor allele in a population study.
There are different points at which the software will struggle to correctly compute the BAF. If your DNA sample is poor quality, then everything will be difficult to calculate! If we plot the genotype of every SNP for a single sample of good quality, we would see a figure like this:
Here, the arms represent (for A and B alleles):
- vertical arm: BB (homozygous B)
- diagonal arm: AB (heterozygous)
- horizontal arm: AA (homozygous A)
This sample has mostly well-defined genotype calls, as judged by the well proportioned / orthogonal arms. The 'fuzzy bits' between the arms represent genotype calls that are on the borderline - these genotype calls will not be accurate, and neither, therefore, will the BAFs for these.
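To make the 'arms' idea concrete, here is a rough sketch of how a SNP's position between the arms can be turned into a number. It assumes the A-allele intensity is on the x-axis and the B-allele intensity is on the y-axis (consistent with the horizontal arm being AA), and it uses the polar-style transform that, as far as I know, is the general approach for Illumina arrays. The function names, the default cluster centres, and the intensities are all hypothetical; real cluster centres are estimated per SNP from reference samples.

```python
import math

def theta_and_r(a, b):
    """Polar-style transform of one SNP's allele intensities.

    a, b: non-negative A- and B-allele intensities (assumed already normalised).
    Returns (theta, r): theta is 0 on the AA arm, about 0.5 on the AB arm
    and 1 on the BB arm; r is the total intensity.
    """
    theta = 2.0 * math.atan2(b, a) / math.pi
    r = a + b
    return theta, r

def baf_from_theta(theta, theta_aa=0.05, theta_ab=0.5, theta_bb=0.95):
    """Linearly interpolate theta between genotype cluster centres.

    The default centres are hypothetical placeholders, not values from
    any real cluster file.
    """
    if theta <= theta_aa:
        return 0.0
    if theta >= theta_bb:
        return 1.0
    if theta < theta_ab:
        return 0.5 * (theta - theta_aa) / (theta_ab - theta_aa)
    return 0.5 + 0.5 * (theta - theta_ab) / (theta_bb - theta_ab)

# Made-up intensities for three SNPs, one near each arm
for a, b in [(1200.0, 30.0), (600.0, 620.0), (25.0, 1100.0)]:
    theta, r = theta_and_r(a, b)
    print(f"A={a} B={b} theta={theta:.3f} R={r:.0f} BAF={baf_from_theta(theta):.3f}")
```

A SNP in the 'fuzzy bits' has a theta that sits between two cluster centres, so its interpolated BAF lands between 0, 0.5 and 1 and is less trustworthy.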
-----------------------------
Conversely, look at a similar plot for this very poor quality DNA sample:
That data would have to be thrown into the trash can.
-----------------------------------
Things that can affect the calculation of the BAF:
- allelic cross talk: when a probe for the A allele binds to the B allele sequence, and vice-versa
- allelic imbalance: this occurs when, e.g., homozygous A (AA) signal strengths are lower or higher than homozygous B (BB), which I assume is down to differences in binding affinities between, e.g., GC and AT genotypes
Both of these sources of bias are usually corrected in any processing pipeline.
Take a read of my other answer: A: Genotyping, genotype calling or SNP calling?
Kevin
Thank you for your answer!
Hi Kevin,
I used: http://penncnv.openbioinformatics.org/en/latest/user-guide/test/ to generate BAF and LRR after doing the whole workflow ending with this command:
So now I have values of BAF and LRR for each sample:
For one sample a file like this:
How would I make a plot like the one you show above? Can you please share the code? Also, what is on your x and y axes?