8.6 years ago
biohack92
I've recently looked at methylation coverage (bisulfite-seq data) in IGV, and there are obvious coverage outliers, which I'm interpreting as mismapped reads from repetitive sequences. How do you identify these outliers from methylation calls/count data (I don't have BAM/SAM files) and remove them?
Thanks @Devon Ryan. I followed your advice and this is the plot I created. The x-axis shows the number of reads (coverage) per site and the y-axis shows how many sites have that coverage. Is there a way to determine which threshold is 'reasonable'?
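For anyone wanting to reproduce this kind of plot from count data alone, here is a minimal sketch of the underlying frequency table. It assumes you already have one coverage value per site (methylated plus unmethylated counts); the function name `coverage_histogram` and the toy values are made up for illustration.

```python
from collections import Counter


def coverage_histogram(coverages):
    """Return sorted (coverage, number_of_sites) pairs.

    `coverages` is assumed to be an iterable of per-site read counts,
    i.e. methylated + unmethylated counts at each site.
    """
    return sorted(Counter(coverages).items())


# Toy per-site coverage values (invented, not real data).
toy_coverages = [10, 12, 10, 15, 12, 10, 950]
histogram = coverage_histogram(toy_coverages)

# Each row is: coverage value, number of sites with that coverage.
for coverage, n_sites in histogram:
    print(coverage, n_sites)
```

Plotting the first column against the second gives the distribution described above; extreme outliers show up as isolated rows far to the right of the bulk.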
You could make a Q-Q plot, but it's generally OK to just eyeball things. From the distribution, it looks like a threshold of around 200 reads would be reasonable.
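Applying such a cutoff to count data could look roughly like the sketch below. The record layout `(chrom, pos, n_meth, n_unmeth)` and both function names are assumptions for illustration, so adapt them to whatever your methylation-call files contain. The percentile helper is just one possible way to sanity-check an eyeballed value; it is not something suggested in this thread.

```python
def filter_by_coverage(sites, max_coverage=200):
    """Keep sites whose total coverage does not exceed max_coverage.

    Each site is assumed to be a (chrom, pos, n_meth, n_unmeth) tuple;
    200 is the eyeballed threshold from the distribution above.
    """
    kept = []
    for chrom, pos, n_meth, n_unmeth in sites:
        if n_meth + n_unmeth <= max_coverage:
            kept.append((chrom, pos, n_meth, n_unmeth))
    return kept


def percentile_threshold(coverages, q=0.999):
    """Return the coverage value at roughly the q-th quantile.

    A simple nearest-rank estimate, offered only as an alternative
    way to choose a cutoff (an assumption, not the thread's method).
    """
    ordered = sorted(coverages)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


# Toy records (invented): the last site has coverage 950 and is dropped.
toy_sites = [
    ("chr1", 100, 8, 4),
    ("chr1", 250, 150, 50),
    ("chr1", 300, 400, 550),
]
kept_sites = filter_by_coverage(toy_sites, max_coverage=200)
print(kept_sites)
```

Comparing the eyeballed threshold with an upper quantile of the coverage values is a quick check that the cutoff removes only a small fraction of sites.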