9.4 years ago
igor
The first three bases of RRBS reads are CGG or TGG. This is a known phenomenon and it is well documented. However, I am not sure what to do with those bases. Should they be trimmed or not?
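For context, here is a minimal sketch of why the two prefixes appear. MspI cuts C^CGG, so each fragment starts at the CpG of the site; after bisulfite conversion a methylated C stays C (read starts CGG) and an unmethylated C reads as T (read starts TGG). The sketch assumes directional read 1 sequences given as plain strings, and the function name `cut_site_methylation` is hypothetical.

```python
from collections import Counter

def cut_site_methylation(reads):
    """Estimate methylation of the MspI cut-site CpG from read 1 prefixes.

    CGG = cut-site C protected from conversion (methylated);
    TGG = cut-site C converted to T (unmethylated).
    Any other prefix is tallied as "other" and excluded from the fraction.
    """
    counts = Counter()
    for read in reads:
        prefix = read[:3].upper()
        if prefix in ("CGG", "TGG"):
            counts[prefix] += 1
        else:
            counts["other"] += 1
    informative = counts["CGG"] + counts["TGG"]
    fraction = counts["CGG"] / informative if informative else float("nan")
    return fraction, counts

# Toy reads, not real data.
reads = ["CGGATTGT", "TGGTTAGT", "CGGTTTAG", "CGGAATGT", "NGGTTAGT"]
fraction, counts = cut_site_methylation(reads)
print(f"{fraction:.2f}", dict(counts))  # 0.75 {'CGG': 3, 'TGG': 1, 'other': 1}
```

Read this way, the first three bases carry a genuine methylation call for one CpG per fragment, which is why trimming them discards real information unless that position is shown to be biased.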
I generated a Bismark M-bias plot, and it looks skewed at the beginning of the reads. However, I am not sure whether I should be concerned. Couldn't that be real? Because we cut at MspI sites, every read starts with a CpG site, whereas the rest of the read has CpG sites distributed randomly. I hope I am not too confused here.
The M-bias plot should still be relatively level. There's no a priori reason that CCGG sites should have a different methylation rate than other CpG sites.
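What "relatively level" means can be made concrete with a small sketch of the per-position CpG methylation that an M-bias plot summarizes. It assumes Bismark-style methylation call strings (as in the XM tag, where `Z` is a methylated CpG cytosine and `z` an unmethylated one) already oriented 5' to 3' along the read; for reverse-strand alignments stored in SAM/BAM that may require reversing the string first. The function name `cpg_mbias` is hypothetical and this is not Bismark's own implementation.

```python
def cpg_mbias(xm_strings):
    """Per-position CpG methylation (%) from Bismark-style call strings.

    'Z' counts as methylated CpG, 'z' as unmethylated CpG; all other
    characters (non-CpG contexts, '.') are ignored.
    Returns a list indexed by 0-based read position, with None where
    no CpG call was seen at that position.
    """
    length = max((len(s) for s in xm_strings), default=0)
    meth = [0] * length
    unmeth = [0] * length
    for xm in xm_strings:
        for pos, call in enumerate(xm):
            if call == "Z":
                meth[pos] += 1
            elif call == "z":
                unmeth[pos] += 1
    return [
        100.0 * m / (m + u) if (m + u) else None
        for m, u in zip(meth, unmeth)
    ]

# Toy call strings, not real data; position 1 is the cut-site CpG.
xm = ["Z..z..Z.", "z..Z..Z.", "Z..Z..z."]
for pos, pct in enumerate(cpg_mbias(xm), start=1):
    if pct is not None:
        print(pos, f"{pct:.1f}")  # prints positions 1, 4 and 7, each at 66.7
```

In this toy case the cut-site position matches the internal CpGs, which is the flat profile the reply expects when there is no technical bias.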
I expect that it is real and would not trim. In our lab we use BSMAP rather than Bismark, but we leave those bases on. Any base-repair errors are likely limited to the adapter sequence (if you are using something like the Illumina protocol).
Base-repair errors won't affect adapter sequences, which should be aggressively removed in any case. Even in WGBS datasets, I often see odd jumps in methylation over the first base or two; there just seems to be a bias at those positions.
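One way to decide whether those first positions are "weird jumps" rather than real signal is to compare each position against the rest of the read. The sketch below uses a simple median-deviation heuristic of our own choosing, not a Bismark feature: it takes a per-position CpG methylation profile (percentages, `None` where a position has no calls) and flags positions that differ from the median by more than a threshold. The function name `flag_biased_positions` and the 5-point default are hypothetical, illustrative choices.

```python
from statistics import median

def flag_biased_positions(mbias, max_deviation=5.0):
    """Return 1-based read positions whose CpG methylation (%) deviates
    from the median over all observed positions by more than
    max_deviation percentage points.

    mbias: per-position percentages, with None for positions lacking calls.
    The default threshold is an arbitrary example value.
    """
    observed = [p for p in mbias if p is not None]
    if not observed:
        return []
    center = median(observed)
    return [
        pos
        for pos, pct in enumerate(mbias, start=1)
        if pct is not None and abs(pct - center) > max_deviation
    ]

# Toy profile, not real data: the first two positions dip below the plateau.
profile = [48.0, 55.5, 61.2, 62.0, 61.8, None, 62.3, 61.9, 62.1]
print(flag_biased_positions(profile))  # [1, 2]
```

Positions flagged this way are candidates for exclusion at the methylation-calling step; if nothing is flagged, the profile is level and there is little reason to trim.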