I have paired-end data from the "SureSelect Methyl-Seq Target Enrichment System", which is a kind of bisulfite sequencing data produced, as far as I understand, in a similar manner to RRBS.
However, after aligning with Bismark using the default Bowtie2, I found a very large imbalance between the reads mapped to the positive strand and those mapped to the negative strand, as shown below:
Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT: 1073089 ((converted) top strand)
GA/CT/CT: 6013 (complementary to (converted) top strand)
GA/CT/GA: 94492 (complementary to (converted) bottom strand)
CT/GA/GA: 24124100 ((converted) bottom strand)
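To put a number on the imbalance, the four counts above can be converted to fractions of all uniquely aligned pairs. This is only a minimal Python sketch: the counts are copied from the report above, and the function name `strand_fractions` is just an illustrative choice, not part of Bismark.

```python
# Pair counts copied from the Bismark report above.
strand_counts = {
    "CT/GA/CT": 1073089,   # (converted) top strand
    "GA/CT/CT": 6013,      # complementary to (converted) top strand
    "GA/CT/GA": 94492,     # complementary to (converted) bottom strand
    "CT/GA/GA": 24124100,  # (converted) bottom strand
}


def strand_fractions(counts):
    """Return each strand's share of all uniquely aligned pairs."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


fractions = strand_fractions(strand_counts)
for label, frac in fractions.items():
    print(f"{label}\t{frac:.2%}")
```

With these counts, more than 95% of the pairs fall on the (converted) bottom strand.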
I know I should probably use the default directional option, since the non-directional option leads to something like this, but I am showing this output to illustrate that there really is a big discrepancy between alignments to the top strand and alignments to the bottom strand.
After running methylation extraction and generating the genome-wide cytosine report, I get something like the following:
chr12 120250754 + 1 1 CG CGC
chr12 120250755 - 103 9 CG CGC
chr12 120250870 + 0 0 CG CGG
chr12 120250871 - 59 2 CG CGT
chr12 120250913 + 0 0 CG CGA
chr12 120250914 - 27 0 CG CGA
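The same imbalance can be summarised per strand from the cytosine report. The sketch below assumes the usual column order of the Bismark genome-wide cytosine report (chromosome, position, strand, methylated count, unmethylated count, C context, trinucleotide context) and uses only the six lines shown above; `coverage_by_strand` is an illustrative helper name, not a Bismark function.

```python
from collections import defaultdict

# The six report lines shown above, whitespace-separated.
report_lines = """\
chr12 120250754 + 1 1 CG CGC
chr12 120250755 - 103 9 CG CGC
chr12 120250870 + 0 0 CG CGG
chr12 120250871 - 59 2 CG CGT
chr12 120250913 + 0 0 CG CGA
chr12 120250914 - 27 0 CG CGA
""".splitlines()


def coverage_by_strand(lines):
    """Sum methylated + unmethylated counts for each strand."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        strand = fields[2]
        totals[strand] += int(fields[3]) + int(fields[4])
    return dict(totals)


coverage = coverage_by_strand(report_lines)
print(coverage)
```

For a whole report file, the same function could be applied to the open file handle instead of the inline string.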
As far as I understand, there should not be such a large difference between the number of reads on the positive strand and on the negative strand. I trimmed my data before running the alignment and used almost default settings with Bismark. I would like to know whether this result is due to data quality or to the way I am handling this kind of data. I would also like to know whether there is a way to deal with this problem if it is caused by data quality.
Thank you!
I'm working with the same kit and get the same results using trim_galore with --paired and bismark with bowtie2 and -N 1 (directional is the default). No idea why...
It seems that we are having the same problem. I am not sure whether it is due to data quality. Do you also have a larger number of aligned reads on the bottom strand?