Hi, I have run into some problems analysing an sc-WGBS dataset. The single-cell WGBS libraries were prepared following the Nature protocol of Smallwood et al. The samples are bovine, and we used the UMD3.1.1 genome.
This is the FastQC GC content before Trim Galore:
And this is after the first 12 bp were trimmed off the 5' end:
The C% does not fall to 0; in some samples it is as high as 10%.
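A 5' clip of this kind could be sketched with Trim Galore as below. This is a minimal sketch under assumptions, not a record of the exact command: the input names `${line}_1.fq.gz` and `${line}_2.fq.gz` are inferred from the `_val_1` / `_val_2` files passed to Bismark further down, `--clip_R1` and `--clip_R2` are the Trim Galore options that remove bases from the 5' end of each read, and the `run` / `DRY_RUN` helper is a hypothetical wrapper that only prints the command by default.

```shell
# Sketch only: clip 12 bp from the 5' end of both reads with Trim Galore.
# Assumptions, not from the original post: the input file names and the
# run/DRY_RUN helper.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"      # dry run: show the command only
  else
    "$@"           # real run: execute it
  fi
}

line="${line:-sample}"   # hypothetical sample prefix

run trim_galore --paired --clip_R1 12 --clip_R2 12 \
  "${line}_1.fq.gz" "${line}_2.fq.gz"
```

Setting `DRY_RUN=0` would execute the command instead of printing it.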
Mapping rate summary
The mapping rate was very low, from 1.8% to 22.5%, and some samples show high CHH methylation, from 1.9% to 29.9%, where 1-2% would normally be expected.
Here is the code we used in Bismark:
bismark \
  "${genome_folder}" \
  --multicore 8 \
  --non_directional \
  --score_min L,0,-0.6 \
  -I 0 -X 1000 \
  --un \
  -1 "${line}_1_val_1.fq.gz" \
  -2 "${line}_2_val_2.fq.gz"
Final Alignment report
======================
Sequence pairs analysed in total: 41705524
Number of paired-end alignments with a unique best hit: 1240792
Mapping efficiency: 3.0%
Sequence pairs with no alignments under any condition: 40379911
Sequence pairs did not map uniquely: 84821
Sequence pairs which were discarded because genomic sequence could not be extracted: 169
Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT: 112057 ((converted) top strand)
GA/CT/CT: 122032 (complementary to (converted) top strand)
GA/CT/GA: 907980 (complementary to (converted) bottom strand)
CT/GA/GA: 98554 ((converted) bottom strand)
Final Cytosine Methylation Report
=================================
Total number of C's analysed: 52808430
Total methylated C's in CpG context: 2562392
Total methylated C's in CHG context: 2974558
Total methylated C's in CHH context: 13849485
Total methylated C's in Unknown context: 176
Total unmethylated C's in CpG context: 4258477
Total unmethylated C's in CHG context: 6975731
Total unmethylated C's in CHH context: 22187787
Total unmethylated C's in Unknown context: 1372
C methylated in CpG context: 37.6%
C methylated in CHG context: 29.9%
C methylated in CHH context: 38.4%
C methylated in unknown context (CN or CHN): 11.4%
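As a sanity check, the percentages in this report can be recomputed from the raw counts: mapping efficiency is the number of unique best hits divided by the pairs analysed, and the methylation level in each context is methylated / (methylated + unmethylated). A minimal awk sketch with the numbers copied from the report above (`pct` is just an illustrative helper name):

```shell
# pct NUM DEN: print 100 * NUM / DEN rounded to one decimal place
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

# Mapping efficiency = unique best hits / sequence pairs analysed
pct 1240792 41705524                      # -> 3.0

# Methylation per context = methylated / (methylated + unmethylated)
pct 2562392  $((2562392 + 4258477))       # CpG -> 37.6
pct 2974558  $((2974558 + 6975731))       # CHG -> 29.9
pct 13849485 $((13849485 + 22187787))     # CHH -> 38.4
```

The recomputed values agree with the report, so the high non-CG levels come from the calls themselves and not from a reporting error.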
We don't know whether this is caused by a problem in our library prep or by something wrong in our data analysis. Does anyone have experience with this, or any suggestions?
Thank you for your help!
Hi Felix, thank you very much for your reply. I tried aligning in single-end mode in Bismark, but the mapping rate did not increase much, only 1-3%. I also tried --score_min L,0,-0.4: the mapping rate in one library dropped from 10.6% to 7.6%, but the non-CG methylation only dropped from 7% to 6%, which is still high. Do you recommend using filter_non_conversion for the libraries with high non-CpG methylation? I would expect bovine to be like human, where methylation in non-CG context is very low.
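For context, a run of filter_non_conversion on one paired-end Bismark BAM could be sketched as below. This is only an illustration under assumptions: the BAM name follows Bismark's usual `_bismark_bt2_pe.bam` naming for the input files above, `--threshold 3` is written out explicitly and is, as far as I understand, the default, and the `run` / `DRY_RUN` helper is a hypothetical wrapper that only prints the command by default. Note that the script filters individual reads or read pairs, not whole libraries.

```shell
# Sketch only: print (or run) filter_non_conversion on one paired-end BAM.
# Assumptions, not from the original post: the BAM file name, the explicit
# --threshold value, and the run/DRY_RUN helper.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"      # dry run: show the command only
  else
    "$@"           # real run: execute it
  fi
}

line="${line:-sample}"                      # hypothetical sample prefix
bam="${line}_1_val_1_bismark_bt2_pe.bam"    # assumed Bismark output name

# --paired: treat the BAM as paired-end; a pair is removed if either read
# reaches the threshold of methylated non-CG calls.
run filter_non_conversion --paired --threshold 3 "$bam"
```

Comparing the non-CG methylation before and after filtering would show how much of it sits in reads that look entirely unconverted.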