Hello Researchers,
I have performed whole genome bisulfite sequencing (WGBS). After mapping the reads to the reference genome, I obtained methylation information for the cytosines across the whole genome.
Could anyone please tell me how to distinguish methylated cytosines from unmethylated cytosines? After mapping the clean reads to the reference genome, I see both methylated reads (reads with C) and unmethylated reads (reads with T) at the same position.
Looking forward to your early and positive response.
You've just answered your own question. At each cytosine position in the genome, you'll see a mix of methylated reads (base preserved as C) and unmethylated reads (base converted to T). The methylation fraction for a site is the number of methylated reads divided by the total number of reads covering that base.
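As a rough illustration of that ratio, here is a minimal Python sketch. The function name and the read counts are made up for the example; in practice the counts would come from whatever per-cytosine report your aligner or methylation caller produces.

```python
def methylation_fraction(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of reads supporting methylation (C) at a single cytosine.

    Hypothetical helper: methylated_reads are reads showing C,
    unmethylated_reads are reads showing T at the same position.
    """
    total = methylated_reads + unmethylated_reads
    if total == 0:
        # No coverage means the fraction is undefined, so refuse to guess.
        raise ValueError("no reads cover this site")
    return methylated_reads / total


# Illustrative counts only: 18 reads with C and 2 reads with T.
print(methylation_fraction(18, 2))  # 0.9
```

Sites with very few reads give noisy fractions, so you may want to apply a minimum coverage filter before interpreting them.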
What you describe is not unusual. The first possible situation is that, in the population of cells you sequenced, not every copy of a CpG at a given position has the same methylation status. The second situation is hemi-methylation, where a CpG is methylated on one strand but not on the other. You typically define thresholds for when you consider a CpG methylated or unmethylated. We typically use 10% and 90%: any CpG where more than 90% of the reads have C is called methylated, and any CpG where less than 10% of the reads have C (so more than 90% have T) is called unmethylated. The rest is undefined, at least for this kind of binary statement.
Edit: What is the scientific question behind your experiment? Is there any functional aspect linked to methylation in your setting?
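The thresholding can be sketched in a few lines of Python. This is only an illustration: the function name, the default cutoffs as parameters, and the example read counts are assumptions for the example, not the output of any particular tool.

```python
def call_methylation(c_reads: int, t_reads: int,
                     low: float = 0.10, high: float = 0.90) -> str:
    """Binary methylation call for one CpG from its C and T read counts.

    Hypothetical helper: low/high default to the 10% and 90% cutoffs
    mentioned above; adjust them for your own data.
    """
    total = c_reads + t_reads
    if total == 0:
        return "undefined"  # no coverage, so no call is possible
    fraction = c_reads / total
    if fraction > high:
        return "methylated"
    if fraction < low:
        return "unmethylated"
    return "undefined"  # intermediate: mixed population or hemi-methylation


# Illustrative read counts only.
print(call_methylation(19, 1))   # methylated
print(call_methylation(1, 19))   # unmethylated
print(call_methylation(10, 10))  # undefined
```

Intermediate sites are not errors. They are often the interesting ones, since they can reflect cell-to-cell heterogeneity or hemi-methylation.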