I have RRBS data and I am asked to calculate bisulfite conversion ratio. I have aligned the data using Bismark. How would I be able to calculate bisulfite conversion ratio using the aligned data.?
I have RRBS data and I am asked to calculate bisulfite conversion ratio. I have aligned the data using Bismark. How would I be able to calculate bisulfite conversion ratio using the aligned data.?
Normally you would use a PhiX spike-in to determine this. This step was likely done, but you may or may not have the resulting PhiX reads in your fastq file. This would be the first thing to check on. If you do have them, then align the reads to the PhiX genome and get the conversion efficiency from that (it'll be 1-methylation percent).
The absolute simplest method is to just ask whomever did the sequencing. You could also just map the reads to the PhiX sequence. You can find links to the sequence here: What Is The Refseq Id For Illumina Phix Control?
I mapped the fastq files to Illumina PhiX and got the follow result. How can I know conversion efficiency from these figures?
31403068 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 753492 + 0 mapped (2.40%:-nan%) 31403068 + 0 paired in sequencing 15701534 + 0 read1 15701534 + 0 read2 0 + 0 properly paired (0.00%:-nan%) 753492 + 0 with itself and mate mapped 0 + 0 singletons (0.00%:-nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)
What tool did you use for the mapping? Many (bismark, bison, etc.) produce a summary at the end where the approximate methylation statistics are printed. Otherwise, you can use PileOMeth (it's in my github repo) to get per-CpG metrics and then awk to or R to get the average from that.
Sorry for my stupidity; I used BWA to align the reads. After reading your comment I realize that I should have used Bismark or similar.
From the lab people I got to know that they have used 10-15% PhiX control.
Now I have aligned the redas to PhiX using Bismark. Following is the alignment summary. How do I know the conversion rate? What is logic behind mapping reads to PhiX to know conversion ratio?
Final Cytosine Methylation Report
=================================
Total number of C's analysed: 6050559
Total methylated C's in CpG context: 1289702
Total methylated C's in CHG context: 1229962
Total methylated C's in CHH context: 3491603
Total methylated C's in Unknown context: 73
Total unmethylated C's in CpG context: 8174
Total unmethylated C's in CHG context: 6845
Total unmethylated C's in CHH context: 24273
Total unmethylated C's in Unknown context: 11
C methylated in CpG context: 99.4%
C methylated in CHG context: 99.4%
C methylated in CHH context: 99.3%
C methylated in Unknown context (CN or CHN): 86.9%
Hmm, so sometimes the PhiX that's spiked in is unmethylated, so you can then calculate things easily from that. Yours seems to be methylated (or the conversion failed completely, but that's unlikely), so this is apparently not useful then. In general, you want to spike in something that's not methylated (e.g. a PCR product or plasmid) and then use that to determine the conversion efficiency, since ideally it'll show ~0% methylation when mapped.
D'oh, I meant lambda above instead of PhiX, mea culpa! You can process against the lambda genome (I don't have a link handy for the fasta file) the same way as I had mentioned above for PhiX. In resulting "C methylated in CpG context:" line should then indicate 100% minus the conversion efficiency (e.g., 99.2% conversion will result in 0.8% C methylated in CpG context).
I just mapped to the phage lambda genome using fasta here http://www.ncbi.nlm.nih.gov/nuccore/J02459.
most of my samples have a spike-in lambda unmethylated DNA. I mapped them to lambda genome and calculated the efficiency. I do not have spike-in for one sample. I have to check the conversion efficiency for the red unmethylated C (see below) introduced at the 3’ when end-repair was done. for my bismark pipeline, trim_galore will remove this unmehtylated C if there is adaptor contamination. I read it here http://www.bioinformatics.babraham.ac.uk/projects/bismark/RRBS_Guide.pdf
[url=http://postimg.org/image/mrch0rpfv/][img]http://s33.postimg.org/mrch0rpfv/Screenshot_2016_05_23_10_14_20.jpg[/img][/url]
How do I calculate it? I need to take the fastqs and trim-off the adaptors, but not the last 2 bases at 3', map with bismark, and check how many Ts are at the end of the each read? Is there any script to do so?
Thanks, Ming
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Do you sequence unmethylated phage DNA along with your sample? In our lab, that's what we use to calculate the BS conversion rate since all methylation values within the sample are unknown.