Cutadapt for paired end RRBS data
1
0
Entering edit mode
6.6 years ago
anikb ▴ 40

Hi all,

I have just started working with RRBS data, and I am having issue understanding the data and need to understand how to trim adapter sequences. This is probably a very naive question, but any help would be appreciated. I am trying to trim paired-end RRBS data using Cutadapt. (I am aware of Trim_galore, but I am using Cutadapt mainly because Cutadapt is already installed on the cluster that I am working on, and since Trim galore uses Cutadapt, I should get the same results).

If I want to trim the Illumina adapter sequence(AGATCGGAGAGC) from the 3' end of my Paired end data using cuadapt, at first I add 2 Ns to the Adapter (NNAGATCGGAGAGC) to remove bases added during end repair. So, my understanding is we are supposed to remove the additional bases and adapter only from 3'end. What happens to the adapters or additional bases at the 5'end? This paper, doi: 10.1186/s12864-016-2494-8, briefly mentions:

Removal of additional bases for pair-end sequencing can be tricky as it can affect subsequent RRBS read alignment. For example, removing two additional bases from the beginning of the read 2 (complementary reads to the original forward and reverse strands) would remove CGG tag that is used to search for indexed CCGG motif in the reference genome causing the reads to remain unaligned in RRBSMAP.

For my paired end reads, how should I trim the adapters using cutadapt? Do I not provide a reverse complement for my Read 2 at all ? or do it like below? Please advise.

cutadapt -a NNAGATCGGAGAGC -A GCTCTCCGATCTNN -o outputR1 -p outputR2 inputR1.fastq inputR2.fastq

Thanks!

RRBS Trimming Cutadapt • 4.1k views
ADD COMMENT
2
Entering edit mode
6.6 years ago

My suggestion is to not bother removing the repaired bases and to instead simply ignore them during methylation extraction. The procedure is then essentially:

  1. cutadapt -a AGATCGGAGAGC ... (N.B., cutadapt can use multiple threads!)
  2. Align (I usually use bwa-meth from Brent Pedersen)
  3. Look at the methylation bias (e.g., MethylDackel mbias ...)
  4. Extract ignoring bases showing bias (e.g., MethylDackel extract --OT 2,98,2,98 ...)

Then you're maximizing the quality of the alignments but also not including problematic bases (or having your cake and eating it too, if you prefer).

ADD COMMENT
0
Entering edit mode
  1. Hmm, ok. Thank you for your answer! So I will not mess with the repaired bases at this point then and take care of it after creating the M-bias plot. I might have to use Bismark, since that's what is available on our cluster, and it has both the features that you mentioned above(3,4) . Is bwa-meth better for alignment than Bismark, in your experience?

  2. Also, I just want to explicitly mention what I understood for the trimming part and confirm it. With Cutadapt, I should provide the reverse complement of adapter for the R2 file as well, right?

    cutadapt -a AGATCGGAGAGC -A GCTCTCCGATCT -o oR1 -p oR2 inR1.fq inR2.fq

ADD REPLY
1
Entering edit mode
  1. I've typically gotten better and faster results with bwa-meth, yes.
  2. You can use cutadapt -a AGATCGGAAGAGC -A AGATCGGAAGAGC, you don't need to reverse complement the adapter for R2 since the read will have the opposite orientation to begin with.
ADD REPLY
0
Entering edit mode

Thanks a bunch, that really helped!

ADD REPLY
0
Entering edit mode

Hi Devon, Just a follow up on the last question. I trimmed the files and ran bismark with default options as below:

  1. Genome preparation: bismark_genome_preparation --verbose hg38_gencode
  2. Alignment: bismark reference_genome -1 Read_1.fq -2 Read_2.fq

The M-bias plots look like these (below). In the M-bias plot examples (Google), the CpG plot line looks like a straight line across the length of the read except at ends. Mine are very uneven across the read length. Would you know why that is?

M-bias plot for Read 1

M-bias plot for Read 2

ADD REPLY
0
Entering edit mode

Yikes! What was your alignment rate? I usually only see M-bias plots like this with targeted amplicon-seq combined with bisulfite treatment (i.e., really really reduced representation BS-seq). Do you have a high degree of duplication? That's the most likely cause of something like this. If that's the case, then mark duplicates and recreate the M-bias plots while ignoring marked duplicates (I presume there's an option for that in bismark, but if not just use MethylDackel).

ADD REPLY

Login before adding your answer.

Traffic: 2004 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6