Question

BS-seq: Using single end vs paired end in Bismark when Mapping efficiency is low

1

Entering edit mode

8.3 years ago

noodle ▴ 70

Hi,

Ive read through several of the posts on seqanswers and biostars to try and figure out whats going on but perhaps someone may spot something that is not obvious to me!

I have bisulfite treated patient samples (primary breast cancer to metastasis in bone and primary breast cancer to metastasis in brain ) with sequencing done on the HiSeq2500.

I used

    trim_galore --paired --retain_unpaired --trim1

to trim the paired end reads in each read group.

Then I aligned to the bisulfite converted genome in paired end mode first using bismark and allowing up to one mismatch

bismark --quiet --bowtie2 --multicore 12 -n 1 /bismark_aln/genomes/hg19_bs/ -1 bone.lane7.R1_val_1.fq.gz -2 bone.lane7.R2_val_2.fq.gz

bismark output

As you can see in Paired End Mode I am getting 28.1%, 24.6% alignment for the bone sample and then 61.0% for the brain sample.

To see if the mapping efficiency increases aligning only Read 1 in the bone_Read Group 1 I aligned again in single end mode.

bismark --quiet --bowtie2 -n 1/bismark_aln/genomes/hg19_bs/ bone.lane7.R1_val_1.fq.gz &

The mapping efficiency increased to 66.0%.

I then decided to see if by adjusting the scoring function for mismatches based on sequence length could i get a better mapping efficiency in the paired end mode using an subset of 100,000 reads

bismark --quiet --bowtie2 --score_min L,0,-0.6 -u 100000 /bismark_aln/genomes/hg19_bs/ -1 bone_RG1/bone.lane7.R1_val_1.fq.gz -2 bone_RG1/bone.lane7.R2_val_2.fq.gz &

Mapping efficiency only increased from 28.1% to 38.7%

Thinking now that read 2's are problematic i decided to see what the mapping efficiency would be like if i aligned only Read 2 in single end mode using the --pbat flag and a subset of 100,000 reads

 bismark --quiet --bowtie2 --score_min L,0,-0.6 -u 100000 /bismark_aln/genomes/hg19_bs/ --pbat bone_RG1/bone.lane7.R2_val_2.fq.gz

Mapping efficiency was 57.0%.

So I know in order to get the most out of the data and to increase the mapping efficiency i should ideally be aligning in single end mode using read 1 and increasing the scoring function in order to account for indels but is it correct to do this?

Can i continue to the methylation call extraction using only read 1 alignment but use paired end alignment for other samples?

Or can you suggest anything else i can do to increase mapping efficiency in paired end mode?

Thanks!

alignment bisulfite bismark methylation • 8.0k views

ADD COMMENT • link updated 7.8 years ago by Biostar 20 • written 8.3 years ago by noodle ▴ 70

1

Entering edit mode

Maybe try bwa-meth and see if that improves the paired-end alignment. You can continue with only read 1, but it'd be preferable to use both mates if you can fix the alignment issue.

BTW, you don't need to use --trim1 with bowtie2.

ADD REPLY • link 8.3 years ago by Devon Ryan 104k

0

Entering edit mode

We have a similar issue with data from a non-model organism, though the range of % reads aligned is from ~25-60%. I agree re: bwa-meth (we saw much better alignment results), though it's also worth pointing out this recent post from the Bismark developer:

https://sequencing.qcfail.com/articles/soft-clipping-of-reads-may-add-potentially-unwanted-alignments-to-repetitive-regions/

Bismark is a great aligner but is very strict re: concordant paired reads with end-to-end mapping and will not report anything else. I think this strictness is probably fine if you have very high-quality samples from a well-analyzed model organism, but for our group this isn't the norm (low conc samples from non-model organisms are common). I personally would rather have all results reported (good, ambiguous, unmapped) so I can assess problems with the samples, followed with some basic sane default cutoffs for additional QC and methylation calls.

ADD REPLY • link 8.3 years ago by Chris Fields ★ 2.2k

1

Entering edit mode

Yeah, I have a different take than Felix. I set the defaults in PileOMeth to be reasonable and avoid the situation he's talking about with repeat regions. Since the numbers I used were from bison, and bismark just ported my C code to perl, they could just use the same defaults. Methylation extraction is a lot faster than alignment, so I too would rather filter things after the fact.

ADD REPLY • link 8.3 years ago by Devon Ryan 104k