Question

bwa mem low depth and fluctuating plot

0

Entering edit mode

6.9 years ago

ieie ▴ 20

Hi all, I am using bwa mem in this pipeline

bwa mem -t 12 -M -B 4 ref.fas xxx.r1.fastq.gz xxx.r1.fastq.gz > xxx.sam
samtools view -h -b -S -@ 11 xxx.sam > xxx.bam
samtools sort xxx.bam > xxx.sorted.bam
samtools depth xxx.sorted.bam > xxx.bed

Then with R, I read the bed file as dataframe and I plot the data. I get very low depth and a plot that is really fluctuating.

does anybody know if there is a way to improve the depth? thanks a lot for any help.

bwa mem samtools • 2.4k views

ADD COMMENT • link updated 6.9 years ago by sure ▴ 110 • written 6.9 years ago by ieie ▴ 20

1

Entering edit mode

Hello,

you can convert sam to bam and sort on the fly:

$ bwa mem -t 12 -M ref.fas xxx.r1.fastq.gz xxx.r1.fastq.gz|samtools sort -@ 11 -o xxx.sorted.bam -

What do you mean by "very low depth"?

fin swimmer

ADD REPLY • link 6.9 years ago by finswimmer 16k

0

Entering edit mode

when I check the depth at each position, how many times a base is covered as I understood, I see very low depth like from 20 to 300. I thought that the more the depth the more confidence I get in my data. Plus, the plot is really fluctuating while I would expect a more stable plot without many peaks.

ADD REPLY • link 6.9 years ago by ieie ▴ 20

0

Entering edit mode

Have you done any quality control on your reads?

ADD REPLY • link 6.9 years ago by Joe 22k

0

Entering edit mode

Have you considered showing us that plot?

ADD REPLY • link 6.9 years ago by WouterDeCoster 48k

0

Entering edit mode

depth plot

this is the plot that I get. thanks for any help!

ADD REPLY • link 6.9 years ago by ieie ▴ 20

score 0 · Answer 1 · 2018-09-24

0

Entering edit mode

6.9 years ago

sure ▴ 110

The plot looks fine to me. What read depth are you expecting? From the graph, it appears to have a read depth of 100+, I think that is pretty decent. On the question of peaks, there will always be regions in the genome which will over amplify in PCR amplification (hence sequenced multiple times) step of library preparation, other reason could be that they represent repetitive regions. Have you considered doing a PCR duplicate removal using PICARD, Some of these peaks may disappear.

ADD COMMENT • link 6.9 years ago by sure ▴ 110

0

Entering edit mode

Thanks a lot for your answer! what also worries me is that there many positions where the depth is 0 and I am not sure if I could fix this problem or not. I would not expect many positions with 0 depth.

ADD REPLY • link 6.9 years ago by ieie ▴ 20

0

Entering edit mode

That is to be expected. There are always regions of the genome which cannot be sequenced. Your data looks perfectly reasonable assuming it was e.g. illumina data.

ADD REPLY • link 6.9 years ago by Joe 22k