bwa mem low depth and fluctuating plot
1
0
Entering edit mode
6.2 years ago
ieie ▴ 10

Hi all, I am using bwa mem in this pipeline

bwa mem -t 12 -M -B 4 ref.fas xxx.r1.fastq.gz xxx.r1.fastq.gz > xxx.sam
samtools view -h -b -S -@ 11 xxx.sam > xxx.bam
samtools sort xxx.bam > xxx.sorted.bam
samtools depth xxx.sorted.bam > xxx.bed

Then with R, I read the bed file as dataframe and I plot the data. I get very low depth and a plot that is really fluctuating.

does anybody know if there is a way to improve the depth? thanks a lot for any help.

bwa mem samtools • 2.0k views
ADD COMMENT
1
Entering edit mode

Hello,

you can convert sam to bam and sort on the fly:

$ bwa mem -t 12 -M ref.fas xxx.r1.fastq.gz xxx.r1.fastq.gz|samtools sort -@ 11 -o xxx.sorted.bam -

What do you mean by "very low depth"?

fin swimmer

ADD REPLY
0
Entering edit mode

when I check the depth at each position, how many times a base is covered as I understood, I see very low depth like from 20 to 300. I thought that the more the depth the more confidence I get in my data. Plus, the plot is really fluctuating while I would expect a more stable plot without many peaks.

ADD REPLY
0
Entering edit mode

Have you done any quality control on your reads?

ADD REPLY
0
Entering edit mode

Have you considered showing us that plot?

ADD REPLY
0
Entering edit mode

depth plot

this is the plot that I get. thanks for any help!

ADD REPLY
0
Entering edit mode
6.2 years ago
sure ▴ 100

The plot looks fine to me. What read depth are you expecting? From the graph, it appears to have a read depth of 100+, I think that is pretty decent. On the question of peaks, there will always be regions in the genome which will over amplify in PCR amplification (hence sequenced multiple times) step of library preparation, other reason could be that they represent repetitive regions. Have you considered doing a PCR duplicate removal using PICARD, Some of these peaks may disappear.

ADD COMMENT
0
Entering edit mode

Thanks a lot for your answer! what also worries me is that there many positions where the depth is 0 and I am not sure if I could fix this problem or not. I would not expect many positions with 0 depth.

ADD REPLY
0
Entering edit mode

That is to be expected. There are always regions of the genome which cannot be sequenced. Your data looks perfectly reasonable assuming it was e.g. illumina data.

ADD REPLY

Login before adding your answer.

Traffic: 1023 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6