Question

Using Samtools depth to get number of forward and reverse reads per base from a .bam file

1

Entering edit mode

6.8 years ago

jspainhour3 ▴ 10

I am trying to build a tab delimited table of count and read data from a deduplicated realigned bam file. I am looking at mitochondrial DNA (ver NC_012920.1) and want to know the read depth or count per base and the break up of that depth by read direction and the read base makeup. Something along the lines of:

Chr location Depth/Coverage Forward Reverse A T G C

MT position 1 300 165 135 298 1 0 1

...

Mt pos 16569 ...

I am also trying to do the same with a whole human genome and a bed file for specific ranges of targets. I am using

samtools depth -a mybam.bam > MTScan.txt

for the mitochondrial analysis and

samtools depth -b mybed.bed mybam.bam > HGScan.txt

for the human genome with bed file analysis and all i get are three columns for reference, position, and reads.

I believe samtools depth or bedcov are what I should use but i cannot seem to get all of the desired information out. Does anyone have any other suggested approaches I might be able to use? Thank you.

next-gen sequencing • 4.8k views

ADD COMMENT • link 6.8 years ago by jspainhour3 ▴ 10

0

Entering edit mode

use the output of

 bcftools mpileup --annotate 'FORMAT/DP4' --regions-file mybed.bed --no-reference in.bam'

ADD REPLY • link 6.8 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

I am getting an error that mpileup does not recognize --annotate. I will keep looking at mpileup.

Thank you!

ADD REPLY • link 6.8 years ago by jspainhour3 ▴ 10

0

Entering edit mode

sorry I was talking about the latest bcftools command

ADD REPLY • link 6.8 years ago by Pierre Lindenbaum 166k

0

Entering edit mode

Thank you! Appreciate the clarification.

ADD REPLY • link 6.8 years ago by jspainhour3 ▴ 10

0

Entering edit mode

I don't have a full answer but I have a comment that might help. Samtools depth does not do what you are asking in a one and done. It reports chr pos depth/coverage.

I think the first thing you could do would be to split your bams into the forward bam, and reverse bam:

Forward bam:

samtools view -F input.bam > forward.bam

Reverse bam:

samtools view -f input.bam > reverse.bam

Then running samtools depth on the forward and reverse, I'll add the -a flag to make sure positions align:

samtools depth -a forward.bam > forwardDepth.txt
samtools depth -a reverse.bam > reverseDepth.txt
paste forwardDepth.txt reverseDepth.txt > forwardAndReverse.txt

This will give you a format of:

chr    pos1    cov    chr    pos1    cov

Since the positions are aligned you probably only want the chr, pos, cov forward, and cov reverse ( I know this can be piped with the previous step):

awk '{print $1, $2, $3, $6}' forwardAndReverse.txt > forRev_cut.txt
sed -i '1s/^/Chr    Position    ForCov    RevCov' forRev_cut.txt

The sed bit adds a header if you're interested. I do not know a tool for getting the base count at each position off of the top of my head, so I cannot add that. This is why I did not post as an answer, but I do believe this is a good start! Once you find a way to do the next step, you could run it on both the forward and reverse reads individually and get an output from that to see if numbers match up.

Dennis

ADD REPLY • link 6.8 years ago by drkennetz ▴ 560

0

Entering edit mode

Thank you for the help!

ADD REPLY • link 6.8 years ago by jspainhour3 ▴ 10