I am trying to build a tab delimited table of count and read data from a deduplicated realigned bam file. I am looking at mitochondrial DNA (ver NC_012920.1) and want to know the read depth or count per base and the break up of that depth by read direction and the read base makeup. Something along the lines of:
Chr location Depth/Coverage Forward Reverse A T G C
MT position 1 300 165 135 298 1 0 1
...
...
...
Mt pos 16569 ...
I am also trying to do the same with a whole human genome and a bed file for specific ranges of targets. I am using
samtools depth -a mybam.bam > MTScan.txt
for the mitochondrial analysis and
samtools depth -b mybed.bed mybam.bam > HGScan.txt
for the human genome with bed file analysis and all i get are three columns for reference, position, and reads.
I believe samtools depth or bedcov are what I should use but i cannot seem to get all of the desired information out. Does anyone have any other suggested approaches I might be able to use? Thank you.
use the output of
I am getting an error that mpileup does not recognize --annotate. I will keep looking at mpileup.
Thank you!
sorry I was talking about the latest bcftools command
Thank you! Appreciate the clarification.
I don't have a full answer but I have a comment that might help. Samtools depth does not do what you are asking in a one and done. It reports chr pos depth/coverage.
I think the first thing you could do would be to split your bams into the forward bam, and reverse bam:
Forward bam:
Reverse bam:
Then running samtools depth on the forward and reverse, I'll add the -a flag to make sure positions align:
This will give you a format of:
Since the positions are aligned you probably only want the chr, pos, cov forward, and cov reverse ( I know this can be piped with the previous step):
The sed bit adds a header if you're interested. I do not know a tool for getting the base count at each position off of the top of my head, so I cannot add that. This is why I did not post as an answer, but I do believe this is a good start! Once you find a way to do the next step, you could run it on both the forward and reverse reads individually and get an output from that to see if numbers match up.
Dennis
Thank you for the help!