You could also use the bedtools suite to create your coverage matrix with header information:
Calculate read coverage across the entire genome for multiple BAM files:
bedtools genomecov -bga -i sample1.bam -g genome.txt > sample1_genomecov.bedGraph
bedtools genomecov -bga -i sample2.bam -g genome.txt > sample2_genomecov.bedGraph
bedtools genomecov -bga -i sample3.bam -g genome.txt > sample3_genomecov.bedGraph
# Easier and faster to do this with GNU parallel if you have it installed...
parallel 'bedtools genomecov -bga -i {} -g genome.txt > {.}_genomecov.bedGraph' ::: sample*.bam
Combine read coverage files into a single file with header:
bedtools unionbedg -header -names sample1 sample2 sample3 -g genome.txt -empty -i sample1_genomecov.bedGraph sample2_genomecov.bedGraph sample3_genomecov.bedGraph > genomecov.txt
You could then write a quick script to extract the statistics you want from the coverage file:
#!/usr/bin/env python3
import collections
import csv
import numpy
with open('genomecov.txt'. 'r') as handle:
reader = csv.reader(handle, delimiter = '\t')
header = next(reader)
Row = collections.namedtuple('Row', header)
for line in reader:
row = Row(*line)
coverage = row[3:]
# Print the mean and standard deviation
print(row.chrom, row.start, row.end, numpy.mean(coverage), numpy.std(coverage))
There are probably more elegant solutions available, but this should work if you're in a squeeze...
Can you illustrate by an example? Btw the sequencing tag in Q is irrelevant!