Hi!
You can do this with samtools depth
easily.
samtools depth
can work with multiple inputs and it generates such a file with the below command:
Command (please see official docs for arguments, you will probably need to provide forward and reverse reads from BAMs separately)
samtools depth -r 1 -m 100 -Q 20 -q 10 -a bam1 bam2 bam3
Output
1 55475000 38 49 58
1 55475001 37 50 57
1 55475002 36 51 55
1 55475003 38 52 55
1 55475004 36 51 56
1 55475005 35 51 55
1 55475006 35 51 54
1 55475007 35 51 55
1 55475008 35 54 51
1 55475009 35 53 56
1 55475010 34 53 53
1 55475011 35 53 56
1 55475012 36 53 55
1 55475013 35 54 56
1 55475014 36 55 59
1 55475015 35 55 58
1 55475016 34 49 54
1 55475017 34 53 55
1 55475018 35 54 54
1 55475019 34 50 54
1 55475020 35 54 56
1 55475021 36 54 59
1 55475022 35 55 60
1 55475023 34 55 59
1 55475024 35 51 58
1 55475025 34 51 59
1 55475026 34 50 60
1 55475027 33 49 57
1 55475028 33 50 58
1 55475029 33 50 58
1 55475030 32 51 57
1 55475031 31 51 59
1 55475032 29 50 59
1 55475033 28 49 55
1 55475034 28 49 55
Finally, use a Python script for example to read every position collect per sample metrics and calculate mean, median, etc. Following is a sample code for my previous analysis (for a similar purpose, not exactly the same).
import sys
import numpy as np
print('\t'.join([
'#chrom', 'pos', 'mean', 'median',
'1', '5', '10', '15', '20', '25',
'30', '50', '100']))
with open(sys.argv[1]) as rows:
for row in rows:
row = row.strip().split()
print('\t'.join(row[0:2]), end='\t')
covs = np.array(list(map(int, row[2:])))
two_decimals = '%.2f'
four_decimals = '%.4f'
above_1 = np.size(np.where(covs >= 1)) / np.size(covs)
above_5 = np.size(np.where(covs >= 5)) / np.size(covs)
above_10 = np.size(np.where(covs >= 10)) / np.size(covs)
above_15 = np.size(np.where(covs >= 15)) / np.size(covs)
above_20 = np.size(np.where(covs >= 20)) / np.size(covs)
above_25 = np.size(np.where(covs >= 25)) / np.size(covs)
above_30 = np.size(np.where(covs >= 30)) / np.size(covs)
above_50 = np.size(np.where(covs >= 50)) / np.size(covs)
above_100 = np.size(np.where(covs >= 100)) / np.size(covs)
data = [
two_decimals % np.mean(covs),
two_decimals % np.median(covs),
four_decimals % above_1,
four_decimals % above_5,
four_decimals % above_10,
four_decimals % above_15,
four_decimals % above_20,
four_decimals % above_25,
four_decimals % above_30,
four_decimals % above_50,
four_decimals % above_100
]
print('\t'.join(data), end='\n')