Hi,
I have a bam file and I want to subsample this bam file 10 times with 1x, 2x, 5x coverage each.
Can anyone help me solving this?
I tried to use samtools and sambamba but nothing could make accurate results.
Thank you!
Hi,
I have a bam file and I want to subsample this bam file 10 times with 1x, 2x, 5x coverage each.
Can anyone help me solving this?
I tried to use samtools and sambamba but nothing could make accurate results.
Thank you!
Then you should combine samtools with a for loop in a bash script. For instance to subsample multiple files multiple times at multiple depth, you could use this code (untested).
#!/bin/bash
# down-sample multiple bam files (given as input) 100 times for K fractions (1, 5, 10, 25 % in this example) of the original bam files
# This script requires samtools installed and in your PATH.
set -e
if [ $# -eq 0 ]; then
echo "Usage : down_sample_bam.sh bam_file_1 [bam_file_2, ...]"
else
for i in $*
do
echo ""; echo "bam file : ${i}"; echo "" ; echo "downsampling reads..."
for k in 1 5 10 25
do
echo "${k} %"
for j in {0..99}
do
samtools view -b -s ${j}.${k} ${i} > ${i}_down_${k}_${j}.bam
done
done
done
echo "done !"
fi
exit 1
PS: note that samtools view -s breaks read pairs. If you have paired-end reads and want to keep your mate pairs together, you might use bedtools bamtobed -bedpe, extract a few random lines (with shuf or rl) then convert back to bam with bedtools bedpetobam.
Does view -s actually break read pairs? From Samtools manpage: "Output only a proportion of the input alignments. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate" It says it never keeps a read but not it's mate, which is a different way of saying that if it keeps a read, it keeps its mate.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
What do you mean by that ? I used samtools for subsampling and it worked pretty well with the
-s
option. Of course you need to determine beforehand what fraction of your bam files corresponds to 1x, 2x, 5x coverage.My major problem is how to subsample one file multiple times, I saw many questions about subsampling multiple files, but I need to subsample one file many times.
For example, I want to generate 1x_subsampled_1.bam, 1x_subsampled_2.bam... 2x_subsampled_1.bam, 2x_subsampled_2.bam and so on.
Hello wjdgmlcjf ,
it is still unclear what's your problem. Just run the subsampling command multiple times with different parameters and your are done, aren't you? Or which point I'm missing?
fin swimmer
I am not sure if this is what you are looking for, but the -s argument does support multiple seed values:
-s FLOAT subsample reads (given INT.FRAC option value, 0.FRAC is the fraction of templates/read pairs to keep; INT part sets seed)
Changing the seed will result in a different (but reproducible using the same seed) set of random values, resulting in different bam files for the same chosen fraction.
Remark that
samtools view -s
may be absent in older samtools versions.I hope this helps a bit,
Youri
Yeah that's right, but I want to write in one script, not running command multiple times. Because I want to generate about 50~60 subsampled bam files.
Picard downsample ?
Get 25% of bam file :