6.3 years ago
NB
Hello,
I have a CNV pipeline using DECoN (a variation of ExomeDepth) for clinical analysis. We now need to validate it, and one of the steps is downsampling.
Just to make sure: what is the best approach/tool for downsampling BAM files (2 NextSeq runs x 48 samples each) and then running the CNV tool to check at what % coverage at 20x the CNVs confirmed by MLPA are picked up or dropped?
Many thanks,
-s FLOAT Output only a proportion of the input alignments. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate. The integer and fractional parts of the -s INT.FRAC option are used separately: the part after the decimal point sets the fraction of templates/pairs to be kept, while the integer part is used as a seed that influences which subset of reads is kept. When subsampling data that has previously been subsampled, be sure to use a different seed value from those used previously; otherwise more reads will be retained than expected.
Keeping mate pairs intact is not really an issue for ExomeDepth, since it's a coverage-based CNV caller, but it might be needed if you're comparing to other CNV callers.
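To illustrate the two points in the quoted documentation (mates are kept together, and reusing a seed on already-subsampled data retains more reads than expected), here is a small simulation. It is only a simplified model: it assumes the keep/drop decision depends on nothing but the read name and the seed, and it uses SHA-256 as a stand-in hash, which is not what samtools actually uses. The function names are made up for this sketch.

```python
import hashlib


def keep_template(read_name: str, seed: int, fraction: float) -> bool:
    """Keep/drop decision based only on seed + read name.

    Both mates of a pair share a read name, so they always get the
    same decision. SHA-256 is a stand-in, not samtools' real hash.
    """
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < fraction


def subsample(names, seed, fraction):
    """Return the read names that survive subsampling."""
    return [n for n in names if keep_template(n, seed, fraction)]


names = [f"read{i}" for i in range(10000)]

first = subsample(names, seed=42, fraction=0.5)       # first pass
same_seed = subsample(first, seed=42, fraction=0.5)   # second pass, same seed
new_seed = subsample(first, seed=7, fraction=0.5)     # second pass, new seed

print("first pass kept:        ", len(first))
print("second pass, same seed: ", len(same_seed))
print("second pass, new seed:  ", len(new_seed))
```

With the same seed, every read that survived the first pass survives the second one too, so nothing is removed. With a different seed, roughly half of the survivors are dropped again, which is what you would want.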
So does this mean "-s" sets a seed of 0 and samtools keeps only 10% of the reads? I was going to use sambamba for downsampling, but I guess the two are similar. I think PE mates are lost on downsampling.
From what the samtools documentation says I think it does keep the mate-pair (just didn't read it first time!) - "This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate".
For samtools, view -s 42.2 should give a subsampling seed of 42 and keep 20% of the reads.
For sambamba:
-s, --subsample=FRACTION    subsample reads (read pairs)
--subsampling-seed=SEED     set seed for subsampling
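As a rough sketch of how the two tools take the same seed and fraction, the helper below only builds the command lines and prints them; it does not run anything. The helper names and file names are hypothetical, and the fraction is rounded to four decimal places as a simplifying assumption. The flags used are -b, -s and -o for samtools view, and -f, -s, --subsampling-seed and -o for sambamba view.

```python
import shlex


def samtools_subsample_arg(seed: int, fraction: float) -> str:
    """Build the SEED.FRAC value for `samtools view -s`.

    Example: seed 42 and fraction 0.2 give "42.2".
    """
    frac = round(fraction, 4)  # assumption: 4 decimal places is enough
    if seed < 0 or not 0 < frac < 1:
        raise ValueError("need seed >= 0 and 0 < fraction < 1")
    # Keep leading zeros of the fractional digits (0.05 -> "05").
    digits = f"{frac:.4f}".split(".")[1].rstrip("0")
    return f"{seed}.{digits}"


def samtools_cmd(in_bam, out_bam, seed, fraction):
    """samtools packs seed and fraction into one -s value."""
    arg = samtools_subsample_arg(seed, fraction)
    return ["samtools", "view", "-b", "-s", arg, "-o", out_bam, in_bam]


def sambamba_cmd(in_bam, out_bam, seed, fraction):
    """sambamba takes the fraction and the seed as separate options."""
    return ["sambamba", "view", "-f", "bam",
            "-s", str(fraction), f"--subsampling-seed={seed}",
            "-o", out_bam, in_bam]


# Hypothetical file names, for illustration only.
print(shlex.join(samtools_cmd("sample.bam", "sample.20pct.bam", 42, 0.2)))
print(shlex.join(sambamba_cmd("sample.bam", "sample.20pct.bam", 42, 0.2)))
```

Note that the two tools will not pick the same reads for the same seed, so stick to one tool within a validation run.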
Thanks both. Another quick question: is there a way to select the seed, or is it totally random?
The first part is the seed
samtools view -s 8.1
does downsampling to 10% with seed "8"
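For the validation itself, one way to choose the fraction is to divide the target mean depth by the sample's current mean depth. The sketch below assumes depth scales linearly with the fraction of read pairs kept, which is only approximately true (duplicates and uneven capture will shift it). The sample names and depths are made-up values; in practice you would take them from your own coverage report.

```python
def fraction_for_target_depth(mean_depth: float, target_depth: float) -> float:
    """Fraction of read pairs to keep to reach roughly target_depth.

    Assumes mean depth is proportional to the fraction of pairs kept.
    """
    if mean_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    if target_depth >= mean_depth:
        raise ValueError("cannot downsample to a depth at or above the current one")
    return target_depth / mean_depth


# Hypothetical per-sample mean depths, for illustration only.
mean_depths = {"sampleA": 100.0, "sampleB": 160.0}
seed = 8

for name, depth in mean_depths.items():
    frac = fraction_for_target_depth(depth, target_depth=20.0)
    print(f"{name}: keep {frac:.3f} of read pairs (seed {seed})")
```

It is worth re-measuring coverage on the downsampled BAMs rather than trusting the calculated fraction, since the CNV calls depend on the depth actually achieved.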