Down sampling BAM files for CNV detection
6.3 years ago by NB

Hello,

I have a CNV pipeline for clinical analysis that uses DECoN (a variation of ExomeDepth). We now need to validate it, and one of the validation steps is downsampling.

Just to make sure: what is the best approach or tool for downsampling BAM files (2 NextSeq runs × 48 samples each) and then re-running the CNV tool, to check at which downsampling percentage (at 20x) the CNVs confirmed by MLPA are still picked up or start to drop out?
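One way to plan the downsampling series is to work out, for each BAM, what fraction of reads to keep to reach a given mean depth. A minimal sketch, assuming you have already measured each sample's mean depth over the target regions and that depth scales roughly linearly with the fraction of reads kept (duplicates and off-target reads make this only approximate). The function names and the 120x example depth are hypothetical.

```python
from typing import Dict, Iterable, Optional


def fraction_for_depth(observed_depth: float, target_depth: float) -> Optional[float]:
    """Fraction of reads to keep so that observed_depth drops to target_depth.

    Returns None when the target is above the observed depth, because
    subsampling can only remove reads, never add them.
    """
    if observed_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    if target_depth > observed_depth:
        return None
    # Assumes depth is proportional to the number of reads kept.
    return target_depth / observed_depth


def depth_series(observed_depth: float, targets: Iterable[float]) -> Dict[float, Optional[float]]:
    """Map each target depth to the subsampling fraction needed to reach it."""
    return {t: fraction_for_depth(observed_depth, t) for t in targets}


if __name__ == "__main__":
    # Hypothetical sample with a measured mean on-target depth of 120x.
    for target, frac in depth_series(120.0, [60, 40, 20, 10]).items():
        status = "not reachable" if frac is None else f"keep {frac:.3f} of reads"
        print(f"{target}x: {status}")
```

Samples that start below a given target simply report "not reachable" for it, which makes it easy to see which of the 96 samples can contribute to each depth level.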

Many thanks,

Tags: downsampling, BAM, CNV, NGS
6.3 years ago

I would suggest samtools view -b -s 0.1 -o newfile.bam yourfile.bam, which keeps about 10% of the reads (read pairs), and adapting the fraction accordingly.
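With 96 BAMs and several fractions, it can help to generate the samtools view commands rather than type them. A minimal sketch that builds argument lists for the command suggested above; the output naming scheme, the seed value 8 and the input file names are hypothetical, and the INT.FRAC formatting follows the -s description in the samtools manual.

```python
import os
import shlex
from typing import List


def subsample_token(seed: int, fraction: float) -> str:
    """Build the INT.FRAC value for samtools view -s (seed before the dot, fraction after)."""
    if seed < 0:
        raise ValueError("seed must be a non-negative integer")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    # e.g. 0.25 -> "0.250000" -> "25"
    digits = f"{fraction:.6f}".split(".")[1].rstrip("0")
    if not digits:
        raise ValueError("fraction cannot be represented with 6 decimal places")
    return f"{seed}.{digits}"


def samtools_subsample_cmd(bam: str, fraction: float, seed: int = 0) -> List[str]:
    """Argument list for a samtools view call that writes a subsampled BAM (-b)."""
    stem, _ = os.path.splitext(bam)
    # Hypothetical naming scheme, e.g. sample01.sub25pct.seed8.bam
    out = f"{stem}.sub{fraction * 100:g}pct.seed{seed}.bam"
    return ["samtools", "view", "-b", "-s", subsample_token(seed, fraction), "-o", out, bam]


if __name__ == "__main__":
    for bam in ["sample01.bam", "sample02.bam"]:  # hypothetical inputs
        for frac in (0.8, 0.5, 0.25, 0.1):
            print(shlex.join(samtools_subsample_cmd(bam, frac, seed=8)))
```

Each list can be passed directly to subprocess.run(cmd, check=True); keeping commands as lists rather than strings avoids shell-quoting problems with unusual file names.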


From the samtools view manual: -s FLOAT Output only a proportion of the input alignments. This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate. The integer and fractional parts of the -s INT.FRAC option are used separately: the part after the decimal point sets the fraction of templates/pairs to be kept, while the integer part is used as a seed that influences which subset of reads is kept. When subsampling data that has previously been subsampled, be sure to use a different seed value from those used previously; otherwise more reads will be retained than expected.
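Both properties in the manual text (mates are kept or dropped together, and reusing a seed retains too many reads) follow from making a deterministic, seeded keep/drop decision per template name. A minimal sketch of that idea; it hashes the seed and read name with SHA-256, which is an assumption for illustration and not the hash samtools actually uses, and it works on synthetic (QNAME, mate) tuples instead of real BAM records.

```python
import hashlib
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, int]  # (template name / QNAME, mate number)


def keep_template(qname: str, seed: int, fraction: float) -> bool:
    """Keep or drop a whole template based on a seeded hash of its name.

    Every record with the same QNAME gets the same answer, so mates stay together.
    """
    digest = hashlib.sha256(f"{seed}:{qname}".encode()).digest()
    # Top 53 bits of the hash -> uniform float in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return u < fraction


def subsample(records: Iterable[Record], seed: int, fraction: float) -> Iterator[Record]:
    """Yield only the records whose template passes keep_template."""
    for qname, mate in records:
        if keep_template(qname, seed, fraction):
            yield qname, mate


if __name__ == "__main__":
    # Synthetic paired-end data: mates 1 and 2 for 10,000 templates.
    reads = [(f"read{i}", mate) for i in range(10_000) for mate in (1, 2)]
    first = list(subsample(reads, seed=0, fraction=0.5))
    # Same seed again: every template passes the same test, so nothing more is removed.
    again_same_seed = list(subsample(first, seed=0, fraction=0.5))
    # Different seed: an independent draw, so roughly half of the first pass survives.
    again_new_seed = list(subsample(first, seed=1, fraction=0.5))
    for label, kept in [("first pass", first), ("same seed", again_same_seed), ("new seed", again_new_seed)]:
        print(f"{label}: {len(kept) / len(reads):.3f} of the original records")
```

Subsampling twice at 0.5 with the same seed leaves about 50% of the records instead of the expected 25%, which is exactly the situation the manual warns about.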


Keeping mate pairs intact is not really an issue for ExomeDepth, since it is a coverage-based CNV caller, but it might be needed if you are comparing against other CNV callers.


So does this mean that -s 0.1 sets a seed of 0 and keeps only 10% of the reads in samtools? I was going to use sambamba for downsampling, but I guess both are similar. I think PE mates are lost on downsampling, though.


From what the samtools documentation says, I think it does keep the mate pair (I just didn't read it properly the first time!): "This subsampling acts in the same way on all of the alignment records in the same template or read pair, so it never keeps a read but not its mate".

For samtools view -s 42.2, that should be a subsampling seed of 42 and 20% of the templates (read pairs).

For sambamba view: -s, --subsample=FRACTION subsamples reads (read pairs), and --subsampling-seed=SEED sets the seed for subsampling.
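Unlike samtools, sambamba takes the fraction and the seed as two separate options. A minimal sketch that builds a sambamba view command from a (fraction, seed) pair, using the -s and --subsampling-seed options quoted above plus -f bam (output format) and -o (output file); it is worth confirming these with sambamba view --help for your installed version. The file names are hypothetical, and there is no reason to expect the same seed to select the same reads in samtools and sambamba.

```python
from typing import List


def sambamba_subsample_cmd(bam: str, out: str, fraction: float, seed: int) -> List[str]:
    """Argument list for sambamba view writing a subsampled BAM.

    Unlike samtools' INT.FRAC form, the fraction and the seed are separate options.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return [
        "sambamba", "view",
        "-f", "bam",                   # write BAM rather than SAM
        "-s", str(fraction),           # fraction of reads (read pairs) to keep
        f"--subsampling-seed={seed}",  # fixed seed so the subset is reproducible
        "-o", out,
        bam,
    ]


if __name__ == "__main__":
    # Hypothetical paths; same intent as samtools view -b -s 42.2
    print(" ".join(sambamba_subsample_cmd("sample01.bam", "sample01.sub20pct.bam", 0.2, 42)))
```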


Thanks, both. Another quick question: is there a way to set the seed, or is it totally random?


The integer part is the seed: samtools view -s 8.1 downsamples to 10% with seed 8.
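The examples in this thread (-s 0.1, -s 42.2, -s 8.1) all follow the INT.FRAC rule from the samtools manual: the integer part is the seed and the digits after the decimal point give the fraction. A small sketch of that split, which could for instance be used to record the seed and fraction of each downsampled BAM in a validation log (a hypothetical use). It parses the string rather than converting to a float first, so "8.1" yields exactly seed 8 and fraction 0.1; it mirrors the documented behaviour, not samtools' own parsing code.

```python
from typing import Tuple


def parse_subsample_arg(arg: str) -> Tuple[int, float]:
    """Split a samtools view -s value of the form INT.FRAC into (seed, fraction)."""
    int_part, dot, frac_part = arg.partition(".")
    if not dot or not frac_part.isdigit() or (int_part and not int_part.isdigit()):
        raise ValueError(f"expected INT.FRAC, got {arg!r}")
    seed = int(int_part) if int_part else 0   # "0.1" or ".1" -> seed 0
    fraction = float("0." + frac_part)        # digits after the point
    return seed, fraction


if __name__ == "__main__":
    for value in ("0.1", "8.1", "42.2"):
        seed, fraction = parse_subsample_arg(value)
        print(f"-s {value}: seed {seed}, keep {fraction:.0%} of templates")
```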
