Exclude reads soft-clipped > N bases in Freebayes
1
0
Entering edit mode
7.3 years ago
nanana ▴ 120

I'm running Freebayes as follows:

freebayes -f genome.fa --pooled-discrete --genotype-qualities --read-mismatch-limit 70 \
tumour.bam normal.bam -v tn_raw.vcf

I want to filter out reads that are soft clipped (aligned with bwa mem) > 70 bps, and I'm using the --read-mismatch-limit 70 option to achieve this. However, in the output I see SNPS being called from reads with > 80 bps mis-aligned (see linked screenshot):

Is this option doing what I think it is? Is this the correct option to exclude reads with more than N mismatches?

next-gen snp • 2.7k views
ADD COMMENT
1
Entering edit mode
7.3 years ago

There is a terminology issue here. In NGS analysis parlance soft clipping is not the equivalent to mismatching bases.

The soft clipped region are not aligned for matching or mismatching regions - these are removed altogether as the alignment score is better when the region is not present. So their alignment is not reported. You could have matching bases within a soft clipped region.

Therefore in general, in my opinion, the soft clipped regions of reads should not be used to call SNPs at all.

Hence I think this flag refers to the actual mismatching bases and not the soft clipped ones.

-U --read-mismatch-limit N
         Exclude reads with more than N mismatches where each mismatch
         has base quality >= mismatch-base-quality-threshold.
         default: ~unbounded
ADD COMMENT
0
Entering edit mode

OK makes sense - do you have any suggestions for how to exclude soft-clipped reads (other than removing them from the BAM in the first place)?

ADD REPLY
1
Entering edit mode

If you want to exclude soft-clipped reads, then why not remove them from the bam?

ADD REPLY
0
Entering edit mode

Because I need them included in another analysis, and I'd prefer not to have to take up twice as much space by filtering them just for Freebayes.

ADD REPLY
0
Entering edit mode

can you please test if freebayes can read sub-bash file ? something like

freebayes -f genome.fa --pooled-discrete --genotype-qualities --read-mismatch-limit 70 \
 <(samtools view -bu tumour.bam) <(samtools view -bu normal.bam) -v tn_raw.vcf

just to know if you can put an inline filter in your command

ADD REPLY
1
Entering edit mode

Nice idea, but this doesn't work. I get the error: Failed to load index for /dev/fd/62. Rebuild samtools index

ADD REPLY

Login before adding your answer.

Traffic: 2541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6