Question

Soft-Clipping And Variant Calling

10.8 years ago

DoubleDecker ▴ 180

Having aligned my reads with BWA-MEM, I noticed that a lot of the alignments are dodgy, to put it mildly, e.g. with both ends massively soft-clipped. Do I need to filter out such soft-clipped alignments before variant calling, or (preferably!) do variant callers such as GATK and freebayes take soft-clipping into account when deciding whether to call a variant?

bwa gatk • 11k views

updated 10.8 years ago by Mick ▴ 30 • written 10.8 years ago by DoubleDecker ▴ 180

Istvan Albert · Answer 1 · 2014-03-12

10.8 years ago

Pierre Lindenbaum 164k

No, soft/hard clipped alignments are ignored by the callers.

EDIT: but hard and soft clips can be used to detect structural variations.

EDIT2: a quick test: I added 8 bases at the 5' and 3' ends of all reads, and only 5 SNPs were found.
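For anyone who wants to repeat this kind of test, here is a minimal sketch of the padding step in plain Python. The pad sequence `ACGTACGT`, the default Phred value of 2, and the function name `pad_fastq_record` are assumptions for illustration only; they are not the exact values or code used in the test above.

```python
def pad_fastq_record(record, pad="ACGTACGT", phred=2):
    """Return a FASTQ record with `pad` added to both ends of the read.

    `record` is a (header, sequence, plus, quality) tuple.
    The pad and its Phred score are hypothetical test values.
    """
    header, seq, plus, qual = record
    # Phred+33 encoding: quality 2 becomes the character '#'.
    pad_qual = chr(33 + phred) * len(pad)
    return (header, pad + seq + pad, plus, pad_qual + qual + pad_qual)


record = ("@read1", "TTGACCA", "+", "IIIIIII")
padded = pad_fastq_record(record)
print("\n".join(padded))
```

Re-aligning the padded reads and calling variants again should show how much the extra, mostly clipped bases change the call set. Raising `phred` lets you check whether the low base quality, rather than the clipping, is what keeps the caller from using them.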

updated 8.1 years ago by Istvan Albert 102k • written 10.8 years ago by Pierre Lindenbaum 164k

That's really great; not having to prepare yet another set of BAM files will save me a lot of disk space.

EDIT: OK, I am scratching my head right now... so you got only 5 additional SNPs after adding an identical string to all the reads? But it seems to me you assigned low Phred quality values (2) to the added nucleotides, so the SNP caller might be biased against them for that reason?

updated 5.2 years ago by Ram 44k • written 10.8 years ago by DoubleDecker ▴ 180

Ram · Answer 2 · 2014-03-12

10.8 years ago

Mick ▴ 30

Forgive me if I have this wrong, but....

What happens in BWA-MEM is that one often gets the primary alignment plus lots of much shorter secondary alignments, many of which are hard/soft clipped. These secondary alignments are essentially local/split alignments with respect to the read, and are often not what SNP discovery projects are interested in.

GATK ignores everything except the primary alignment.

However, where reads do not have a good, long primary alignment, I am guessing it's possible that the short/clipped/local alignment is marked as the primary alignment and is taken into consideration by GATK when SNP calling.

For example, if one has PhiX in the reads, but not in the reference database, then one can get short/clipped/local alignments of the PhiX reads against the reference genome. These are clearly false alignments. However, they may be marked as primary alignments, and GATK may use them to call SNPs.

I think that was the focus of the question....
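To see how many primary alignments are mostly clipped, one could check the FLAG and CIGAR columns of each SAM record. The sketch below uses only the standard library; the 0x100 (secondary) and 0x800 (supplementary) bits and the CIGAR operators are taken from the SAM specification, while the 50% threshold and the function names are my own assumptions, not anything GATK or BWA does internally.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment


def clipped_fraction(cigar):
    """Fraction of the bases stored in SEQ that are soft-clipped (S)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # M, I, S, = and X consume read bases; hard clips (H) are not in SEQ.
    read_len = sum(n for n, op in ops if op in "MIS=X")
    soft = sum(n for n, op in ops if op == "S")
    return soft / read_len if read_len else 0.0


def is_clipped_primary(flag, cigar, max_clip=0.5):
    """True for a primary record whose soft-clipped share exceeds max_clip."""
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False  # not the primary line for this read
    return clipped_fraction(cigar) > max_clip


print(is_clipped_primary(0, "40S20M40S"))    # primary, 80% soft-clipped
print(is_clipped_primary(256, "40S20M40S"))  # secondary, so not reported
```

Counting such records before and after filtering would show whether PhiX-like false alignments are common enough in a given data set to worry about.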

updated 5.2 years ago by Ram 44k • written 10.8 years ago by Mick ▴ 30

Yes, that's what I meant. I did not use the option to output alternative alignments in BWA mem.

updated 5.2 years ago by Ram 44k • written 10.8 years ago by DoubleDecker ▴ 180

I'd go back and re-do the alignments as per http://bio-bwa.sourceforge.net/:

"With BWA-MEM/BWA-SW, my tools are complaining about multiple primary alignments. Is it a bug?"

"It is not. Multi-part alignments are possible in the presence of structural variations, gene fusion or reference misassembly. However, representing multi-part alignments in SAM has not been finalized. To make BWA work with your tools, please use option -M to flag extra hits as secondary."
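As a rough sketch of what re-running with that option could look like from a script, the helper below only assembles the `bwa mem` argument list with `-M` switched on. The file names and the function `bwa_mem_command` are hypothetical placeholders; `-M` and `-t` are the only bwa options used.

```python
def bwa_mem_command(reference, reads, mark_secondary=True, threads=1):
    """Build the argument list for a `bwa mem` run (does not execute it)."""
    cmd = ["bwa", "mem", "-t", str(threads)]
    if mark_secondary:
        cmd.append("-M")  # flag extra (shorter split) hits as secondary
    cmd.append(reference)
    cmd.extend(reads)
    return cmd


# Hypothetical file names, for illustration only.
cmd = bwa_mem_command("ref.fa", ["reads_1.fq", "reads_2.fq"])
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, stdout=...)` to write the SAM output to a file, assuming `bwa` is on the PATH and the reference has already been indexed.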

updated 5.2 years ago by Ram 44k • written 10.8 years ago by Mick ▴ 30