I have mutated a gene using CRISPR-Cas9, amplified a ~ 150-bp PCR product and sequenced using Illumina MiSeq. I am now looking at the reads aligned to the amplicon reference sequence in IGV.
I do this routinely but I think I have never seen an example like this. The sample I am looking at has reads clipped at the expected double-strand break site. See IGV screenshot below. In the reference sequence (below), note the AGG in uppercase, this is the NGG PAM. We are expecting Cas9 to create a double-strand break a few bp upstream. The 11-bp deletion in the first two reads matches this. However, the following reads stop matching exactly at this position. The soft-clipped (non-matching) portion is my Forward primer binding site (ACAGCGTTCTGTTTGTAGCGA) followed by the first 9 bp of my amplicon (i.e. we would expect these 9 bp a good 70 bp upstream!). There is no strand bias, both Forward and Reverse reads look like this.
I do not know how to interpret this. Any ideas?
Your experiment mostly failed with just a couple of molecules squeaking though?
Actually, isn't that exactly the reads you would expect if there was a genuine 70 bp deletion? I just don't know why
bwa mem
creates a soft-clip instead of opening a 70-bp gap. What do you think?My situation seems similar to this question: Bwa Alignment: Gap Vs Soft-Clipped Sequences, I think?
My comment was based on this. So you think that there is actually a 70 bp gap and these two reads on top are an anomaly? IGV will downsample reads by default so you may not be seeing all alignments.
Can you tell us which aligner you used to do these alignments? If your aligner only did ungapped alignments then you would not see reads that will span that 70 bp gap?
Ah I see. These are zebrafish embryos injected with Cas9 at the single-cell stage. What you need to know is: I expect them to have a bunch of different variants, like somatic mutations if you wish. I typically get anywhere from 1 to ~ 40 different mutations in each embryo. This is caused by the embryo developing (genomes replicating/cells dividing) while Cas9 is still around to create double-strand breaks.
So I trust the 11-bp deletions, this is very usual for me. But that does not mean there isn't any other mutations. Actually, it's relatively infrequent that I get one unique mutation.
Aligner is
bwa mem
. Command isWhat do you mean by ungapped alignments? It's clearly capable of opening gaps otherwise I would never see deletions, like the 11-bp deletions (which I trust).
I've noticed this parameter in the
bwa mem
help-page (http://bio-bwa.sourceforge.net/bwa.shtml)That vaguely sounds like what I'm after, but this probable deletion is 60–70 bp, so not above the limit (I have never changed this setting, so should be at default i.e. 100). I could try changing the scoring but sounds a bit risky... For example decreasing gap open or gap extension penalty or increasing clipping penalty.
Is this clearer?
Edit: I have tried setting
-w
higher and setting-O
(penalty for opening a gap) lower or-E
(penalty for extending a gap) lower, nothing seems to change these reads. Only creates a big mess in the good reads.Having said this, I now noticed a bunch of similar reads, except that the soft-clipped portion is larger, messier and can't be easily interpreted as a big deletion. I still thinks that the fact that these reads are clipped at the predicted double-strand breaks probably means something but probably hard to interpret from short reads alone. I'll probably let it go for now, except if anyone has any new suggestions.
I don't know how long your reads are. If there is indeed a 70+ bp deletion then bwa may not be willing to put in that big a gap.
Can you try using a splice aware aligner like bbmap or STAR to see if you can find reads that align across that gap? If you reads are ending at that break (going into the adapter) then result will not change.