I have the following read stored in trim.fq:
@read
NGTTTGATTTGGGAAAGAAGGAAAGGAGAGGAGGGGAGGGGAGGGGAAGGGGGAGGGGAAATGCGCAAAAGATCGGAAGACATGACAAGTCAAGC
+
#4:BDFFFHHHHHIJJJIJJJIIJIJHIGHIGHJJJDFHII8AHFDAABDDDD5<BDD<@BCCCB@BDDDC@ACDD@B<A<AAA@@CDA>ADDC3
The following command trims the read successfully:
cutadapt -a AGATCGGAAGAGC -e 0.1 trim.fq
However, the following command does NOT trim it:
cutadapt -a AGATCGGAAGATT -e 0.1 trim.fq
Why is this? In both cases, only the sequence AGATCGGAAGA is matched...
Try raising the error rate to 0.2 and it works. Try calculating the number of allowed errors for your adapter sequences (-e 0.1 vs. 0.2). Trimming with AGATCGGAAGA alone also works.
Honestly, that is kind of alarming: it clips in only one of the two cases with the same parameters, even though there seem to me to be two mismatches at the 3' end in both cases.
Does it trim the whole "AGATCGGAAGACATGACAAGTCAAGC" fragment in the former case?
EDIT: See my answer below; "errors" in cutadapt are not just mismatches (insertions and deletions count too), which explains the behavior.
Not two mismatches: one error. In the first case (AGATCGGAAGAGC) there is only one error, which is why it works. In the second case (AGATCGGAAGATT) there are two errors, which is why the adapter is not trimmed. If you raise the number of allowed errors to two, the second one works as well.
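Error calculation: the read and both adapter sequences share the prefix AGATCGGAAGA, so remove it and compare only what follows. Sequence 1 leaves GC and sequence 2 leaves TT, while the read continues with CATGACAAGTCAAGC. Skipping the adapter's G (one deletion) lets the C of GC match the C in the read, so sequence 1 needs one error; TT cannot be placed with fewer than two.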
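The counting can be sketched in Python under a simplified model that is an assumption, not cutadapt's own alignment code: the whole adapter must be aligned to some stretch of the read, every mismatch, inserted base or deleted base costs one error, and read bases before and after the match are free. `min_edits` and `hamming_at_prefix` are hypothetical helpers for illustration.

```python
# Simplified model of adapter error counting (NOT cutadapt's actual code):
# the whole adapter must align inside the read; mismatches, insertions and
# deletions each cost one error; read bases before/after the match are free.

READ = (
    "NGTTTGATTTGGGAAAGAAGGAAAGGAGAGGAGGGGAGGGGAGGGGAAGGGGGAGGGGAAATGCGC"
    "AAAAGATCGGAAGACATGACAAGTCAAGC"
)


def min_edits(adapter: str, read: str) -> int:
    """Fewest edits needed to place the full adapter anywhere in the read."""
    # prev[j]: cost of aligning the adapter bases seen so far so that the
    # alignment ends right before read[j]; all zeros = free start in the read.
    prev = [0] * (len(read) + 1)
    for i, a in enumerate(adapter, start=1):
        cur = [i] + [0] * len(read)  # i adapter bases vs. no read bases
        for j, r in enumerate(read, start=1):
            cur[j] = min(
                prev[j - 1] + (a != r),  # match (0) or mismatch (1)
                prev[j] + 1,             # adapter base absent from the read
                cur[j - 1] + 1,          # extra base in the read
            )
        prev = cur
    return min(prev)  # free end: the match may stop anywhere in the read


def hamming_at_prefix(adapter: str, read: str, prefix: str = "AGATCGGAAGA") -> int:
    """Mismatches only, laying the adapter over the read where the shared prefix starts."""
    start = read.find(prefix)
    window = read[start:start + len(adapter)]
    return sum(a != r for a, r in zip(adapter, window))


for adapter in ("AGATCGGAAGAGC", "AGATCGGAAGATT"):
    print(adapter, "mismatches:", hamming_at_prefix(adapter, READ),
          "edits:", min_edits(adapter, READ))
```

Laid over the read without gaps, both sequences differ in two positions, which is what the question observed. Once a deletion is allowed, AGATCGGAAGAGC needs only one error, while AGATCGGAAGATT still needs two.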
Because of this, with -e 0.1 (0.1 × 13 = 1.3, rounded down to 1) or -e 1, the read is trimmed with sequence 1 but not with sequence 2. If the error rate is raised to 0.2 (0.2 × 13 = 2.6, rounded down to 2) or -e 2, the read is trimmed with sequence 2 as well as sequence 1. Likewise, a sequence with two non-matching bases at the end, such as AGATCGGAAGAGT (compare sequence 2, AGATCGGAAGATT), needs -e 0.2 just like sequence 2.
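The threshold arithmetic can be sketched as follows, assuming allowed errors = floor(e × adapter length) for a full-length match and, as the "0.1 or 1" wording suggests, that the cutadapt version in use reads -e values of 1 or more as an absolute number of errors. The per-sequence error counts (1 for sequence 1, 2 for sequence 2 and for AGATCGGAAGAGT) are taken from the answer; `allowed_errors` is a hypothetical helper.

```python
import math


def allowed_errors(e: float, adapter_len: int) -> int:
    """Maximum errors tolerated for a full-length adapter match (simplified model)."""
    if e >= 1:
        # Assumption: values >= 1 are read as an absolute number of errors.
        return int(e)
    return math.floor(e * adapter_len)


ADAPTER_LEN = 13
# Errors each sequence needs against this read, as stated in the answer.
errors_needed = {"AGATCGGAAGAGC": 1, "AGATCGGAAGATT": 2, "AGATCGGAAGAGT": 2}

for e in (0.1, 1, 0.2, 2):
    limit = allowed_errors(e, ADAPTER_LEN)
    trimmed = [seq for seq, k in errors_needed.items() if k <= limit]
    print(f"-e {e}: up to {limit} error(s), trims {trimmed}")

# floor(e * 13) >= k exactly when e >= k / 13, so that is the smallest usable rate.
for seq, k in errors_needed.items():
    print(seq, f"needs e >= {k / ADAPTER_LEN:.3f}")
```

Under this model, any rate from roughly 0.154 upward, not only 0.2, should be enough for sequence 2.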
However, cutadapt does not allow errors in short partial matches of the adapter (primer) sequence, because the number of allowed errors is computed from the length of the matched part.
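How quickly the tolerance drops for partial matches can be sketched by applying the same floor rule to the length of the matched part, assuming that is how the limit is derived for a partial adapter match at the 3' end of a read. `allowed_errors_for_overlap` is a hypothetical helper.

```python
import math


def allowed_errors_for_overlap(e: float, overlap: int) -> int:
    """Errors tolerated when only `overlap` adapter bases are matched (simplified model)."""
    return math.floor(e * overlap)


# AGATCGGAAGAGC is 13 bases long; list every partial-match length up to that.
for overlap in range(1, 14):
    print(overlap, allowed_errors_for_overlap(0.1, overlap))
```

At -e 0.1, any partial match shorter than 10 bases has to be exact under this rule.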