I am using 151 bp paired-end RNA-seq reads to study differentially expressed genes between two conditions. The reads are pseudoaligned to a reference transcriptome using kallisto (index built with the default k-mer size of 31).
However, ~16% of the reads have adapter contamination, and in some cases the adapter sequence starts in the middle of the read. The FastQC adapter content plot looks like this:
I am using Trim Galore to remove the adapters, but I am unsure what minimum read length to keep after trimming in order to balance the risk of spurious multimapping against the loss of reads.
I tested a 50 bp cutoff, which discards ~100,000 read pairs (0.5%) and adds or removes ~30 genes from the DESeq2 significant gene list (~2,400 significant genes in total).
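To make the "adds/removes ~30 genes" comparison concrete, here is a minimal Python sketch of how two significant gene lists could be compared. The function name `compare_gene_sets` and the toy gene IDs are hypothetical, and the sketch assumes the significant gene IDs from each DESeq2 run have already been exported as plain lists.

```python
def compare_gene_sets(baseline, trimmed):
    """Summarise how two collections of significant gene IDs differ."""
    baseline, trimmed = set(baseline), set(trimmed)
    shared = baseline & trimmed
    union = baseline | trimmed
    return {
        "shared": len(shared),
        "only_baseline": len(baseline - trimmed),  # lost after trimming
        "only_trimmed": len(trimmed - baseline),   # gained after trimming
        # Jaccard index: 1.0 means the two lists are identical
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy gene IDs for illustration only (not real results)
untrimmed_hits = {"geneA", "geneB", "geneC", "geneD"}
trimmed_hits = {"geneB", "geneC", "geneD", "geneE"}

summary = compare_gene_sets(untrimmed_hits, trimmed_hits)
print(summary)
```

Running the same comparison for several cutoffs would show whether the significant list stabilises or keeps shifting as the cutoff changes.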
Without adapter removal kallisto also works fine, but I suspect it may produce spurious multimapping for reads whose non-adapter portion is very short (longer than 31 bp but shorter than 50 bp).
What would be an optimal minimum read length cutoff in this scenario, or how can I determine one?
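One way to explore the trade-off is to tabulate the fraction of read pairs that would be discarded at each candidate cutoff. The following is a minimal Python sketch. It assumes the post-trimming lengths of both mates are available as `(r1_len, r2_len)` tuples, and that a pair is dropped when either mate falls below the cutoff (my understanding of the Trim Galore `--length` behaviour in paired-end mode without `--retain_unpaired`). The function name `pairs_lost_by_cutoff` and the toy lengths are hypothetical.

```python
def pairs_lost_by_cutoff(pair_lengths, cutoffs):
    """Return {cutoff: fraction of read pairs with a mate shorter than cutoff}."""
    total = len(pair_lengths)
    result = {}
    for cutoff in cutoffs:
        # Assumption: a pair is discarded if EITHER mate is below the cutoff
        lost = sum(1 for r1, r2 in pair_lengths if min(r1, r2) < cutoff)
        result[cutoff] = lost / total if total else 0.0
    return result


# Toy post-trimming lengths for illustration only (not real data)
toy_pairs = [(151, 151), (151, 40), (35, 151), (60, 151), (151, 151)]

loss = pairs_lost_by_cutoff(toy_pairs, [31, 50, 75])
for cutoff, fraction in loss.items():
    print(f"cutoff {cutoff} bp: {fraction:.1%} of pairs discarded")
```

Plotting this fraction against the cutoff, alongside the change in the significant gene list, would show where extra stringency stops changing the results.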