Hi there.
I have 3RAD data for SNP analysis. While quality-checking the reads, I found a very small amount of adapter contamination, mainly the Illumina universal adapter: FastQC reports <0.1% (not even a warning), and process_radtags reports 0.77% when allowing up to 2 mismatches.
As I understand it, in theory I should not have adapter contamination, because the insert size is much longer than the read length (around 550 bp vs 150 bp). Should I still remove this small quantity of adapter sequences, assuming they come from stray short inserts that somehow passed size selection, or should I just consider them false positives? What would be your usual approach in this situation?
Cheers.
Use bbduk.sh (see the BBDuk guide) to scan and trim. It will remove adapter contamination down to a single base, since it has an overlap mode for paired-end reads. Although the insert size is larger than the read length, insert sizes in a library generally follow a roughly normal distribution, so you will have some short inserts (unless something was done to size-select).
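To illustrate why short inserts show up as adapter sequence at the 3' end of a read, here is a minimal Python sketch. It is not how BBDuk works internally (BBDuk uses k-mer matching plus paired-end overlap); it is just a naive scan under stated assumptions. The adapter string is the 13-base Illumina universal adapter prefix that FastQC screens for, while the function names, `max_mismatches` and `min_overlap` are hypothetical choices made for this example.

```python
# Naive adapter scan (illustration only, NOT BBDuk's algorithm).
# ADAPTER is the Illumina universal adapter prefix that FastQC screens for.
ADAPTER = "AGATCGGAAGAGC"


def hamming(a, b):
    """Count mismatching positions over the shared length of a and b."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter_start(read, adapter=ADAPTER, max_mismatches=2, min_overlap=5):
    """Return the 0-based index where the adapter begins in `read`, or None.

    Full-length windows may contain up to `max_mismatches` mismatches.
    Partial windows at the 3' end (adapter cut off by the read end) must
    match exactly and span at least `min_overlap` bases; both thresholds
    are assumptions chosen to limit false positives in this toy example.
    """
    for start in range(len(read) - min_overlap + 1):
        window = read[start:start + len(adapter)]
        allowed = max_mismatches if len(window) == len(adapter) else 0
        if hamming(window, adapter) <= allowed:
            return start
    return None


# A 40 bp insert sequenced with a longer read runs into the adapter.
short_insert = "ACGT" * 10
print(find_adapter_start(short_insert + ADAPTER + "AAAA"))  # adapter after the insert
print(find_adapter_start(short_insert + "AGATCG"))          # adapter cut off by read end
print(find_adapter_start(short_insert))                     # no adapter present
```

The lower `min_overlap` is set, the more random matches near the read end will be flagged, which is one reason different tools report different contamination rates for the same data.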
Thanks for the quick answer!
I was going to exclude them directly with process_radtags (Stacks), since I cannot trim them: I need reads of equal length for downstream analysis. But I think you are right about the small number of short inserts, as most of the adapter contamination occurs towards the ends of the reads.
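For clarity, the "discard instead of trim" idea looks roughly like the Python sketch below. This is not what process_radtags does internally; it is a toy filter with exact matching only, and the function names and the toy reads are made up for illustration. Every read that survives keeps its original length, which is the point.

```python
# Toy "discard, don't trim" filter (illustration only, not process_radtags).
ADAPTER_SEED = "AGATCGGAAGAGC"  # Illumina universal adapter prefix used by FastQC


def parse_fastq(text):
    """Yield (header, sequence, plus, quality) tuples from FASTQ text."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        yield tuple(lines[i:i + 4])


def drop_adapter_reads(records, seed=ADAPTER_SEED):
    """Return (kept_records, n_dropped); reads containing `seed` are dropped."""
    kept, dropped = [], 0
    for rec in records:
        if seed in rec[1]:   # exact match anywhere in the sequence
            dropped += 1
        else:
            kept.append(rec)
    return kept, dropped


# Two made-up 20 bp reads: the second one runs into the adapter.
toy_fastq = """@read1
ACGTACGTACGTACGTACGT
+
IIIIIIIIIIIIIIIIIIII
@read2
ACGTACGAGATCGGAAGAGC
+
IIIIIIIIIIIIIIIIIIII
"""

kept, dropped = drop_adapter_reads(parse_fastq(toy_fastq))
print(len(kept), dropped)
print({len(rec[1]) for rec in kept})  # lengths of the surviving reads
```

With paired-end data both mates of a flagged pair would have to be dropped together so the two files stay in sync.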
If you are going to use Stacks, then that should work.