Hey guys,
After browsing similar questions and trying the "friendly" tools available, I concluded that adapter removal is not trivial at all for non-expert users, at least not for some datasets. So I have a few questions. If you could help me with any of them, I would really appreciate it.
- How do I know which adapters are present in my reads? (The FastQC report shows several hits to Illumina Multiplexing PCR primer 2.0.1, but clipping its sequence does not clean all reads, and the reports keep showing this contamination.) Shouldn't I know the adapter just from the library prep kit used?
- Why don't all reads have adapters?
- If I use Cutadapt with the first 13bp of the Illumina universal adapter (AGATCGGAAGAGC), over half of my dataset is lost in clipping (20Gb to 9Gb). Also, FastQC still shows adapter contamination. Can I trust this clipping?
I am using Adapter Removal. It identifies adapters on its own. Also add a quality filter; it's worth it.
Why don't all reads have adapters? Because clipping them is part of the instrument software, which runs before you get your FASTQs.
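If you want to check for yourself how many reads still carry an adapter, a quick count can help. Here is a minimal Python sketch, assuming a plain or gzipped FASTQ with exactly four lines per record. The function name `adapter_fraction` is made up for illustration, and the default adapter is just the 13bp prefix quoted in the question. It only finds exact matches, so reads with sequencing errors in the adapter, or with only a partial adapter at the 3' end, will be missed.

```python
import gzip
from itertools import islice

ADAPTER_PREFIX = "AGATCGGAAGAGC"  # 13bp prefix quoted in the question


def open_fastq(path):
    """Open a plain or gzipped FASTQ file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def adapter_fraction(path, adapter=ADAPTER_PREFIX, max_reads=100000):
    """Return (reads_checked, reads_with_adapter) for the first max_reads records.

    Assumes four lines per record (no wrapped sequences) and exact matching only.
    """
    checked = 0
    hits = 0
    with open_fastq(path) as handle:
        # The sequence is the 2nd line of every 4-line record.
        for seq in islice(handle, 1, max_reads * 4, 4):
            checked += 1
            if adapter in seq:
                hits += 1
    return checked, hits
```

Calling `adapter_fraction("reads.fastq.gz")` (file name hypothetical) on the raw and on the clipped file gives a rough idea of whether the clipping changed anything, independent of what the reports say.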
You can also run PRINSEQ before and after Adapter Removal. Looking at the sequence lengths, you should be left with only one peak. The duplication section also gives insight into any adapters that may still be present.
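If you just want a quick look at the length distribution without a full report, something like the following may do. It is a rough Python sketch, not a replacement for the tools above: it assumes a plain or gzipped FASTQ with four lines per record, and `length_histogram` is a name I made up for illustration.

```python
import gzip
from collections import Counter
from itertools import islice


def length_histogram(path, max_reads=100000):
    """Count read lengths for the first max_reads records of a FASTQ file.

    Assumes four lines per record (no wrapped sequences).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = Counter()
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        for seq in islice(handle, 1, max_reads * 4, 4):
            lengths[len(seq.rstrip("\r\n"))] += 1
    return lengths


def print_histogram(lengths):
    """Print one 'length<TAB>count' line per observed read length."""
    for length in sorted(lengths):
        print(f"{length}\t{lengths[length]}")
```

Run it on the file before and after clipping (for example `print_histogram(length_histogram("clipped.fastq"))`, file name hypothetical) and compare where the counts pile up.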