I am working with unaligned Illumina paired-end data. I would like to first identify the adapter sequences present in the data and then trim the reads accordingly. Is there a way to identify the adapter sequences? Please help me with this and let me know which tools to use.
If you have paired reads and enough of the reads have inserts shorter than the read length, you can identify adapter sequences with BBMerge like this (they will be written to adapters.fa):
bbmerge.sh in1=r1.fq in2=r2.fq outa=adapters.fa
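This only works when inserts are shorter than the read length because the two reads then overlap completely and continue into the adapters. Below is a minimal Python sketch of that idea for a single read pair. It is a simplified illustration with exact matching, not BBMerge's actual algorithm, and `infer_adapters` is a hypothetical helper name. In practice you would run something like this over many pairs and take a consensus of the returned tails.

```python
from typing import Optional, Tuple

# Complement table for DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def infer_adapters(r1: str, r2: str, min_insert: int = 10,
                   max_mismatch: int = 0) -> Optional[Tuple[str, str]]:
    """Guess the adapter tails of one read pair (hypothetical helper).

    If the insert is shorter than the reads, then
        R1 = insert + adapter1
        R2 = revcomp(insert) + adapter2
    so the first `insert_len` bases of R1 equal the reverse complement of the
    first `insert_len` bases of R2. Whatever follows the insert is adapter.
    """
    n = min(len(r1), len(r2))
    # Try candidate insert lengths from longest to shortest (all shorter than the reads).
    for insert_len in range(n - 1, min_insert - 1, -1):
        a = r1[:insert_len]
        b = revcomp(r2[:insert_len])
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch:
            return r1[insert_len:], r2[insert_len:]
    return None  # no overlap found: insert is probably >= read length


# Example pair built from a 20 bp insert plus the first 15 bases of
# TruSeq-style adapters (AGATCGGAAGAGCAC... / AGATCGGAAGAGCGT...).
insert = "GATTACAGGCTTCAAGCTAG"
r1 = insert + "AGATCGGAAGAGCAC"
r2 = revcomp(insert) + "AGATCGGAAGAGCGT"
print(infer_adapters(r1, r2))  # ('AGATCGGAAGAGCAC', 'AGATCGGAAGAGCGT')
```

Allowing a few mismatches (`max_mismatch`) makes the sketch more tolerant of sequencing errors, at the cost of more false overlaps.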
The adapter sequence used may also be listed in the adapters.fa file included with BBMap. In that case, you can do this:
Thank you. I was able to identify the adapters in R1.fq and R2.fq. Now I would like to know whether these are 5' forward/reverse or 3' forward/reverse adapters. Is there a way to tell?
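In Illumina libraries, adapter read-through normally appears at the 3' end of a read, when the insert is shorter than the read length. One rough way to check the sequences you found is to count, over a sample of reads, whether each adapter (or its reverse complement) sits at the 3' end or at the 5' start of the read. The Python sketch below does this with exact matching only; `adapter_hits`, `tally` and the `min_len` threshold are hypothetical choices for illustration, and real trimming tools also tolerate mismatches.

```python
from collections import Counter
from typing import Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def adapter_hits(read: str, adapter: str, min_len: int = 8) -> List[Tuple[str, str]]:
    """Report (orientation, end) for every exact adapter placement in `read`."""
    hits = []
    for orientation, seq in (("forward", adapter), ("revcomp", revcomp(adapter))):
        lengths = range(min_len, len(seq) + 1)
        # 3' end: the whole adapter occurs in the read, or the read ends
        # with the first k bases of the adapter (partial read-through).
        if seq in read or any(read.endswith(seq[:k]) for k in lengths):
            hits.append((orientation, "3'"))
        # 5' end: the read starts with the last k bases of the adapter.
        if any(read.startswith(seq[-k:]) for k in lengths):
            hits.append((orientation, "5'"))
    return hits


def tally(reads: Iterable[str], adapter: str, min_len: int = 8) -> Counter:
    """Count adapter placements over a sample of reads."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(adapter_hits(read, adapter, min_len))
    return counts


adapter = "AGATCGGAAGAGCAC"  # stand-in for one adapter you identified
insert = "GATTACAGGCTTCAAGCTAG"
reads = [
    insert + adapter[:10],   # 3' read-through, forward orientation
    insert + adapter[:12],   # 3' read-through, forward orientation
    adapter[-10:] + insert,  # adapter tail at the 5' start of the read
]
print(tally(reads, adapter))
```

If nearly all hits for a sequence land in the forward 3' bucket, it behaves like ordinary read-through adapter that should be trimmed from the read ends.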
fastp is a newer tool that is almost as fast as BBDuk and can automatically detect 5' or 3' adapters for both single-end and paired-end data (for paired-end data, this must be enabled manually with --detect_adapter_for_pe).
The adapters are evaluated by analyzing the tails of the first ~1M reads, so if you have more complicated or multiple adapters, this may not be ideal.
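To make the "tails of the first ~1M reads" idea concrete, here is a simplified Python sketch that counts k-mers in the last bases of the first N reads and reports the most frequent ones. It only illustrates the general approach and is not fastp's actual implementation; `tail_kmer_profile` and its default parameters are hypothetical. It also shows the limitation mentioned above: one dominant adapter stands out clearly, whereas several different adapters split the counts between them.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, List, Tuple


def tail_kmer_profile(reads: Iterable[str], k: int = 10, tail: int = 20,
                      max_reads: int = 1_000_000, top: int = 5) -> List[Tuple[str, float]]:
    """Return the `top` k-mers seen in read tails, each with the fraction of
    sampled reads whose tail contains it."""
    counts: Counter = Counter()
    n = 0
    for read in islice(reads, max_reads):  # only look at the first max_reads reads
        n += 1
        end = read[-tail:]
        # Use a set so each k-mer counts at most once per read.
        counts.update({end[i:i + k] for i in range(len(end) - k + 1)})
    if n == 0:
        return []
    return [(kmer, c / n) for kmer, c in counts.most_common(top)]


reads = [
    "ACGTACGTAC" + "AGATCGGAAG",
    "TTTTGGGGCC" + "AGATCGGAAG",
    "CCCCAAAATT" + "AGATCGGAAG",
    "GATTACAGGCTTCAAGCTAG",
]
print(tail_kmer_profile(reads, k=10, tail=10, top=1))  # [('AGATCGGAAG', 0.75)]
```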
In the .html report, fastp only shows:
Duplication rate
Insert size estimation
Before/after filtering read quality
Before/after filtering base content
Before/after kmer counting