Hi,
I am processing a batch of sequencing data generated with a specialized library construction protocol, so it may contain more adapter sequence than standard libraries. I was told that the adapter sequences used in library construction are:
>P7_adapter
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>P5_adapter
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
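Both adapters begin with the same bases, and that shared start contains AGATCGGAAGAG, the sequence FastQC labels "Illumina Universal Adapter". A quick check of how far the two sequences agree (a sketch; only the two sequences above come from the post):

```python
# Adapter sequences as listed above.
P7_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
P5_ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"


def common_prefix(a: str, b: str) -> str:
    """Return the longest leading substring shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


shared = common_prefix(P7_ADAPTER, P5_ADAPTER)
print(shared, len(shared))  # prints: AGATCGGAAGAGC 13
```

This is why -a AGATCGGAAGAG and -A AGATCGGAAGAG can use the same 12-mer for both reads: it matches the start of either adapter.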
First, I tried removing these sequences directly with cutadapt using -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT, but it was not effective: FastQC still showed a lot of Illumina Universal Adapter in R2.
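One way to check whether the R2 read-through really matches the P5 sequence above is to look at what follows the shared 12-base core in reads that contain it. The sketch below is a simplified exact-match tally (no mismatches or indels, unlike cutadapt's error-tolerant matching); the function names and the file name in the usage comment are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator


def read_fastq_seqs(path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # line 2 of every record is the sequence
                yield line.strip()


def adapter_continuations(seqs: Iterable[str], core: str = "AGATCGGAAGAG",
                          extra: int = 20) -> Counter:
    """Tally the `extra` bases that follow the first exact occurrence of `core`.

    Reads with fewer than `extra` bases after the core are skipped, so the
    tally reflects full read-through rather than truncated 3' ends.
    """
    tally: Counter = Counter()
    for seq in seqs:
        pos = seq.find(core)
        if pos == -1:
            continue
        tail = seq[pos + len(core): pos + len(core) + extra]
        if len(tail) == extra:
            tally[tail] += 1
    return tally


# Hypothetical usage on a trimmed or untrimmed R2 file:
# for tail, n in adapter_continuations(read_fastq_seqs("sample_R2.fastq.gz")).most_common(5):
#     print(n, "AGATCGGAAGAG" + tail)
```

If the most common continuation differs from CGTCGTGTAGGGAAAGAGTG (the P5 sequence after the 12-mer), that would suggest the adapter actually present in R2 is not the one supplied, which could explain why the full-length -A sequence trimmed poorly while the 12-mer worked.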
Then I tried the shorter adapter sequences -a AGATCGGAAGAG -A AGATCGGAAGAG, and this worked very well. However, my FastQC report now shows that TruSeq Adapter, Index 7 sequences are overrepresented. I don't understand why this happens and would like guidance on how to remove these sequences.
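To see exactly which trimmed reads are being flagged, it can help to count repeated sequences yourself and inspect the top ones. This is a rough sketch similar in spirit to FastQC's Overrepresented sequences module, not its actual code: the 0.1% default mirrors FastQC's warning level, and truncating every read to 50 bases is a simplification of how FastQC handles long reads. Feed it the sequence lines of the trimmed R2 FASTQ.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def overrepresented(seqs: Iterable[str], min_fraction: float = 0.001,
                    prefix_len: int = 50) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, fraction) for sequences above min_fraction.

    Each read is truncated to its first prefix_len bases before counting.
    Zero-length reads are skipped and not included in the total.
    """
    counts: Counter = Counter()
    total = 0
    for seq in seqs:
        if not seq:
            continue
        counts[seq[:prefix_len]] += 1
        total += 1
    if total == 0:
        return []
    # most_common() sorts by count, highest first.
    return [(s, n, n / total) for s, n in counts.most_common()
            if n / total > min_fraction]
```

Reads that survive trimming but are still highly duplicated (for example very short inserts) will appear at the top of this list, where they can be compared with the adapter sequences by eye.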
Hi GenoMax,
Thanks for your kind help. Basically, we follow the enhanced CLIP (eCLIP) protocol (https://doi.org/10.1038/nmeth.3810). Because the template has been digested, the fragments may be short, so we think it is reasonable to find adapter sequence at the 3' end, and cutadapt has removed it. However, I am not sure whether the index sequence detected by FastQC is a false alarm, is really present, or can be ignored.
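One way to judge whether the "Index 7" hit is a real adapter remnant or a short coincidental match is to measure the longest exact overlap between a flagged sequence (copied from the FastQC report) and an adapter, in both orientations. FastQC's Possible Source column also reports an overlap length, but checking against the exact sequences used in your library can be more informative. The helper names below are hypothetical, and no particular length cut-off is implied.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Longest exact substring shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def adapter_overlap(flagged: str, adapter: str) -> Tuple[str, int]:
    """Return the orientation ('forward' or 'revcomp') and length of the
    longest exact match between a flagged sequence and an adapter."""
    flagged = flagged.upper()
    fwd = longest_common_substring(flagged, adapter.upper())
    rev = longest_common_substring(flagged, revcomp(adapter))
    if len(fwd) >= len(rev):
        return ("forward", len(fwd))
    return ("revcomp", len(rev))
```

A flagged sequence that shares only a dozen or so bases with an adapter may simply be coincidental, whereas one that contains most of the adapter points to adapter-containing reads that trimming missed.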