Hi, I have paired-end sequences. When I ran QC on my forward reads, it reported an overrepresented sequence and identified it as an adapter: TruSeq Adapter, Index 9 5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCACGATCAGATCTCGTATGCCGTCTTCTGCTTG. The reverse reads, however, look completely fine. As I understand it, overrepresented sequences usually have to be trimmed, so I tried trimming with Cutadapt using a command similar to this: cutadapt -a AACCGGTT input.fastq > output.fastq. Cutadapt removed the sequence, but I ended up with poor sequences and lost a large amount of data.
My question is: is it OK to lose this much data? Also, if I do not trim the overrepresented sequence, will it affect my mapping later on?
If you have adapter in read 1, you should have it in read 2 as well, unless the run was asymmetric (R1 and R2 of different lengths). The quality of read 2 could be so low that the adapter sequence is not obvious because of too many mismatches. For this reason, BBDuk has an option (tpe) for trimming paired reads to the same length even if adapter sequence was detected in only one read. And yes, trimming adapters is important for mapping; you will get higher mapping speed, a higher percentage of mapped reads, and more accurate mapq generation with trimmed reads.
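As a rough sketch of trimming both mates together rather than the forward file alone, Cutadapt can be run in paired-end mode with -a for read 1 and -A for read 2. The function name, file names, and the minimum length of 20 are placeholders and not values from this thread, and the two adapter sequences are the commonly used TruSeq prefixes, which is an assumption you should verify against your library kit.

```shell
#!/bin/sh
# Sketch only: paired-end adapter trimming with cutadapt.
# Adapter prefixes often used for TruSeq libraries (assumption: check your kit).
R1_ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
R2_ADAPTER="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# trim_pair is a hypothetical helper name.
# $1 = forward reads, $2 = reverse reads, $3 = output prefix
trim_pair() {
    # -a / -A : 3' adapter for read 1 / read 2
    # -m 20   : discard pairs where a mate ends up shorter than 20 bases (placeholder value)
    # -o / -p : trimmed output for read 1 / read 2
    cutadapt \
        -a "$R1_ADAPTER" \
        -A "$R2_ADAPTER" \
        -m 20 \
        -o "${3}_R1.trimmed.fastq" \
        -p "${3}_R2.trimmed.fastq" \
        "$1" "$2"
}

# Example call (file names are placeholders):
# trim_pair sample_R1.fastq sample_R2.fastq sample
```

Processing the pair in one run keeps the two output files in sync, which paired-end mappers generally expect.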
Thank you for your comments; what you have said is helpful. Both reads have really good QC reports.
Have you checked whether the R1/R2 reads merge (i.e. whether you have shorter-than-expected inserts)? You can check this with bbmerge.sh from the BBMap suite.
Thank you for mentioning this. I will work on it today and report back.
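For reference, the merge check suggested above could look roughly like the sketch below. The function name and file names are placeholders. A high merge rate and an insert-size histogram concentrated below the read length would suggest short inserts, and therefore adapter read-through in both mates.

```shell
#!/bin/sh
# Sketch only: merge overlapping pairs and write an insert-size histogram.
# check_inserts is a hypothetical helper name.
# $1 = forward reads, $2 = reverse reads, $3 = output prefix
check_inserts() {
    # in1 / in2 : the two paired input files
    # out       : pairs that merged successfully
    # ihist     : insert-size histogram of the merged pairs
    bbmerge.sh \
        in1="$1" \
        in2="$2" \
        out="${3}_merged.fastq" \
        ihist="${3}_ihist.txt"
}

# Example call (file names are placeholders):
# check_inserts sample_R1.fastq sample_R2.fastq sample
```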
Hi, just to be sure: did you actually use the correct adapter sequence in the command line, as noted here, and not the dummy (AACCGGTT) you stated?
Thank you for your response. Yes, I used the actual adapter sequence.