Hi,
I used Trimmomatic to trim my Illumina HiSeq reads with an adapter list that I downloaded from here.
But after I assembled the trimmed reads and tried to upload the assembly to the TSA database at NCBI, the submission was rejected with an error saying that my sequences are contaminated with primer sequences. Using VecScreen, I found one of the contamination sources: CCCTACACGACGCTCTTCCGATCT. But this sequence is actually contained in one of the adapter sequences in the list:
>TruSeq_Universal_Adapter
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
So my question is: why wasn't this sequence trimmed off by Trimmomatic?
Is there any way to remove the contamination from the assembly, so that I don't have to go back and reassemble the sequences?
Is there any script or program that can remove the contamination from the assembly?
Your assembly is very likely incorrect.
Adapter contamination shared by two reads gives them a common region, which may lead the assembler to join them into a single contig and so connect regions that are not actually adjacent.
You will need to filter out the reads that contain contaminants: align the reads against the contaminant sequences, remove the reads that align, and reassemble with the remaining reads.
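As a rough illustration of that filtering step, here is a minimal Python sketch. It is a simplification, not a replacement for a real aligner: it only drops reads that contain an exact copy of a contaminant (in either orientation), so partial or mismatched adapter copies would slip through. The function names (`parse_fastq`, `filter_reads`) and the file name in the usage note are hypothetical, and it assumes plain 4-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, separator, quality

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; an incomplete trailing record is ignored."""
    stripped = (line.rstrip("\r\n") for line in lines)
    # Passing the same iterator four times makes zip consume 4 lines per record.
    for header, seq, sep, qual in zip(stripped, stripped, stripped, stripped):
        yield header, seq, sep, qual


def filter_reads(
    records: Iterable[FastqRecord], contaminants: Iterable[str]
) -> Iterator[FastqRecord]:
    """Yield only the reads with no exact contaminant match on either strand."""
    patterns = set()
    for contaminant in contaminants:
        forward = contaminant.upper()
        patterns.add(forward)
        patterns.add(reverse_complement(forward))
    for record in records:
        sequence = record[1].upper()
        if not any(pattern in sequence for pattern in patterns):
            yield record
```

Usage would look something like `with open("reads.fastq") as fh: clean = list(filter_reads(parse_fastq(fh), ["CCCTACACGACGCTCTTCCGATCT"]))`. For paired-end data you would also have to drop the mate of every removed read so the two files stay in sync, which this sketch does not do.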