This is my first time uploading a de novo transcriptome assembly to TSA, and the submission was flagged with Code(VECTOR_MATCH). I tried to trim the vector with bbduk and cutadapt, but I couldn't get either to work. For example, this adapter:
>gnl|uv|NGB00150.1:1-46 Ambion FirstChoice RLM-RACE 3' RACE adapter
GCGAGCACAGAATTAATACGACTCACTATAGGTTTTTTTTTTTTVN
According to VecScreen, it matches my example sequence (the matching region is in bold):
> sample
GCAAAGAAGCATTTTGGCAAAAAATTGCGTAATATTCTGCCGTATGTTACTGCAATGTACACGTTTATAA
TTATTGTAATAAGAATGTCTCATATTGCCTGCTTGATGTGGCAGGGTCACTTGTCAAGTGAGGAAAAGTC
ACAGTGTGAGGACTGTCTATAAAAATTTAGGCATCATATTAAAATGTGTGGATGCCTTATTGTATAGAAT
ATTTCAAATTTTGCAAAATTTGAACAAAGCATATAAAATAAAAGGAACGAAATTGAAAAAAAAAAAAAAA
A**GTCGTATTAATTCTGTGCTCG**
My problem is that the sequence trimmers do not recognize the adapter. I had to manually reverse complement the adapter and remove the "extra" bases just to get an exact match to my sequence. If only 5 sequences had been flagged, I could remove the matches by hand, but more than a thousand were flagged. I am not sure what I am doing wrong. This pandemic is making me too exhausted to read more bioinformatics...
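The manual step described above (reverse complementing the adapter and checking it against the contig) can be sketched in a few lines of Python. This is only an illustration using the adapter and the bold stretch from this post; the helper name `reverse_complement` and the variable names are made up for the example, and nothing here is part of bbduk or cutadapt.

```python
# IUPAC complement table, including ambiguity codes (e.g. V <-> B, N <-> N).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence (IUPAC aware)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Adapter exactly as listed in the post (NGB00150.1, 3' RACE adapter).
adapter = "GCGAGCACAGAATTAATACGACTCACTATAGGTTTTTTTTTTTTVN"

# The stretch shown in bold at the end of the example contig.
flagged = "GTCGTATTAATTCTGTGCTCG"

adapter_rc = reverse_complement(adapter)

print("adapter (reverse complement):", adapter_rc)
print("flagged stretch found in adapter as given: ", flagged in adapter)
print("flagged stretch found in reverse complement:", flagged in adapter_rc)
```

The bold stretch is not present in the adapter as listed, only in its reverse complement, so a trimmer that searches a single orientation would need the reverse-complemented sequence as input.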
This worked with k=21. I am not sure why, but thanks.
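One possible explanation for why k=21 helped: a k-mer based trimmer can only report a hit if the reference and the contig share at least one exact k-mer, and the bold stretch in the example is only 21 bases long. As far as I know, BBDuk's documented default is k=27, which would be longer than the match. The sketch below is a toy illustration of that idea, not BBDuk's actual algorithm; `kmers` and `shares_kmer` are hypothetical helper names, and the contig is just the last two lines of the example sequence.

```python
# IUPAC complement table, including ambiguity codes (e.g. V <-> B, N <-> N).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]

def kmers(seq: str, k: int) -> set:
    """All substrings of length k (empty set if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shares_kmer(reference: str, target: str, k: int) -> bool:
    """True if target contains an exact k-mer from reference in either orientation."""
    ref_kmers = kmers(reference, k) | kmers(reverse_complement(reference), k)
    return not ref_kmers.isdisjoint(kmers(target, k))

adapter = "GCGAGCACAGAATTAATACGACTCACTATAGGTTTTTTTTTTTTVN"

# Last two lines of the example contig from the post (bold markers removed).
contig_tail = (
    "ATTTCAAATTTTGCAAAATTTGAACAAAGCATATAAAATAAAAGGAACGAAATTGAAAAAAAAAAAAAAA"
    "AGTCGTATTAATTCTGTGCTCG"
)

for k in (21, 27):
    print(f"k={k}: shared k-mer found = {shares_kmer(adapter, contig_tail, k)}")
```

In this toy check a shared k-mer is found at k=21 but not at k=27, which is consistent with the shorter k working on these contigs.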
Please consider accepting this answer (green checkmark) to provide closure to this thread.