Question

TruSeq Illumina adapter sequences BLAST with high confidence to unrelated genes



2.2 years ago

e.r.zakiev

A FastQC report shows an overrepresented sequence identified as "TruSeq Adapter", but when I BLAST its nucleotide sequence (GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCGGAGCATCTCGTAT), it aligns with high confidence to, for example, coronavirus genes. Why is that? I expected the top hit to say something like "Illumina Adapter".

Also, why doesn't the nucleotide sequence that FastQC reports as "TruSeq adapter 18" match its namesake in Illumina's official adapter document? Believe me, the sequence listed there is not the same as the one in the FastQC report.

I am asking because I want to BLAST the overrepresented sequences in my data to see whether they come from ribosomal RNA contamination. If even the adapter sequences give results like these, I am clearly misunderstanding something.
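One way to keep adapter-derived sequences from muddying an rRNA check is to set them aside before BLASTing anything. Below is a minimal Python sketch, assuming that an exact 12-base match to the TruSeq Read 1 adapter (AGATCGGAAGAGCACACGTCTGAACTCCAGTCA, as listed by Illumina) is enough to flag a sequence as adapter-like. The function names and the choice of k = 12 are hypothetical illustrations, not FastQC's own logic.

```python
from typing import List, Tuple

# TruSeq Read 1 adapter sequence as published by Illumina (assumed here as the reference).
TRUSEQ_READ1_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def shares_adapter_kmer(seq: str, adapter: str = TRUSEQ_READ1_ADAPTER, k: int = 12) -> bool:
    """Return True if any k-mer of `adapter` occurs exactly in `seq` (case-insensitive)."""
    seq = seq.upper()
    adapter = adapter.upper()
    if k > len(adapter):
        raise ValueError("k must not exceed the adapter length")
    return any(adapter[i:i + k] in seq for i in range(len(adapter) - k + 1))


def split_for_blast(overrepresented: List[str], k: int = 12) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: separate adapter-like sequences from BLAST candidates."""
    adapter_like: List[str] = []
    candidates: List[str] = []
    for s in overrepresented:
        if shares_adapter_kmer(s, k=k):
            adapter_like.append(s)
        else:
            candidates.append(s)
    return adapter_like, candidates


if __name__ == "__main__":
    seqs = [
        "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCGGAGCATCTCGTAT",  # sequence from the question
        "A" * 50,  # toy poly-A sequence, illustrative only
    ]
    flagged, to_blast = split_for_blast(seqs)
    print("adapter-like:", flagged)
    print("send to BLAST:", to_blast)
```

Only the sequences in the second list would then go on to the rRNA BLAST. Exact k-mer matching is deliberately crude: it ignores sequencing errors and adapters other than TruSeq Read 1.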

Tags: RNAseq, adapters, Truseq, BLAST

Updated 2.2 years ago by ATpoint

score 3 · Accepted Answer · 2023-02-16

The sequence you provide is the beginning of the Illumina adapter as given in the document you linked. I see no problem here. It accounts for only 0.27% of reads, so why bother?
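To see how the question's sequence lines up with the adapter, the sketch below splits it into a fixed upstream part, a variable middle part, and the start of a fixed downstream part. It assumes the TruSeq index-adapter layout GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[index]ATCTCGTATGCCGTCTTCTGCTTG from Illumina's adapter document. The function name and `min_right` parameter are hypothetical. The middle block is a sample index that differs between adapters, which is one plausible reason an index-specific name need not match a given library exactly. Which named entry FastQC picks depends on its own contaminant list, and this sketch does not reproduce that.

```python
from typing import Optional

# Assumed TruSeq index-adapter layout (flanks as in Illumina's adapter document):
# GATCGGAAGAGCACACGTCTGAACTCCAGTCAC [index] ATCTCGTATGCCGTCTTCTGCTTG
LEFT_FLANK = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
RIGHT_FLANK = "ATCTCGTATGCCGTCTTCTGCTTG"


def extract_index(seq: str, min_right: int = 6) -> Optional[str]:
    """Return the bases between the left flank and the (possibly truncated) right flank.

    Returns None if the left flank is absent or no right-flank prefix of at
    least `min_right` bases follows it.
    """
    seq = seq.upper()
    start = seq.find(LEFT_FLANK)
    if start == -1:
        return None
    rest = seq[start + len(LEFT_FLANK):]
    # A 50 bp read may stop partway through the right flank, so try its longest prefix first.
    for n in range(len(RIGHT_FLANK), min_right - 1, -1):
        pos = rest.find(RIGHT_FLANK[:n])
        if pos != -1:
            return rest[:pos]
    return None


if __name__ == "__main__":
    question_seq = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCGGAGCATCTCGTAT"
    print(extract_index(question_seq))
```

For the sequence in the question, the first 33 bases match the left flank exactly, the last 9 bases are the start of the right flank, and the 8 bases in between are returned as the index.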

As for the BLAST hits, I would first check whether those genome assemblies contain leftover adapter sequence, i.e. whether the submitted sequences themselves were never fully adapter-trimmed. But again, this is well under 1% of reads, so just ignore it and continue with the analysis.
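The first suggestion, checking a BLAST hit's sequence for leftover adapter, could be approximated by scanning the subject sequence on both strands for the 13-base core AGATCGGAAGAGC that the TruSeq Read 1 and Read 2 adapters share. This is a minimal Python sketch. It assumes you have already saved the hit's sequence as a plain string (fetching it from NCBI is outside the sketch), and the function names are hypothetical. An exact hit, especially near either end of the record, would be consistent with untrimmed adapter but does not prove it.

```python
from typing import List, Tuple

# Shared start of the TruSeq Read 1 and Read 2 adapter sequences.
ADAPTER_CORE = "AGATCGGAAGAGC"

# Complement table for uppercase DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase-able DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_adapter_hits(subject: str, motif: str = ADAPTER_CORE) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for every exact occurrence of `motif`."""
    subject = subject.upper()
    hits: List[Tuple[int, str]] = []
    for strand, query in (("+", motif.upper()), ("-", reverse_complement(motif))):
        start = subject.find(query)
        while start != -1:
            hits.append((start, strand))
            start = subject.find(query, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Toy subject sequence, not real data: one forward and one reverse-strand copy.
    toy = "TTTT" + "AGATCGGAAGAGC" + "GGGG" + "GCTCTTCCGATCT" + "AA"
    for pos, strand in find_adapter_hits(toy):
        print(f"adapter core at {pos} on strand {strand}")
```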