Specific over-represented sequence FastQC

4.3 years ago

harrydolan.dc ▴ 20

Hi guys, I had a quick question regarding an over-represented sequence in human transcriptomic data. I've already trimmed adapters and aligned to the genome with STAR. After running FastQC on one of the BAM files, I got a warning about an over-represented sequence: "GGTGGCGCGTGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGTGGGAGGA". It comes up just over 36,000 times, and after a quick Google search I have seen it come up in a few other instances online. A BLAST search showed that it shares 100% identity with a few regions, such as "Homo sapiens RNA component of signal recognition particle 7SL2 (RN7SL2), small cytoplasmic RNA".

Does anyone know what might be happening, and whether it is going to be a problem going forward?

Thanks in advance!

rna-seq fastQC • 859 views

Since it only shows up 36,000 times out of, I'm assuming, 30 million+ reads, I wouldn't worry about it yet. Go on to alignment and see how it looks.
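To put a rough number on that, here is a minimal Python sketch of the arithmetic. The 30 million total is only the assumption made above, not a figure from the original post, and the function names are made up for illustration. The 0.1% (warn) and 1% (fail) cut-offs are the ones the FastQC documentation gives for its overrepresented sequences module.

```python
# Thresholds from the FastQC docs for the overrepresented sequences module:
# warn if a sequence is more than 0.1% of the total, fail if more than 1%.
WARN_FRACTION = 0.001
FAIL_FRACTION = 0.01


def overrepresented_fraction(count: int, total_reads: int) -> float:
    """Share of all reads accounted for by a single sequence."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count / total_reads


def fastqc_style_status(fraction: float) -> str:
    """Classify a fraction using the warn/fail cut-offs above (hypothetical helper)."""
    if fraction > FAIL_FRACTION:
        return "fail"
    if fraction > WARN_FRACTION:
        return "warn"
    return "pass"


count = 36_000              # occurrences reported in the question
assumed_total = 30_000_000  # ASSUMPTION: library size guessed in this reply

fraction = overrepresented_fraction(count, assumed_total)
status = fastqc_style_status(fraction)
print(f"{fraction:.2%} of reads -> {status}")
```

With those numbers the sequence is about 0.12% of the reads, so just over the warning line and well short of the 1% failure line. Swap in the real read count from your FastQC report to see where you actually land.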

ADD REPLY • link 4.3 years ago by rpolicastro 13k
