Hi guys, I have a quick question regarding an over-represented sequence in human transcriptomic data. I've already trimmed adapters and aligned to the genome with STAR. After running FastQC on one of the BAM files, I got a warning about an over-represented sequence: "GGTGGCGCGTGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGTGGGAGGA". It comes up just over 36,000 times, and after a quick Google search I have seen it come up in a few other instances online. A BLAST search shows that it shares 100% identity with a few regions, such as "Homo sapiens RNA component of signal recognition particle 7SL2 (RN7SL2), small cytoplasmic RNA".
Does anyone know what might be happening, and whether it is going to be a problem going forward?
Thanks in advance!
Since it only shows up 36,000 times out of, I'm assuming, 30 million+ reads (roughly 0.1% of the library if that assumption holds), I wouldn't worry about it yet. Go on to alignment and see how it looks.
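If you want to check that proportion yourself rather than rely on my guess at the read count, here is a minimal sketch in plain Python. It assumes an uncompressed or gzipped four-line-per-record FASTQ; the function names and the file path passed on the command line are hypothetical, and it only looks for the sequence on the forward strand (no reverse complement, no mismatches), so treat the result as a rough estimate.

```python
import gzip
import sys
from typing import Iterable, Tuple

# The over-represented sequence reported by FastQC in the question.
OVERREP = "GGTGGCGCGTGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGTGGGAGGA"


def count_reads_with_sequence(fastq_lines: Iterable[str], query: str) -> Tuple[int, int]:
    """Return (reads containing query, total reads) for 4-line FASTQ records."""
    hits = 0
    total = 0
    for i, line in enumerate(fastq_lines):
        # In a 4-line FASTQ record, the second line (index 1) is the sequence.
        if i % 4 == 1:
            total += 1
            if query in line:
                hits += 1
    return hits, total


def percent(hits: int, total: int) -> float:
    """Percentage of reads that are hits; 0.0 when there are no reads."""
    return 100.0 * hits / total if total else 0.0


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Hypothetical usage: python count_overrep.py sample.fastq.gz
        path = sys.argv[1]
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as handle:
            hits, total = count_reads_with_sequence(handle, OVERREP)
        print(f"{hits} of {total} reads ({percent(hits, total):.3f}%)")
    else:
        # The numbers from this thread: 36,000 hits out of an assumed 30 million reads.
        print(f"{percent(36_000, 30_000_000):.2f}%")
```

Run without arguments, it just does the arithmetic for the figures in this thread (the 30 million is my assumption, not something from your data). Run with a FASTQ path, it counts the reads that contain the sequence.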