Hi, I have a problem during QC and would appreciate a hint.
I have learned how to remove adapter sequences from the 5' and 3' ends,
but I don't know how to get rid of technical reads (not adapters) that contain a known sequence.
For example, the raw FASTQ file contains the following read, with the vector-derived part highlighted:
@id_01 length=62
GACTACGTACA**GAACAGATAATGACCATTTATAC**CGGAACAAATGGTTATCTGGATGGATTA
+id_01 length=62
IIIIIIIIIICCCFFFFFHHHHHJJJJJJJJJ<HHIJJJJJJJJJJFHIJJJJJIJJJJJJJ
The sequence GAACAGATAATGACCATTTATAC comes from the vector plasmid.
So I want to remove every read that contains this sequence before high-dimensional analysis.
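To make the goal concrete, here is a rough Python sketch of the kind of filtering I mean. It assumes plain 4-line FASTQ records and exact matching only, so a read with a sequencing error inside the vector region would slip through. Checking the reverse complement as well is my assumption, in case the vector sequence can appear on either strand. The function names are made up for illustration and do not come from any library.

```python
# Minimal sketch: drop every FASTQ read that contains a known sequence.
# Assumptions: 4-line FASTQ records, exact (case-insensitive) matching only.

VECTOR_SEQ = "GAACAGATAATGACCATTTATAC"  # vector-derived sequence from the example read

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines, e.g. at the end of the file
        seq = next(it, "")
        plus = next(it, "")
        qual = next(it, "")
        yield header, seq, plus, qual


def drop_reads_containing(records, target=VECTOR_SEQ):
    """Yield only the records whose sequence lacks the target on both strands."""
    targets = (target.upper(), reverse_complement(target))
    for record in records:
        seq = record[1].upper()
        if any(t in seq for t in targets):
            continue  # whole read is discarded, not trimmed
        yield record


def filter_fastq(in_path, out_path, target=VECTOR_SEQ):
    """Write the surviving reads to out_path and return how many were kept."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in drop_reads_containing(read_fastq(src), target):
            dst.write("\n".join(record) + "\n")
            kept += 1
    return kept
```

It would be called as, for example, `filter_fastq("raw.fastq", "filtered.fastq")`, where both file names are placeholders.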
Does anyone have a solution?
Sorry, I'm a beginner in bioinformatics and my English is clumsy. Thanks.