8.2 years ago
sacha
I got 16S rRNA sequencing data from an Illumina MiSeq. I ran FastQC and got the following quality plot.
Why do the first 6 positions all have the same quality? Is it because the barcode was not removed?
And when I look into my FASTQ file, the header specifies a sequence: TAATGCGCCCTATCCT. Is that the barcode too? And why is it 16 nucleotides long?
@HWI-D00473:173:H3KJTBCXX:2:1101:11770:2101 1:N:0:TAATGCGCCCTATCCT
CCGTCAATTCATTTAAGTTTCATACTTGCGTACGTACTCCCCAGGCGGATTACTTATCGCGTTAGCTTGGGCGCTGAGGTTCGACCCCCAACACCTAGTAATCATCGTTTACGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTACCCACGCTTTCG
CGCTTTAGCGTCAGTATCTGTCCAGTAAGCTGGCTTCCCCATCGGCATTCCTACAAATACCTACGACTTTCACCTCTACCCTTGTAGTTCCGCTTACCTCTCCAGTACTCTAGTCATCCAGTTTCCAACGCAATACCGAGT
That is the barcode (index) sequence. If this is a dual-index ("2D") barcode, the two barcodes are concatenated and reported as a single string, e.g. TAATGCGC-CCTATCCT (two 8-nt barcodes, 16 nucleotides in total).
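A minimal sketch of reading the index field from a header like the one above, assuming the Illumina CASAVA 1.8+ header layout (the index is the last colon-separated field after the space) and two 8-nt indexes. The function names are hypothetical; some demultiplexing software writes the two indexes separated by "+", which the sketch also accepts.

```python
from typing import Tuple


def parse_index(header: str) -> str:
    """Return the index sequence: the last ':'-separated field of the
    part of the header that follows the first space."""
    comment = header.split(" ", 1)[1]  # e.g. "1:N:0:TAATGCGCCCTATCCT"
    return comment.split(":")[-1]


def split_dual_index(index: str, length: int = 8) -> Tuple[str, str]:
    """Split a dual index into its two barcodes.
    Assumption: either joined by '+' or concatenated with fixed-length halves."""
    if "+" in index:
        first, second = index.split("+", 1)
        return first, second
    return index[:length], index[length:]


header = "@HWI-D00473:173:H3KJTBCXX:2:1101:11770:2101 1:N:0:TAATGCGCCCTATCCT"
index = parse_index(header)
print(index, len(index))        # TAATGCGCCCTATCCT 16
print(split_dual_index(index))  # ('TAATGCGC', 'CCTATCCT')
```

Counting the characters this way confirms the index is 16 nt, i.e. two 8-nt barcodes.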
Illumina initially assigns lower quality values to the first bases while the Q-scores are being calibrated, and that is what you are seeing at the beginning of the read. It has nothing to do with the barcodes.
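To look at the first cycles directly, the mean quality at each read position can be computed from the FASTQ itself, which is roughly what FastQC's per-base quality plot summarizes. A minimal sketch, assuming Phred+33 encoding (standard for Illumina 1.8+ output) and unwrapped 4-line FASTQ records; the toy records are made up for illustration.

```python
import io
from typing import Iterable, Iterator, List


def fastq_qualities(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each 4-line FASTQ record.
    Assumes unwrapped records: header, sequence, '+', quality."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def per_position_mean(quals: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (0-based) across all reads.
    Phred score = ASCII code - offset (33 for Phred+33)."""
    totals: List[int] = []
    counts: List[int] = []
    for q in quals:
        for i, ch in enumerate(q):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Toy records, made up for illustration only.
toy = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nII5#\n")
print(per_position_mean(fastq_qualities(toy)))  # [40.0, 40.0, 30.0, 21.0]
```

Reads of different lengths are handled by averaging each position only over the reads that reach it.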
Thanks for making it clearer. So I don't need to trim the first 6 bases?
No, you don't need to.
Regarding the following plot, it seems the reads all start with exactly the same sequence, CGTCAT. Is this related to the first 6 high-quality bases?
If this experiment uses an internal barcode that appears at the beginning of the reads (possible, based on the plot above; otherwise this may be due to bad experimental design), then you would see something like this. That barcode can then be used to demultiplex the reads further.
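Both points can be checked with a short script: count how often each 6-base prefix occurs (one dominant prefix such as CGTCAT is what the plot shows), and, if that prefix turns out to be an internal barcode, sort the reads by it. A minimal sketch, assuming fixed-length barcodes at the very start of the read and exact matching only; the reads, the barcode table and the sample name are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def prefix_counts(seqs: Iterable[str], k: int = 6) -> Counter:
    """Count the first k bases of each read (reads shorter than k are skipped).
    One dominant prefix means the reads share a common 5' sequence."""
    return Counter(s[:k] for s in seqs if len(s) >= k)


def demultiplex_inline(seqs: Iterable[str], barcodes: Dict[str, str],
                       k: int = 6) -> Dict[str, List[str]]:
    """Assign each read to a sample by exact match of its first k bases
    against `barcodes` (barcode -> sample name) and strip the barcode.
    Reads with no matching barcode go to 'unassigned', untrimmed."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for s in seqs:
        tag = s[:k]
        if tag in barcodes:
            bins[barcodes[tag]].append(s[k:])
        else:
            bins["unassigned"].append(s)
    return dict(bins)


# Toy reads and a hypothetical barcode -> sample table, for illustration only.
reads = ["CGTCATAAAA", "CGTCATCCCC", "GGGGGGTTTT"]
print(prefix_counts(reads).most_common(1))  # [('CGTCAT', 2)]
print(demultiplex_inline(reads, {"CGTCAT": "sample_A"}))
# {'sample_A': ['AAAA', 'CCCC'], 'unassigned': ['GGGGGGTTTT']}
```

Exact matching keeps the sketch short; real barcodes may need some mismatch tolerance.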