8.1 years ago
wm
I found a strange base composition in an NSR RNA-seq data set (SE60): the T% is higher than that of all the other bases (A, C, G). Can anyone tell me what is wrong with this data set?
The FastQC output (default parameters) is shown below:
Added: this data set was generated on an Illumina HiSeq 2000.
Clip reads (TruSeq adapter Index), discard the 5 bases at the 5' end:
cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20 -m 20 --cut=5 -o out.fq in.fq
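To make the 5' clipping and minimum-length steps concrete, here is a minimal Python sketch of what `--cut=5` and `-m 20` amount to for a single read. It is only an illustration, not cutadapt's implementation: the function name `clip_five_prime` and its parameters are hypothetical, and adapter and quality trimming (`-a`, `-q`) are left out.

```python
def clip_five_prime(seq, qual, n_cut=5, min_len=20):
    """Drop the first n_cut bases of a read and its quality string.

    Returns (seq, qual) after clipping, or None when the clipped read is
    shorter than min_len (such a read would be discarded).
    Hypothetical helper for illustration only; it does not perform the
    adapter (-a) or quality (-q) trimming that cutadapt also does.
    """
    seq, qual = seq[n_cut:], qual[n_cut:]
    if len(seq) < min_len:
        return None
    return seq, qual


# A 60-base single-end read keeps 55 bases after the 5' clip.
clipped = clip_five_prime("ACGT" * 15, "I" * 60)
print(len(clipped[0]))
```

If the T excess is confined to the first few cycles, it should largely disappear after this clip; if it persists along the whole read, clipping the 5' end alone will not explain it.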
[Figure: FastQC output, raw reads]
[Figure: FastQC output, after clipping the adapter from the 3' end]
Could you elaborate on the experimental procedure to generate this library?
It is an NSR-primed whole-transcriptome cDNA library. You can find the details here: http://www.nature.com/nmeth/journal/v6/n9/fig_tab/nmeth.1360_F1.html
Is your data Illumina? Illumina data has known issues at the 5' end that result in biased nucleotide composition. Depending on how the sample was processed, this could be the result of Nextera tagmentation bias (e.g. Fig. 4 here, DOI: 10.1186/s12859-016-0976-y). Alternatively, it can be caused by not-so-random "random" hexamers (http://seqanswers.com/forums/showthread.php?t=11843). Your first figure looks rather extreme, though. As WouterDeCoster said: how was the library generated?
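One way to see whether the bias is limited to the first few cycles (as with priming or tagmentation bias) or runs along the whole read is to tabulate base fractions per position yourself. Below is a rough Python sketch of that idea. The function names are hypothetical, the reader assumes plain uncompressed FASTQ with exactly four lines per record, and this is not how FastQC itself is implemented.

```python
from collections import Counter


def read_fastq_sequences(path):
    """Yield the sequence line of each record.

    Assumes uncompressed FASTQ with exactly four lines per record.
    """
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:
                yield line.strip()


def per_position_base_fractions(reads):
    """Return one dict per read position mapping A/C/G/T/N to its fraction."""
    counts = []
    for seq in reads:
        for i, base in enumerate(seq.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    fractions = []
    for position_counts in counts:
        total = sum(position_counts.values())
        fractions.append({b: position_counts[b] / total for b in "ACGTN"})
    return fractions


# Toy reads for illustration only (not real data).
toy_reads = ["TTACG", "TTCGA", "TAGCT", "TTTTT"]
toy_fractions = per_position_base_fractions(toy_reads)
for pos, frac in enumerate(toy_fractions, start=1):
    print(pos, frac["T"])
```

On real data you would call `per_position_base_fractions(read_fastq_sequences("in.fq"))`, using the input file name from the cutadapt command in the question, and compare the T fraction in the first positions against the rest of the read.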