Why is the T content higher than the other bases in NSR-RNAseq data?
8.1 years ago
wm ▴ 570

I found a strange base composition in an NSR-RNAseq data set (SE60): T% is higher than that of all the other bases (A, C, G). Can anyone tell me what is wrong with this data set?

The FastQC output (default parameters) is shown below.

Addendum: this data set was generated on an Illumina HiSeq 2000.

I clipped the TruSeq index adapter from the reads and discarded the first 5 bases at the 5' end:

cutadapt -a  AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -q 20 -m 20 --cut=5 -o out.fq in.fq

Raw reads: [figure: FastQC output]

After clipping the adapter from the 3' end: [figure: FastQC output]
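To check the numbers behind these plots independently of FastQC, the per-position base fractions could be computed directly from the FASTQ file. The following is only a minimal sketch: it assumes an uncompressed FASTQ with exactly four lines per record, and the function name `per_position_base_fractions` is made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def per_position_base_fractions(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Return, for each read position, the fraction of A, C, G, T and N.

    Assumes four-line FASTQ records (header, sequence, '+', quality).
    """
    counts: List[Counter] = []
    for i, line in enumerate(lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        # Grow the per-position table to fit the longest read seen so far.
        while len(counts) < len(seq):
            counts.append(Counter())
        for pos, base in enumerate(seq):
            counts[pos][base] += 1

    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: c[b] / total for b in "ACGTN"})
    return fractions


# Hypothetical usage on the files from the cutadapt command above:
# with open("in.fq") as handle:
#     for pos, frac in enumerate(per_position_base_fractions(handle), start=1):
#         print(pos, " ".join(f"{b}={frac[b]:.3f}" for b in "ACGT"))
```

Running it on both `in.fq` and `out.fq` would show whether the T excess is spread over the whole read or concentrated at particular positions.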

RNA-Seq • 1.7k views

Could you elaborate on the experimental procedure to generate this library?


It is an NSR-primed whole-transcriptome cDNA library; you can find the details here: http://www.nature.com/nmeth/journal/v6/n9/fig_tab/nmeth.1360_F1.html


Is your data from Illumina? Illumina has known issues at the 5' end of reads that result in biased nucleotide composition. Depending on how the sample was processed, this could be the result of Nextera tagmentation bias (e.g. Fig. 4 in DOI: 10.1186/s12859-016-0976-y). Alternatively, it can be caused by not-so-random "random" hexamers (http://seqanswers.com/forums/showthread.php?t=11843). Your first figure looks rather extreme though. As WouterDeCoster asked: how was the library generated?
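One rough way to look for priming bias would be to tabulate the most frequent k-mers at the very start of the reads. The sketch below assumes an uncompressed four-line FASTQ; the function name `top_leading_kmers` and the defaults `k=6` and `n=10` are illustrative choices, not taken from any tool. It should be run on the raw reads, since the cutadapt command in the question already discards the first 5 bases.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_leading_kmers(
    lines: Iterable[str], k: int = 6, n: int = 10
) -> List[Tuple[str, float]]:
    """Return the n most common k-mers at read starts with their frequencies.

    Assumes four-line FASTQ records; reads shorter than k are skipped.
    """
    starts: Counter = Counter()
    total = 0
    for i, line in enumerate(lines):
        if i % 4 != 1:  # only the sequence line of each record
            continue
        seq = line.strip().upper()
        if len(seq) < k:
            continue
        starts[seq[:k]] += 1
        total += 1
    # most_common() is empty when no reads were counted, so no division by zero.
    return [(kmer, count / total) for kmer, count in starts.most_common(n)]


# Hypothetical usage:
# with open("in.fq") as handle:
#     for kmer, freq in top_leading_kmers(handle):
#         print(kmer, f"{freq:.4f}")
```

If a handful of hexamers accounts for a large share of read starts, that would point towards primer-driven bias rather than a problem with the sequencing run itself.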

8.1 years ago

Strand-specific mRNA-Seq that contains a substantial amount of poly(A) contamination can produce plots like this one. It's often indicative of degradation of the RNA sample.
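A quick way to test this idea would be to count how many reads contain a long homopolymer run of T (or of A, depending on which strand the reads represent). This is only a sketch: it assumes an uncompressed four-line FASTQ, the function name `fraction_with_homopolymer` is hypothetical, and the run-length threshold of 15 is an arbitrary choice.

```python
import re
from typing import Iterable


def fraction_with_homopolymer(
    lines: Iterable[str], base: str = "T", min_run: int = 15
) -> float:
    """Return the fraction of reads containing >= min_run consecutive `base`.

    Assumes four-line FASTQ records; returns 0.0 for empty input.
    """
    pattern = re.compile(re.escape(base) + "{%d,}" % min_run)
    total = 0
    hits = 0
    for i, line in enumerate(lines):
        if i % 4 != 1:  # only the sequence line of each record
            continue
        total += 1
        if pattern.search(line.upper()):
            hits += 1
    return hits / total if total else 0.0


# Hypothetical usage:
# with open("in.fq") as handle:
#     print(fraction_with_homopolymer(handle, base="T", min_run=15))
```

A fraction well above what is seen in a comparable clean library would be consistent with poly(A)-derived reads inflating the T content.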
