Hi everyone,
I'm trying to work out which RNA-seq parameters are best for my samples. To do that, I have 9 samples that I sequenced with 20M reads, 150bp and paired-end. I wanted to see which parameters generate more accurate results (paired-end vs single-end, 150bp vs 100bp vs 50bp and 20M vs 10M reads).
To do that, I hard-trimmed the reads (to obtain 100bp and 50bp), took only the first 10M reads instead of the 20M, took only the first fastq file (to generate the single-end data), and ran the same programs: trim_galore for the trimming, STAR for the mapping and RSEM for the quantification, all with the same parameters (only changing the parameters regarding the paired-end).
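For reference, the downsampling described above can be sketched roughly as follows. This is only an illustration of the idea, not the exact commands that were run: the function name `downsample_fastq`, its parameters and the file names in the comment are hypothetical, and it assumes plain four-line FASTQ records.

```python
from itertools import islice


def downsample_fastq(lines, max_reads, read_length):
    """Yield FASTQ lines for the first `max_reads` records,
    each hard-trimmed to its first `read_length` bases."""
    it = iter(lines)
    for _ in range(max_reads):
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if len(record) < 4:
            break  # end of input (or a truncated last record)
        header, seq, plus, qual = (x.rstrip("\n") for x in record)
        yield header + "\n"
        yield seq[:read_length] + "\n"
        yield plus + "\n"
        yield qual[:read_length] + "\n"


# Hypothetical usage on a gzipped file (file names are placeholders):
# import gzip
# with gzip.open("sample_R1.fastq.gz", "rt") as src, \
#         gzip.open("sample_R1.50bp.10M.fastq.gz", "wt") as dst:
#     dst.writelines(downsample_fastq(src, 10_000_000, 50))
```

For the paired-end versions the same call would be applied to both the R1 and R2 files with the same `max_reads`, so that the mates stay in the same order; the single-end versions simply use the R1 file.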
The results are that starting from 50bp reads generates more transcript counts than starting from 150bp reads, and that single-end generates more transcript counts than paired-end. I'm a bit concerned, since I don't know how single-end reads could generate more transcripts than paired-end, and I think I'm analyzing something wrong. Do you know how I could do this type of analysis?
Thank you all very much, Lluc
Yes, I was trying that in order to establish what should be the parameters for the next sequencing experiments and if we could reduce the depth or use shorter reads in order to reduce costs. Sorry if I didn't explain myself.
Ok, I see. Then I suggest using the PE-150-20M data as the "gold standard" and comparing everything else to it, as this is the best possible combination.
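One possible way to do that comparison is to compute a rank correlation between the per-transcript abundances of each reduced data set and those of the PE-150-20M run. The sketch below is just one option and makes several assumptions: the abundances (for example the TPM column of the RSEM results files) have already been loaded into dictionaries keyed by transcript ID, transcripts missing from a data set count as 0, and the function names and the choice of Spearman correlation are mine, not something prescribed by the tools.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare_to_gold(gold, others):
    """Correlate each data set in `others` with the gold standard.

    `gold` and every value of `others` map transcript ID -> abundance.
    """
    ids = sorted(gold)
    ref = [gold[t] for t in ids]
    return {
        name: spearman(ref, [table.get(t, 0.0) for t in ids])
        for name, table in others.items()
    }
```

A configuration whose correlation with the gold standard stays close to 1 loses little information; the point where it starts to drop suggests how far the read length or depth can be reduced. Comparing a normalized measure such as TPM is probably safer than comparing raw counts, since raw counts depend directly on how many reads went in.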
When it comes to any method that "counts fragments", paired-end reads add substantially to the cost.
That is because, at the same coverage, single-end reads will sample twice as many fragments, and the statistical power increases accordingly.
When you use paired-end reads, the same fragment is read twice (once from each end), so the two reads are not independent measurements.
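To put rough numbers on this, here is a small sketch. It assumes that "same coverage" means the same total number of reads, and it uses a simple Poisson approximation for counting noise (relative error of about 1/sqrt(count)); both are simplifying assumptions, and the function names are only illustrative.

```python
from math import sqrt


def fragments_sampled(total_reads, paired_end):
    """Number of distinct fragments observed for a given number of reads.

    A paired-end fragment uses up two reads, a single-end fragment one.
    """
    return total_reads // 2 if paired_end else total_reads


def poisson_cv(expected_count):
    """Relative counting error under a Poisson approximation."""
    return 1 / sqrt(expected_count)


total_reads = 20_000_000
for paired in (False, True):
    n_fragments = fragments_sampled(total_reads, paired)
    # Hypothetical transcript that makes up 1 in 100,000 fragments
    expected = n_fragments / 100_000
    label = "paired-end" if paired else "single-end"
    print(label, n_fragments, "fragments, relative error ~",
          round(poisson_cv(expected), 3))
```

Under these assumptions, the single-end run gives twice the fragment count per transcript for the same number of reads, which lowers the relative counting error by a factor of about sqrt(2).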
In general, and considering the realities of funding and resource availability, I would only recommend paired-end reads when identifying novel transcripts is of importance.
For all other cases, generally speaking, the benefits of paired-end reads do not justify their extra cost.
The quality of the RNA and the library preparation will have a larger effect, and usually you can get excellent results with 100-150bp single-end reads.