Is it a good idea to trim RNA-seq reads or just remove N's?
Rohit, 11.2 years ago

Hi all,

I have an RNA-seq dataset with good coverage to analyse. I can see some N's and low-quality bases towards the ends of the reads. With genomic data, I usually remove reads whose number of N's exceeds a certain cutoff, and I also trim the low-quality ends.
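The N-based read filter described above can be sketched in a few lines. This is a minimal illustration, not any particular tool's behaviour: it assumes plain 4-line FASTQ records, and the 10% cutoff (`max_n_fraction=0.1`) and the helper names `read_fastq`, `n_fraction` and `filter_by_n` are hypothetical choices.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a plain 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield FastqRecord(header[1:], seq, qual)


def n_fraction(seq: str) -> float:
    """Fraction of ambiguous bases (N) in a read; empty reads count as all-N."""
    return seq.upper().count("N") / len(seq) if seq else 1.0


def filter_by_n(records: Iterable[FastqRecord],
                max_n_fraction: float = 0.1) -> Iterator[FastqRecord]:
    """Keep reads whose N fraction does not exceed the (hypothetical) cutoff."""
    for rec in records:
        if n_fraction(rec.seq) <= max_n_fraction:
            yield rec


if __name__ == "__main__":
    demo = "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nNNNNACGTAC\n+\nIIIIIIIIII\n"
    kept = filter_by_n(read_fastq(io.StringIO(demo)))
    print([rec.name for rec in kept])  # r2 has 4/10 N's and is dropped
```

For paired-end data, a filter like this would have to drop both mates together, otherwise the two FASTQ files fall out of sync.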

Is it a good idea to remove reads with too many N's from RNA-seq data, or will this substantially affect the expression values when I run a de novo assembly followed by expression analysis?

And will trimming low-quality read ends affect the expression estimates too much?
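To make "trimming the low-quality ends" concrete, here is a sketch of one common 3' trimming rule, the BWA-style running-sum algorithm (which cutadapt also documents for its quality trimming). It is an illustration, not a replacement for a real trimmer: Phred+33 qualities, a cutoff of Q20 and a minimum kept length of 25 bp are assumed values, and `quality_trim_index`/`trim_read` are hypothetical names.

```python
from typing import Optional, Tuple


def quality_trim_index(qual: str, cutoff: int = 20, phred_offset: int = 33) -> int:
    """Return the length to keep after BWA-style 3' quality trimming.

    Walking from the 3' end, add (cutoff - q) to a running sum; the cut goes
    where that sum is largest, and the scan stops once the sum drops below 0.
    """
    stop = len(qual)
    running = 0
    best = 0
    for i in range(len(qual) - 1, -1, -1):
        q = ord(qual[i]) - phred_offset  # Phred+33 encoding assumed
        running += cutoff - q
        if running < 0:
            break
        if running > best:
            best = running
            stop = i
    return stop


def trim_read(seq: str, qual: str, cutoff: int = 20,
              min_length: int = 25) -> Optional[Tuple[str, str]]:
    """Trim the 3' end; return None if the read becomes shorter than min_length."""
    stop = quality_trim_index(qual, cutoff)
    if stop < min_length:
        return None
    return seq[:stop], qual[:stop]


if __name__ == "__main__":
    qual = "I" * 10 + "#" * 4  # ten Q40 bases followed by four Q2 bases
    print(trim_read("ACGTACGTACGTAC", qual, min_length=5))
```

Because the rule looks at the cumulative quality deficit rather than single bases, one isolated low-quality base near the 3' end does not force a cut if good bases follow it.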

Tags: rna-seq, trimming, filter
Answer, 11.2 years ago

It is definitely advisable to trim RNA-seq reads if you are going to do de novo assembly. Sequencing errors and stretches of N's will trip up the assembler (the de Bruijn graph will become bloated).
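A toy example of why errors bloat the graph: in a de Bruijn graph every distinct k-mer is a node, so one substitution inside a read can add up to k new nodes that form a spurious bubble or tip. The 20 bp transcript, k = 7 and the read layout below are made-up values, and skipping N-containing k-mers is a simplification (assemblers differ in how they treat ambiguous bases).

```python
from typing import Iterable, Set


def kmer_set(reads: Iterable[str], k: int) -> Set[str]:
    """Distinct k-mers (de Bruijn graph nodes) in a set of reads.

    K-mers containing N are skipped here; real assemblers differ in how they
    treat ambiguous bases.
    """
    nodes: Set[str] = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                nodes.add(kmer)
    return nodes


if __name__ == "__main__":
    transcript = "ACGTTGCATGCCGATAGGCT"  # toy 20 bp transcript
    k = 7
    clean_reads = [transcript[0:12], transcript[4:16], transcript[8:20]]
    # The same fragment as clean_reads[1], with one substitution (C -> A).
    error_read = "TGCATGACGATA"
    clean = kmer_set(clean_reads, k)
    noisy = kmer_set(clean_reads + [error_read], k)
    print(len(clean), len(noisy))
```

Across millions of reads, each error-containing k-mer adds nodes and edges the assembler has to explore or prune, which is why removing low-quality tails before assembly tends to help.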

If you are just going to map the reads to a reference, you could also trim, but it is less clear to me how much that affects the results. I usually trim my reads and, as a consequence, get higher mapping rates.
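One way to compare mapping rates between trimmed and untrimmed runs is to count primary records in the aligner's SAM output. The sketch below uses the standard SAM FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary). Reading the SAM as plain text, rather than through a BAM library, is an assumption made to keep the example dependency-free, and `mapping_stats` is a hypothetical helper.

```python
from typing import Iterable, Tuple

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def mapping_stats(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped, total) counted over primary records of a SAM text stream."""
    mapped = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & FLAG_UNMAPPED:
            mapped += 1
    return mapped, total


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r1\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    mapped, total = mapping_stats(sam)
    print(mapped, total, mapped / total)
```

Since trimming can also discard whole reads, a higher mapping rate partly reflects a smaller denominator; comparing the absolute mapped count alongside the rate gives a fairer picture.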


Actually, I have another dataset from a different model organism, for which I will use a reference genome. So I guess I also need to know the effect of trimming on the reference-based approach, probably by analysing my data against the reference both with and without trimming.
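For the "both with and without trimming" comparison, one simple check is to correlate per-gene expression between the two runs and flag genes whose values moved a lot. This is a sketch under stated assumptions: per-gene read counts are already available as dictionaries, log2 CPM with a pseudocount of 1 and a 1.0 log2 shift threshold are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def log_cpm(counts: Dict[str, int], pseudocount: float = 1.0) -> Dict[str, float]:
    """log2 counts-per-million with a simple pseudocount (an assumed choice)."""
    library_size = sum(counts.values())
    return {g: math.log2((c + pseudocount) / library_size * 1e6)
            for g, c in counts.items()}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_runs(untrimmed: Dict[str, int], trimmed: Dict[str, int],
                 min_abs_log2_change: float = 1.0) -> Tuple[float, List[str]]:
    """Correlation of log-CPM between runs, plus genes whose value shifted a lot."""
    genes = sorted(set(untrimmed) & set(trimmed))
    a, b = log_cpm(untrimmed), log_cpm(trimmed)
    r = pearson([a[g] for g in genes], [b[g] for g in genes])
    shifted = [g for g in genes if abs(b[g] - a[g]) >= min_abs_log2_change]
    return r, shifted


if __name__ == "__main__":
    untrimmed = {"g1": 100, "g2": 200, "g3": 50}
    trimmed = {"g1": 90, "g2": 180, "g3": 5}  # made-up counts
    print(compare_runs(untrimmed, trimmed))
```

A high overall correlation can hide a handful of strongly affected genes, so the list of shifted genes is usually the more informative output.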

I should add that quality trimming was also recommended for the reference-based approach in a discussion I just read, although the amount of data loss has not been precisely quantified: http://www.researchgate.net/post/How_much_does_quality_trimming_reads_refine_RNA-seq_analysis
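Quantifying the data loss from trimming on your own data is straightforward once you have the read lengths before and after trimming. The sketch assumes one trimmed length per raw read (0 for a read trimmed away entirely) and a hypothetical 25 bp minimum-length filter; it reports the fraction of reads and of bases that survive.

```python
from typing import Dict, Sequence


def summarize_loss(raw_lengths: Sequence[int], trimmed_lengths: Sequence[int],
                   min_length: int = 25) -> Dict[str, float]:
    """Fraction of reads and of bases that survive trimming plus a length filter.

    trimmed_lengths[i] is the length of read i after trimming (0 if fully trimmed).
    """
    if len(raw_lengths) != len(trimmed_lengths):
        raise ValueError("expected one trimmed length per raw read")
    kept = [t for t in trimmed_lengths if t >= min_length]
    return {
        "reads_kept": len(kept) / len(raw_lengths),
        "bases_kept": sum(kept) / sum(raw_lengths),
    }


if __name__ == "__main__":
    # Four 100 bp reads: untouched, trimmed to 80, trimmed to 20 (dropped), removed.
    print(summarize_loss([100, 100, 100, 100], [100, 80, 20, 0]))
```

Reporting both numbers matters because aggressive trimming can keep most reads while still discarding a large share of the sequenced bases.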


I observed some data loss in one dataset in terms of splice junctions: more reads were mapped in total after trimming, but fewer of them were junction-spanning (and these are often quite valuable). However, I didn't check whether the original untrimmed data contained many false-positive junctions.
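To see which junctions disappear after trimming, you can extract introns from the `N` operations in each alignment's CIGAR string and compare the junction sets of the two runs. The sketch below follows the SAM convention that M, D, N, = and X consume reference bases; representing alignments as `(chrom, pos, cigar)` tuples and the helper names are assumptions. One plausible (unverified here) reason for the loss is that shorter reads leave shorter anchors on one side of a junction, which spliced aligners may reject.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def introns(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """1-based inclusive intron coordinates implied by the N operations of a CIGAR."""
    ref = pos  # SAM POS: 1-based leftmost aligned reference base
    result = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            result.append((ref, ref + n - 1))
        if op in REF_CONSUMING:
            ref += n
    return result


def junction_counts(alignments: Iterable[Tuple[str, int, str]]) -> Counter:
    """Count junction-spanning reads per (chrom, intron_start, intron_end)."""
    counts: Counter = Counter()
    for chrom, pos, cigar in alignments:
        for start, end in introns(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts


if __name__ == "__main__":
    untrimmed = junction_counts([("chr1", 100, "50M200N50M"),
                                 ("chr1", 400, "60M90N40M")])
    trimmed = junction_counts([("chr1", 100, "50M200N45M"),
                               ("chr1", 400, "60M40S")])
    print(set(untrimmed) - set(trimmed))  # junctions seen only without trimming
```

Looking at how many reads supported each lost junction in the untrimmed run would also help with the false-positive question: junctions backed by a single read are weaker evidence than well-supported ones.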


I have to say that data loss at splice sites was the first problem I was worried about. Missing splice junctions would greatly reduce the accuracy of the analysis, and that is what I want to avoid. In the end, I guess there is no sequencing without data loss, and no data filtering that avoids further data loss; our methods are loops nested with data loss.
