Threshold for removing genes with low counts?
7.6 years ago

What should the threshold be for removing genes with low counts in RNA-seq data? I have removed all genes whose sum of counts is 0, but there are many genes whose sum of counts is 1, 2, 3, etc. Should I remove those as well, or will removing them affect my results?

RNA-Seq • 10.0k views

If your question is about differential expression analysis, this should help answer it. Otherwise, please be more precise about what you want to do.

https://support.bioconductor.org/p/63234/

It depends on which differential expression algorithm you are using. DESeq2 seems to handle this problem for you (edgeR too, I suppose?), but if I remember correctly you will have to do it manually with limma (this needs to be confirmed).


I am using edgeR. I will also use DESeq2 for a comparative analysis, but first I need to solve this for edgeR.


So if you are using edgeR, I advise you to read this post:

https://www.biostars.org/p/93553


Thanks a lot for your kind help.

7.6 years ago
firatuyulur ▴ 320

I do not think there is a fixed threshold. When it comes to plotting counts, the all-zero rows make your computer suffer, since there will be a couple of thousand of them, so I remove those directly. In the analysis itself you will test for differential expression and get a p-value / adjusted p-value for each gene, and a count going from 0 to 1 between control and treatment samples will be far less significant than a count going from 100 to 500. My point is that whether or not you remove them manually, you will end up filtering on the significance of the expression change, and most of the genes with small counts will be filtered out at that stage.
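For anyone who does want to filter before testing, here is a rough sketch of one common heuristic: keep a gene only if its counts-per-million (CPM) reaches some cutoff in at least a minimum number of samples. This is not something stated in this thread. The function name `filter_low_counts`, the cutoffs (`min_cpm=1.0`, `min_samples=2`) and the toy counts are all assumptions made up for illustration, and it is plain Python, not edgeR code.

```python
def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples.

    counts: dict mapping gene name -> list of raw counts, one per sample.
    The default cutoffs are illustrative assumptions, not recommendations.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    # Library size = total raw counts in each sample (column sums).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    kept = {}
    for gene, row in counts.items():
        # Counts per million; an empty library gives CPM 0 for every gene.
        cpm = [(c / lib * 1e6) if lib > 0 else 0.0 for c, lib in zip(row, lib_sizes)]
        if sum(v >= min_cpm for v in cpm) >= min_samples:
            kept[gene] = row
    return kept


# Hypothetical toy data: 4 samples, each with a library size of 1,000,000.
toy_counts = {
    "geneA": [0, 0, 0, 0],                      # all zeros
    "geneB": [0, 0, 3, 0],                      # sum of counts is 3
    "geneC": [100, 120, 480, 510],              # clearly expressed
    "geneD": [999900, 999880, 999517, 999490],  # filler to set library size
}
kept_genes = sorted(filter_low_counts(toy_counts))
print(kept_genes)
```

With these made-up numbers the all-zero gene and the gene with a total of 3 counts are dropped, while the expressed genes stay. The right cutoffs depend on your library sizes and group sizes, so treat the defaults as placeholders.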


I don't think your answer is entirely correct. It is true that 0-1 counts won't be significant with programs such as DESeq/edgeR. On the other hand, if you leave them in the analysis, you will increase the adjusted p-values, potentially making results for the 100-500 read genes less significant (when you adjust for multiple testing, the more tests you have, the more stringent the correction is). So filtering can affect your final results.
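To make the multiple-testing point concrete, here is a small sketch using the Benjamini-Hochberg adjustment. I am assuming BH because it is a widely used FDR method. The p-values are invented for illustration, and `bh_adjust` is a hand-written helper, not a function from any package.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for three well-expressed genes.
expressed = [0.001, 0.004, 0.01]
# Hypothetical p-values for 17 low-count genes that carry no signal.
low_count = [0.9] * 17

filtered = bh_adjust(expressed)                      # low-count genes removed first
unfiltered = bh_adjust(expressed + low_count)[:3]    # low-count genes left in

for raw, f, u in zip(expressed, filtered, unfiltered):
    print(f"raw={raw:.3f}  filtered={f:.4f}  unfiltered={u:.4f}")
```

The same three raw p-values get larger adjusted values when the 17 uninformative tests are left in, simply because the number of tests goes from 3 to 20. With these toy numbers, the gene with raw p = 0.01 would pass a 0.05 cutoff after filtering but not without it.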

5.6 years ago

This may be of some interest.

Filtering and collapsing data
