Question

RNA-seq's strange genes

8 hours ago

SeoG • 0

Hello, this is my first bulk RNA-seq analysis.

I have multiple replicates of the same sample, and there are genes where the read counts vary by more than 10-fold. Should I filter out these genes?

Thank you

Rna-seq • 110 views

updated 58 minutes ago by Istvan Albert 102k • written 8 hours ago by SeoG • 0

More context is needed. What are the genes for? What is the range of read counts? Expression from 10 reads to 100 reads among replicates could be explained biologically. What is the variation in total number of reads per replicate?

Generally, I would say removing data because the variation doesn't agree with your a priori assumptions is a bad start, but it's impossible to say in this example without more information.
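To make the questions about read count range and total reads per replicate concrete, here is a minimal Python sketch. The counts are made-up toy numbers, and the helper names (`library_sizes`, `scale_to_mean_depth`, `spread`) and the pseudocount of 1 are my own assumptions, not part of any package. It only shows that a roughly 10-fold difference in raw counts can shrink to almost nothing once sequencing depth is taken into account.

```python
# Hypothetical toy data: raw counts for three replicates of the same sample.
# Replicate 2 was sequenced about 10x deeper than the other two.
counts = {
    "geneA": [20, 200, 24],
    "geneB": [180, 1800, 176],
    "geneC": [800, 8000, 800],
}


def library_sizes(counts):
    """Total number of reads per replicate (column sums)."""
    n_reps = len(next(iter(counts.values())))
    return [sum(row[i] for row in counts.values()) for i in range(n_reps)]


def scale_to_mean_depth(counts):
    """Rescale every replicate to the mean library size (a crude normalization)."""
    sizes = library_sizes(counts)
    mean_size = sum(sizes) / len(sizes)
    return {
        gene: [c * mean_size / s for c, s in zip(row, sizes)]
        for gene, row in counts.items()
    }


def spread(values, pseudocount=1.0):
    """Max/min ratio across replicates; the pseudocount avoids division by zero."""
    return (max(values) + pseudocount) / (min(values) + pseudocount)


normalized = scale_to_mean_depth(counts)
print("library sizes:", library_sizes(counts))
for gene in counts:
    print(gene, round(spread(counts[gene]), 2), round(spread(normalized[gene]), 2))
```

In real data you would rely on the normalization built into your differential expression tool rather than this simple rescaling, but the same check applies: look at library sizes and absolute counts before judging a 10-fold difference.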

4 hours ago by dthorbur ★ 2.5k

score 1 · Answer 1 · 2024-11-26

The post made me think about how often one should expect to see 10x fold changes in a realistic RNA-seq experiment.

So I went ahead and simulated realistic RNA-seq counts with the PROPER package:

https://bioconductor.org/packages/release/bioc/html/PROPER.html

Published as: Wu H, Wang C, Wu Z (2014). "PROPER: Comprehensive Power Evaluation for Differential Expression using RNA-seq." Bioinformatics.

In my run with 3 replicates, out of 20K genes, 24 genes had a fold change of over 10x, defined as abs(Avg(A)/Avg(B)).

9 of the 24 were false positives, and the remaining 15 were true positives.
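The tally above can be sketched in a few lines of Python. This is not the PROPER code I ran. The counts and the "truly DE" labels are made-up toy values, the function names are hypothetical, and I use a pseudocount of 1 and the absolute log2 ratio (both my own assumptions) so that 10x changes in either direction are caught and zero means do not cause a division error.

```python
import math


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 of mean(A)/mean(B), with a pseudocount to guard against zero means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def flag_large_changes(group_a, group_b, threshold=10.0):
    """Genes whose fold change exceeds the threshold in either direction."""
    cutoff = math.log2(threshold)
    return {
        gene
        for gene in group_a
        if abs(log2_fold_change(group_a[gene], group_b[gene])) > cutoff
    }


def tally(flagged, truly_de):
    """Count true and false positives against known (simulated) ground truth."""
    return len(flagged & truly_de), len(flagged - truly_de)


# Hypothetical toy counts, 3 replicates per condition.
group_a = {"g1": [500, 520, 480], "g2": [2, 0, 1], "g3": [30, 0, 0], "g4": [100, 110, 90]}
group_b = {"g1": [40, 45, 35], "g2": [60, 55, 65], "g3": [0, 0, 0], "g4": [105, 95, 100]}
truly_de = {"g1", "g2"}  # ground truth is only known because the data are simulated

flagged = flag_large_changes(group_a, group_b)
true_pos, false_pos = tally(flagged, truly_de)
print(sorted(flagged), true_pos, false_pos)
```

Here g3 crosses the 10x cutoff only because of a single replicate with 30 reads, which is the kind of gene a proper statistical test would usually not call significant. Note also that a plain Avg(A)/Avg(B) ratio would miss g2, since a 10x decrease shows up as a ratio below 0.1 rather than above 10.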

In conclusion, I would let the statistical method sort it out and then investigate the results for those that seem unexpected rather than filtering out a priori.