Low-expressed gene filtering, quantile normalization, and log2 transformation: which one goes first?
8.1 years ago
ewre ▴ 250

Hi everyone,

I have been working with expression data (both microarray and RNA-seq) for about 4 years, but this question still confuses me whenever I preprocess data. 1) My opinion is that low-expressed gene filtering should at least come first. The reason: the aim of quantile normalization and log2 transformation is to give the data a better-behaved distribution, but if they are applied first and low-expressed genes are filtered out afterwards, the filtering may distort that distribution.

2) For log2 transformation and quantile normalization, I really don't know which one goes first.

Thank you in advance for your time and valuable suggestions.
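
To make point 2 concrete, here is a minimal sketch (not taken from any reply in this thread) that runs both orders on a tiny made-up matrix. It assumes a plain mean-based quantile normalization and a log2(x + 1) transform; the helper names `quantile_normalize` and `log2p1` and all numbers are hypothetical. Because the log is monotone, both orders keep the same gene ranks within a sample. They differ only in the reference values: averaging logs gives a smaller value than taking the log of the average (Jensen's inequality), so the two orders are not identical.

```python
import math


def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists (one list per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by position, which is a simplification.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the i-th smallest value over all samples.
    reference = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def log2p1(columns):
    """Apply log2(x + 1) to every value of every sample."""
    return [[math.log2(v + 1.0) for v in col] for col in columns]


# Hypothetical expression values: three samples, five genes each (made-up numbers).
samples = [
    [5.0, 2.0, 30.0, 0.5, 100.0],
    [4.0, 1.0, 60.0, 3.0, 400.0],
    [9.0, 0.0, 20.0, 2.0, 50.0],
]

log_then_qn = quantile_normalize(log2p1(samples))
qn_then_log = log2p1(quantile_normalize(samples))

# Compare the two orders for the first sample, gene by gene.
for a, b in zip(log_then_qn[0], qn_then_log[0]):
    print(f"log2 then QN: {a:.3f}   QN then log2: {b:.3f}")
```

Whether the difference matters in practice depends on the data; the sketch only shows that the two orders agree on ranks and disagree on values.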

log2 quantile data transform preprocessing • 3.7k views

If you remove low-expressed genes first (across the samples/cohort) and then apply a log transform, log2(FPKM + 1), the results should be fine.
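
A minimal sketch of that order (filter first, then log2(FPKM + 1)) could look like the following. The FPKM values, the cutoff `MIN_FPKM`, the `MIN_SAMPLES` rule, and the helper names are all assumptions made for illustration; the comment above does not specify a threshold.

```python
import math

# Hypothetical FPKM table: gene -> values across four samples (made-up numbers).
fpkm = {
    "geneA": [0.0, 0.2, 0.1, 0.0],
    "geneB": [5.1, 7.9, 6.3, 4.8],
    "geneC": [0.4, 1.5, 2.2, 0.3],
    "geneD": [120.0, 98.5, 143.2, 110.7],
}

MIN_FPKM = 1.0    # assumed expression cutoff
MIN_SAMPLES = 2   # assumed number of samples that must reach the cutoff


def filter_low_expressed(table, min_value, min_samples):
    """Keep genes whose value is >= min_value in at least min_samples samples."""
    return {
        gene: values
        for gene, values in table.items()
        if sum(v >= min_value for v in values) >= min_samples
    }


def log2_plus_one(table):
    """Apply log2(x + 1) to every value; the pseudocount keeps zeros finite."""
    return {gene: [math.log2(v + 1.0) for v in values] for gene, values in table.items()}


# Step 1: drop low-expressed genes across the cohort. Step 2: log-transform.
kept = filter_low_expressed(fpkm, MIN_FPKM, MIN_SAMPLES)
log_fpkm = log2_plus_one(kept)

for gene, values in log_fpkm.items():
    print(gene, [round(v, 2) for v in values])
```

With these made-up numbers only geneA fails the filter, since none of its values reaches the cutoff.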


So is this question about RNA-seq or microarray?


Hi WouterDeCoster, I want to keep it general, covering both RNA-seq and microarray.


RNA-seq and microarray are both transcriptomics, but that's where the similarities end. Microarrays give continuous intensities, whereas RNA-seq gives discrete counts (usually modeled as draws from a negative binomial distribution, i.e. an overdispersed Poisson distribution).
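
To illustrate what "overdispersed Poisson" means, here is a small simulation sketch using only the Python standard library. The mean, the dispersion value, and the helper names are assumptions chosen for illustration. The negative binomial is drawn as a gamma-Poisson mixture, so its variance is mean + dispersion × mean², while the Poisson variance equals the mean.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

MEAN = 10.0        # assumed expected count for one gene
DISPERSION = 0.5   # assumed NB dispersion; variance = mean + dispersion * mean**2
N = 5000           # number of simulated samples


def sample_poisson(lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, random.random()
    while product > threshold:
        k += 1
        product *= random.random()
    return k


def sample_negative_binomial(mean, dispersion):
    """Draw one negative binomial value as a gamma-Poisson mixture."""
    shape = 1.0 / dispersion
    scale = mean * dispersion  # gamma mean = shape * scale = mean
    return sample_poisson(random.gammavariate(shape, scale))


poisson_counts = [sample_poisson(MEAN) for _ in range(N)]
nb_counts = [sample_negative_binomial(MEAN, DISPERSION) for _ in range(N)]

# Both have roughly the same mean, but the NB counts are far more variable.
for name, counts in (("Poisson", poisson_counts), ("negative binomial", nb_counts)):
    print(f"{name}: mean={statistics.mean(counts):.2f} "
          f"variance={statistics.variance(counts):.2f}")
```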

I'll leave microarray analysis for someone else, but for RNA-seq the most accepted approach is to use tools like DESeq2 and edgeR, which model the data assuming this negative binomial distribution. So you don't want to preprocess the data here: to work optimally, the software expects raw, unmanipulated counts.


Thank you. Exactly: for raw RNA-seq read counts I usually use DESeq2 and edgeR for DEG analysis. But sometimes I only have RPKM/FPKM data, and that's where I run into trouble.

8.1 years ago
Farbod ★ 3.4k

Dear hanguangchun, Hi.

I think removing low-expressed genes first and then applying the log2 transform is the more usual order.

Also, please have a look at "There are too many transcripts! What do I do?"

and the IsoPct < 1 section of this paper for excluding the spurious transcripts.

~ Best


Thank you very much for the information, Farbod.
