I used HISAT2 and StringTie2 to analyze RNA-seq data and obtained FPKM values per gene. Then I made a boxplot of gene expression for all samples. Unfortunately, I found that the median value was not identical across samples, so do I need to apply quantile normalization to the FPKM values? I also checked this post (Rnaseq Fpkm Quantile Normalization), which suggested that quantile normalization is not necessary.
Thanks in advance,
best,
FPKM is not recommended for any kind of differential analysis between samples. Get the raw counts after aggregating to the gene level (e.g. using tximport) and then normalize with a proper approach such as TMM from edgeR or RLE from DESeq2. The vignettes and papers cover the topic of normalization for various purposes. vst or rlog from DESeq2 are other popular choices, depending on the application. For information on why FPKM is not recommended, please use Google and the search function, and see benchmarking papers comparing normalization methods.
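To make the idea behind RLE less abstract, here is a minimal pure-Python sketch of median-of-ratios size factors, the approach DESeq2's RLE normalization is based on. It is not DESeq2's code: the function names `size_factors` and `normalize` and the toy count matrix are made up for illustration, and a real analysis should go through the packages themselves.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes (rows), each a list of raw counts per sample (columns).
    Hypothetical helper for illustration only, not part of any package.
    """
    n_samples = len(counts[0])
    # Use only genes with nonzero counts in every sample, so logs are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of the per-gene geometric mean across samples (a pseudo-reference sample).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene in sample j to the pseudo-reference.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts):
    """Divide every count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]

# Toy data (invented): sample 2 was sequenced twice as deep as sample 1.
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 3]]
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

In this toy case the second size factor is twice the first, so the depth difference disappears after dividing, while the gene with a zero count is simply left out of the factor estimation.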
Thanks for your reply!
I did not use FPKM to perform differential expression analysis. What I want to do is compare the global expression levels between 2 groups. As I found that the global expression values of the replicates were not identical, I am not sure whether I need to quantile normalize the FPKM values.
best,
I would simply use a proper normalization technique starting from raw counts such as the one I suggested above. The easiest is probably vst from DESeq2 followed by the exploratory plots you mention. I see no reason to bother yourself with two different methods such as FPKM and QN as you would need to show that FPKM-distributed data do not violate any of the QN assumptions. Save yourself the trouble and use something else.
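On the QN assumptions: quantile normalization forces every sample onto the same distribution, so it assumes that any global difference between samples is technical. A minimal pure-Python sketch follows (the function name `quantile_normalize` and the toy values are invented for illustration; ties are broken by input order rather than averaged, which real implementations may handle differently).

```python
from statistics import mean, median

def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Hypothetical helper for illustration only.
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    columns = [[matrix[i][j] for i in range(n_genes)] for j in range(n_samples)]
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(sc[k] for sc in sorted_cols) for k in range(n_genes)]
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices of sample j ordered by value (ties keep input order).
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            # Replace each value by the reference value of the same rank.
            result[i][j] = reference[rank]
    return result

# Toy data (invented): sample 2 is globally 10x higher than sample 1.
toy_values = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
toy_qn = quantile_normalize(toy_values)
medians_after = [median(row[j] for row in toy_qn) for j in range(2)]
```

After this step both toy samples have identical values and identical medians, so a real global shift between two groups would be erased along with any technical one. That is worth keeping in mind if the goal is to compare global expression levels.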