Question

How to identify housekeeping genes from a gene expression matrix and normalize the data

1

4.3 years ago

bioyas ▴ 20

Hi everyone,

I have an expression matrix (counts) derived from two RNA experiments, which I have combined. I would like to normalize the data so that the expression values end up on the same scale and I can do differential expression analysis.

What I have tried so far is edgeR normalization followed by the limma package to remove the batch effect, but this failed to put the data on the same scale.

Now I would like to try normalization with respect to housekeeping genes (HKGs), which is new to me. I don't know how to find HKGs from the expression data. Is there an R package that detects them, or do I need to find them manually? If so, how should I do that?

The next step after finding the list of HKGs is normalization using these genes' expression, where I need help too.

Thank you in advance

housekeeping genes normalization R edger limma • 2.0k views

ADD COMMENT • link 4.3 years ago by bioyas ▴ 20

0

Are these independent experiments? Maybe this answers some questions:

Basic normalization, batch correction and visualization of RNA-seq data

ADD REPLY • link 4.3 years ago by ATpoint 86k

0

Thanks for your reply. The tutorial is really helpful, and what it explains is what I have already tried. The experiments were done separately, and that's one of the reasons they are not on the same scale.

I thought that housekeeping gene normalization might be the answer to my question but I am not familiar with it. Any insight on that is appreciated.
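
As a rough illustration of what housekeeping-gene normalization could look like, here is a minimal Python sketch. It is not an edgeR or limma workflow: the function names, the CPM threshold, and the toy counts are all hypothetical, and the choice to rank genes by coefficient of variation is an assumption about how candidates might be picked from the data.

```python
import math
from statistics import mean, pstdev

def cpm(counts):
    """Counts per million for each sample; counts is {gene: [count per sample]}."""
    n = len(next(iter(counts.values())))
    lib = [sum(v[j] for v in counts.values()) for j in range(n)]
    return {g: [1e6 * v[j] / lib[j] for j in range(n)] for g, v in counts.items()}

def candidate_housekeeping(counts, min_cpm=10.0, top_n=2):
    """Rank well-expressed genes by coefficient of variation, lowest first."""
    scores = {}
    for gene, values in cpm(counts).items():
        if min(values) >= min_cpm:  # skip genes that are low in any sample
            scores[gene] = pstdev(values) / mean(values)
    return sorted(scores, key=scores.get)[:top_n]

def housekeeping_size_factors(counts, hk_genes):
    """Per-sample factor: geometric mean of the HKG counts, rescaled so the
    factors have geometric mean 1. Assumes every HKG count is non-zero."""
    n = len(next(iter(counts.values())))
    logs = [mean(math.log(counts[g][j]) for g in hk_genes) for j in range(n)]
    centre = mean(logs)
    return [math.exp(value - centre) for value in logs]

# Hypothetical toy counts: samples 2 and 4 are sequenced about twice as deep,
# and geneC changes strongly between samples.
counts = {
    "geneA": [100, 200, 110, 190],
    "geneB": [50, 100, 52, 98],
    "geneC": [10, 400, 15, 380],
    "geneD": [1000, 2000, 1000, 2000],
}
hk = candidate_housekeeping(counts)
factors = housekeeping_size_factors(counts, hk)
normalized = {g: [c / f for c, f in zip(v, factors)] for g, v in counts.items()}
```

Dividing each sample's counts by its factor puts the chosen genes on a common scale. A scaling of this kind only corrects sample-wide differences such as sequencing depth; it would not remove batch effects that differ from gene to gene.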

ADD REPLY • link 4.3 years ago by bioyas ▴ 20

0

Did you apply any of the strategies from the tutorial, such as using PCA to detect batch effects? You cannot just combine different experiments. You have to check whether there are batch effects and, if so, account for them.
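
To make the PCA check concrete, here is a small, dependency-free Python sketch that computes each sample's score on the first principal component by power iteration. The function name and the toy log-expression values are hypothetical; in practice a library PCA routine would normally be used on log-transformed, normalized expression.

```python
import math

def first_pc_scores(expr, n_iter=200):
    """expr: list of samples, each a list of log-expression values per gene.
    Returns each sample's score on the first principal component.
    Assumes the data are not constant across samples."""
    n, p = len(expr), len(expr[0])
    # Centre each gene (column) across samples.
    means = [sum(row[k] for row in expr) / n for k in range(p)]
    x = [[row[k] - means[k] for k in range(p)] for row in expr]
    # Power iteration on X^T X to find the leading loading vector.
    v = [float(k + 1) for k in range(p)]
    for _ in range(n_iter):
        scores = [sum(r[k] * v[k] for k in range(p)) for r in x]
        w = [sum(x[i][k] * scores[i] for i in range(n)) for k in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(r[k] * v[k] for k in range(p)) for r in x]

# Hypothetical toy data: rows are samples, columns are genes.
expr = [
    [5.0, 3.0, 8.0],   # experiment 1
    [5.2, 3.1, 7.9],   # experiment 1
    [7.0, 5.1, 9.9],   # experiment 2
    [7.1, 4.9, 10.1],  # experiment 2
]
batches = ["exp1", "exp1", "exp2", "exp2"]
scores = first_pc_scores(expr)
for label, score in zip(batches, scores):
    print(label, round(score, 2))
```

If the samples split along the first component by experiment rather than by biological group, as they do in this toy data, that points to a batch effect.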

ADD REPLY • link 4.3 years ago by ATpoint 86k

0

I did not apply the PCA part. What I did was normalize with edgeR and then use removeBatchEffect() from the limma package to remove batch effects. After all of this, a heatmap and hierarchical clustering still show the samples separated by experiment, which means I could not put them on the same scale.
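
For intuition about what a batch-removal step does to the values, here is a simplified Python stand-in that shifts each batch so its per-gene mean matches the gene's overall mean. This is an assumption-laden sketch, not limma's implementation of removeBatchEffect(), which fits a linear model; the function name and toy values are hypothetical.

```python
from statistics import mean

def centre_batches(expr, batches):
    """Shift each batch so that its per-gene mean equals the gene's overall mean.

    expr: {gene: [log-expression per sample]}; batches: one label per sample.
    """
    corrected = {}
    for gene, values in expr.items():
        overall = mean(values)
        batch_mean = {
            b: mean(v for v, lab in zip(values, batches) if lab == b)
            for b in set(batches)
        }
        corrected[gene] = [
            v - batch_mean[lab] + overall for v, lab in zip(values, batches)
        ]
    return corrected

# Hypothetical toy log-expression: experiment 2 reads about 2 units higher.
batches = ["exp1", "exp1", "exp2", "exp2"]
expr = {"gene1": [5.0, 5.2, 7.0, 7.2], "gene2": [3.0, 3.4, 5.1, 5.3]}
corrected = centre_batches(expr, batches)
```

A shift like this removes every difference between the batches, including real biological differences if the groups are not shared across batches. The limma documentation also describes removeBatchEffect() as intended for uses such as clustering and plotting, with batch included in the design matrix for the differential expression model itself.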

ADD REPLY • link 4.3 years ago by bioyas ▴ 20

0

This is why I linked this tutorial, which explicitly mentions that you cannot simply combine independent experiments. They are most likely confounded. For batch removal you'd need replicates of each experimental group (say normal and cancer, or whatever your design is) in both batches (that is, in both datasets). If you don't have that, you cannot combine them, because you cannot remove the batch effect. I suggest you perform PCA as described.

Can you give some details? What are the experiments, what are the groups per experiment and do you have replicates of each groups in both datasets?
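
The requirement that every group appears in both datasets can be checked with a simple cross-tabulation of the sample sheet. The Python sketch below is only an illustration; the function names, sample IDs, and the "treated" group are hypothetical.

```python
from collections import defaultdict

def groups_by_batch(samples):
    """samples: list of (sample_id, group, batch). Returns {group: set of batches}."""
    seen = defaultdict(set)
    for _sample_id, group, batch in samples:
        seen[group].add(batch)
    return dict(seen)

def confounded_groups(samples):
    """Groups that are missing from at least one batch, so that batch and
    biology cannot be separated for them."""
    all_batches = {batch for _, _, batch in samples}
    return sorted(
        group
        for group, found in groups_by_batch(samples).items()
        if found != all_batches
    )

# Hypothetical sample sheet: only "normal" was measured in both experiments.
samples = [
    ("s1", "normal", "exp1"),
    ("s2", "cancer", "exp1"),
    ("s3", "normal", "exp2"),
    ("s4", "treated", "exp2"),
]
problem_groups = confounded_groups(samples)
print(problem_groups)
```

Any group listed here exists in only one dataset, so a difference between it and the other groups cannot be told apart from the difference between the experiments.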

ADD REPLY • link 4.3 years ago by ATpoint 86k

0

Thanks for your comment. I will try the PCA. Both datasets come from RNA-seq experiments. I have several groups (samples), with 3 replicates per sample in the first dataset and 4 replicates per sample in the second dataset.

As I said before, when I normalize the data I get two clusters corresponding to the two datasets, which means that there is a batch effect between the 2 datasets.

ADD REPLY • link 4.3 years ago by bioyas ▴ 20