Inconsistency between Microarray and RNA-Seq results (Both were performed in the same group of patients)
2.8 years ago
Júlia • 0

Hi everyone! I have a question about gene expression profiles compared between RNA-Seq and microarrays. Using the same group of patients, we found overexpression of a gene in the microarray analysis and in the RNA-Seq analysis with the edgeR package, but not with DESeq2.

We wonder what can be causing these different results. Any guesses? Thank you!

Microarrays RNA-Seq • 1.2k views

Some context would help, e.g. a plot of the counts, or the counts themselves. If the difference already appears within RNA-seq, between analysis frameworks that work very similarly, then that result is probably not a top hit but either borderline significant or has a large standard error. Please show some data.
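As a rough illustration of the kind of per-gene summary that helps here, the sketch below reports the mean, standard deviation and coefficient of variation of one gene's normalized counts in each group. The function name `summarize_gene` and all the count values are made up for illustration; they are not the poster's data.

```python
from statistics import mean, stdev

def summarize_gene(counts_by_group):
    """Return per-group n, mean, standard deviation and coefficient of variation."""
    summary = {}
    for group, counts in counts_by_group.items():
        m = mean(counts)
        # Sample standard deviation needs at least two observations.
        sd = stdev(counts) if len(counts) > 1 else 0.0
        summary[group] = {
            "n": len(counts),
            "mean": m,
            "sd": sd,
            "cv": sd / m if m > 0 else float("nan"),
        }
    return summary

# Hypothetical normalized counts for a single gene (invented values).
example = {
    "case": [820.0, 35.0, 40.0, 910.0],
    "control": [30.0, 42.0, 25.0, 38.0, 51.0, 33.0],
}

for group, stats in summarize_gene(example).items():
    print(f"{group}: n={stats['n']} mean={stats['mean']:.1f} "
          f"sd={stats['sd']:.1f} cv={stats['cv']:.2f}")
```

In this invented example the case group has a high mean driven by two samples only, which is the sort of pattern that can make a gene significant in one framework and not in another.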

2.8 years ago
Papyrus ★ 3.0k

Do you mean that you detected a differentially expressed gene in microarrays (with some method), and then, with RNA-seq data from the same samples, you detected it as differentially expressed with one method (edgeR) but not with another (DESeq2)?

IMHO that is actually closer to being a validation than not. There are a million reasons why you could see discrepancies between methods, even for the same samples, including, for example:

  • the actual sample (RNA) being from different aliquots, or being processed at different time points
  • the 2 techniques being totally different, both experimentally and in terms of data analysis
  • the statistical methods used by the different software packages being different
  • the label of "statistically significant" which we assign to any gene being an arbitrary cutoff
  • biological and technical noise which you will always carry when performing a technique
  • the differential gene detected having a small change, or your dataset having low power (low number of samples), so that the aforementioned noise will have more impact
  • etc.

Because of this, do not expect every gene to validate between 2 different techniques, even on the same samples, even on the same aliquot, etc. (often not even when running the same technique twice!). Moreover, I would think that if the gene (or set of genes) displays a very similar trend of change, even if it does not reach statistical significance by a certain method (whose cutoff we also choose arbitrarily), that could often count as a validation. So IMO the focus should be more on reproducing the trends of change, especially for your sets of genes with the strongest changes, even though sometimes specific genes will not validate for reasons such as those explained above!


I think that one of the issues is, indeed, the low sample size. We have four individuals in the case group and six controls. Regarding the expression of one gene of interest for the pathophysiology of the disease, we found these results in RNA-Seq: edgeR (log2FC = 3.022 and p-value = 0.0049) and DESeq2 (log2FC = 3.069 and padj = 0.15). In both cases the gene is upregulated (small change), but the padj is only significant in edgeR. The trends of change are consistent in the other genes as well, so I think that both packages work similarly in this case, right?

Thank you ATpoint and Papyrus for the answers!


edgeR(log2FC = 3.022 and p-value = 0.0049)

(I'm guessing you mean padj = 0.0049)

Yes, as I said there are many reasons to not expect exactly the same results between different methods even for the same data. If you want to confirm that the 2 methods are in agreement, you could, for example, look at how many of the top 100 or top 1000 genes (ordered by significance) are shared between the 2 methods. Probably you will see a relevant overlap. More advanced options could be using Robust Rank Aggregation methods which are especially designed to compare lists of genes from different methods/inputs and tell you "more or less" if they are in the same positions.
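A minimal sketch of the top-N overlap check, assuming you have already exported each method's gene list ordered by significance. The helper `top_n_overlap` and the gene IDs are hypothetical, used only to show the idea.

```python
def top_n_overlap(ranked_a, ranked_b, n):
    """Return the genes shared by the first n entries of two ranked lists.

    The result keeps the order in which the genes appear in ranked_a.
    """
    top_b = set(ranked_b[:n])
    return [gene for gene in ranked_a[:n] if gene in top_b]

# Hypothetical gene IDs, each list ordered from most to least significant.
edger_ranked = ["G1", "G2", "G3", "G4", "G5", "G6"]
deseq2_ranked = ["G2", "G1", "G5", "G7", "G3", "G8"]

n = 4
shared = top_n_overlap(edger_ranked, deseq2_ranked, n)
print(f"{len(shared)} of the top {n} genes are shared: {shared}")
```

With real results you would run this for several values of n (e.g. 100 and 1000) and see whether the shared fraction stays high.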

As ATpoint pointed out, in this case your gene is not "super" significant even for edgeR (0.0049 is not a very low adjusted p-value). It is easier for borderline genes to be missed or detected by one method or the other. For particular genes you will understand this better if you look at their counts, etc., as ATpoint suggested. That way you could see, for example, high variance between the groups, which will help you understand why there wasn't much evidence to detect the gene as significant in the first place.
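On the raw versus adjusted p-value point: both edgeR (`topTags`) and DESeq2 (`results`) use the Benjamini-Hochberg adjustment by default, and the adjusted value depends on the gene's rank and on how many genes were tested. The sketch below is a plain-Python version of that adjustment, with the function name `bh_adjust` and the p-values invented for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for ten genes (invented, not from the thread).
raw = [0.0049, 0.001, 0.03, 0.2, 0.6, 0.8, 0.04, 0.5, 0.9, 0.35]
for p, q in zip(raw, bh_adjust(raw)):
    print(f"p = {p:<6} padj = {q:.4f}")
```

A raw p-value of 0.0049 and a padj of 0.0049 are therefore very different amounts of evidence, which is why it matters which one was reported.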
