Question

How to filter expression values for miRNA data analysis

5.0 years ago by nkabo ▴ 80

Hi everyone,

I am working on a miRNA analysis. Because I joined the study late, I do not have the raw data. The study has two groups, and all I have is an Excel file with the miRNA transcript IDs, the Bi-weight Average Signal for the control group, the Bi-weight Average Signal for the disease group, and the fold change between the two groups. I do not have any p-values or other statistical test results to tell whether these values are significant, so I cannot tell which differences might simply be due to experimental error. I need to identify the significant miRNAs and run a functional analysis on these differentially expressed miRNAs.
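The "Bi-weight Average Signal" columns most likely hold a Tukey biweight average, a robust mean that down-weights outlying replicates. That is an assumption about how the vendor software summarized each group; the sketch below follows the one-step Tukey biweight as described for Affymetrix MAS5, with tuning constants c = 5 and epsilon = 0.0001. The function name tukey_biweight is illustrative and not part of any library.

```python
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight average (MAS5-style robust mean).

    Assumption: this is the kind of estimator behind the "Bi-weight Average
    Signal" columns; c and epsilon follow the Affymetrix MAS5 description.
    """
    m = statistics.median(values)
    # Median absolute deviation from the median (unscaled).
    mad = statistics.median([abs(x - m) for x in values])
    # Scaled distance of each value from the median.
    u = [(x - m) / (c * mad + epsilon) for x in values]
    # Bisquare weights: values with |u| >= 1 get weight 0.
    w = [(1 - ui ** 2) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
    return sum(wi * x for wi, x in zip(w, values)) / sum(w)


# Hypothetical log2 signals of five replicates, one of them an outlier.
print(tukey_biweight([8.0, 8.2, 7.9, 8.1, 15.0]))
```

With these made-up values the outlier gets zero weight and the result is about 8.06, whereas the plain mean would be 9.44.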

I have two questions:

1. Can I apply any statistical test to decide which values are worth taking into account? Could you suggest a filtering method for this case?
2. If no filtering method based on statistical tests is possible, would a filter like the following work?
   - Compute the mean of the fold change values.
   - Set a threshold for the fold change values (for example, mean ± 1).
   - Eliminate the IDs whose fold change is too low or too high.
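The threshold rule proposed in question 2 can be sketched as below. It only flags IDs whose fold change lies outside mean ± k (k = 1 in the question) and involves no statistical test; whether the flagged IDs are then dropped or kept is left to the analyst. The function name and the miRNA IDs in the example are hypothetical.

```python
import statistics


def flag_outside_band(fold_changes, k=1.0):
    """Return the IDs whose fold change lies outside mean +/- k.

    `fold_changes` maps a transcript ID to its fold change, e.g. as read
    from the spreadsheet. This only flags IDs; it is not a significance test.
    """
    mean_fc = statistics.mean(list(fold_changes.values()))
    low, high = mean_fc - k, mean_fc + k
    return {tid for tid, fc in fold_changes.items() if fc < low or fc > high}


# Hypothetical IDs and fold changes, for illustration only.
example = {"miR-A": 1.1, "miR-B": -1.2, "miR-C": 3.5, "miR-D": 1.0}
print(sorted(flag_outside_band(example)))  # prints ['miR-B', 'miR-C']
```

A cutoff of ±1 means something very different on a linear fold change than on a log2 fold change (where 1 corresponds to a two-fold change), so it is worth checking which scale the spreadsheet uses before applying it.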

Thanks in advance!

Tags: mirna, bi-weight average

Do yourself a favor and insist on the raw data. Custom approaches only make things unnecessarily difficult, require more validation work, and are ultimately inferior. None of your suggested strategies is intrinsically wrong, but each has flaws, which is why dedicated statistical frameworks exist. Get the raw data!

5.0 years ago by ATpoint 88k

Thank you for your reply. I tried to get the raw data, but there is a problem with the company and I could not obtain it. Is there a better way to get a meaningful result from this Excel file?

5.0 years ago by nkabo ▴ 80

Not really. If I understand you correctly, you only have one value for group 1 and one value for group 2, plus the fold change. This is not compatible with any meaningful statistical test: you cannot estimate how variable the replicates are, and therefore you cannot compute any p-value. I do not know what bi-weight averages are, so I cannot comment on that. Sure, you can use the FC as a proxy for differential expression, but this will give plenty of false results. I have nothing to add; it is an unfortunate situation, and I would really keep poking the company, or whoever has the raw data, until you get them. Remember that when you publish a study, the raw data have to be published with it, so if they are unavailable it is questionable whether you can (or should) use these data productively here.
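To make the replicate argument concrete: a two-sample test such as Welch's t-test (used here purely as an illustration, not as the method the original study applied) needs a variance estimate for each group, and a variance cannot be estimated from a single value. The sketch uses only the Python standard library, computes only the t statistic (not a p-value), and the function name welch_t and the signal values are hypothetical.

```python
import math
import statistics


def welch_t(group1, group2):
    """Welch's t statistic for two groups of replicate values."""
    # statistics.variance needs at least two values per group; with a single
    # summary value per group it raises statistics.StatisticsError.
    v1 = statistics.variance(group1)
    v2 = statistics.variance(group2)
    se = math.sqrt(v1 / len(group1) + v2 / len(group2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se


# Hypothetical replicate signals: three per group, so variances exist.
print(welch_t([8.0, 8.2, 8.1], [9.0, 9.3, 9.1]))

# One value per group, as in the spreadsheet, leaves nothing to estimate
# the variance from:
try:
    welch_t([8.1], [9.1])
except statistics.StatisticsError as err:
    print("cannot test:", err)
```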

5.0 years ago by ATpoint 88k