I am currently doing my master's project on cardiac diseases.
We took 14 patients, did RNA-seq, and got our results as FPKM. I will be using IPA software to create networks, pathways, etc.
My questions are:
What FPKM values are considered significant?
We have no control, so we will take patient 13 as the control because he has the least severe phenotype. How can I compare all patients' FPKM values to patient 13 (to know which genes are up/down regulated)? I took the ratio of the FPKM values for every gene; is that correct?
What cut-off should I consider?
The results are in Excel form as either "expression profile g" or "expression profile G". What is the difference, and which one should I take?
What kind of master is it that you do? In bioinformatics? Or medical biology?
A good experiment starts with a good (experimental) design. Especially since you're 'learning', I hope your supervisors will give a good example of how sound research is performed (it will form the basis for your further career).
In a good experiment, a control is as important as the patient itself. Ask your supervisors why they don't have any controls. Do you suspect your group of patients to be heterogeneous? Did you run hierarchical clustering on your patients? PCA?
For good statistical analysis of RNA-seq data, you don't use FPKM values but raw counts (as input for limma, edgeR, DESeq, etc.). So why did your supervisors want you to do the analysis with FPKM data?
What helps is to read papers with RNAseq experiments. Especially the ones published in high impact journals will show you how to perform the analysis.
Written 8.8 years ago by Benn; updated 6.2 years ago by Ram.
An update (6th October 2018):
You should abandon RPKM / FPKM. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis:
"The Total Count and RPKM [FPKM] normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis."
"The first thing one should remember is that without between sample normalization (a topic for a later post), NONE of these units are comparable across experiments. This is a result of RNA-Seq being a relative measurement, not an absolute one."
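To make the "relative measurement" point concrete, here is a minimal sketch in Python. The gene names, lengths and fragment counts are made up for illustration and are not from this thread. It computes FPKM from its definition and shows that a gene with the same fragment count in two samples gets a very different FPKM when another gene dominates the second library.

```python
def fpkm(counts, lengths_bp):
    """FPKM per gene: fragments / (gene length in kb * total mapped fragments in millions)."""
    total_millions = sum(counts.values()) / 1e6
    return {
        gene: counts[gene] / ((lengths_bp[gene] / 1000) * total_millions)
        for gene in counts
    }

# Hypothetical gene lengths (bp) and fragment counts, for illustration only.
lengths = {"geneA": 2000, "geneB": 1000, "geneC": 4000}
sample1 = {"geneA": 1000, "geneB": 1000, "geneC": 2000}
# geneA and geneB have the same counts as in sample1, but geneC is much more abundant.
sample2 = {"geneA": 1000, "geneB": 1000, "geneC": 18000}

fpkm1 = fpkm(sample1, lengths)
fpkm2 = fpkm(sample2, lengths)

# geneA has identical fragment counts in both samples, yet its FPKM differs
# because the per-million scaling depends on everything else in the library.
print(fpkm1["geneA"], fpkm2["geneA"])
```

In this toy case geneA looks about 5-fold lower in the second sample purely because of library composition, which is the kind of artefact that between-sample normalization on raw counts is meant to address.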
FPKM stands for fragments per kilobase of exon per million fragments mapped, i.e., FPKM is an expression unit that reports a probabilistic estimate of isoform abundance in RNA-Seq data. To measure "significance", I'd look at the FDR or e-value instead of the FPKM itself.
On the other hand, to classify genes as UP/DOWN regulated you need to perform a comparison. In your case, you should compare your "cases" against the control, patient 13. To group the genes as UP/DOWN, you calculate the fold change (FC) as the ratio of the FPKM values of the case and the control. The ratio itself is never negative, so take its log2: if the log2 FC is negative (ratio below 1), the gene is down regulated in the case compared with the control, and if it is positive the gene is UP regulated. I usually give special attention to genes with at least a 2-fold change in either direction (FC <= -2 or FC >= 2 in the signed convention, i.e. |log2 FC| >= 1), but it depends on you, your data and your goals.
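A rough sketch of that fold-change calculation in Python, assuming the FPKM table has been loaded into a dict of dicts. The sample names, gene names, values and the pseudocount of 1 are all illustrative assumptions, not something from this thread. The pseudocount is only there to avoid dividing by zero for unexpressed genes. With a single reference sample there are no replicates, so this is purely descriptive and gives no p-value or FDR.

```python
import math

def log2_fold_changes(fpkm_by_sample, reference, pseudocount=1.0):
    """log2((case + pseudocount) / (reference + pseudocount)) for every non-reference sample."""
    ref = fpkm_by_sample[reference]
    result = {}
    for sample, values in fpkm_by_sample.items():
        if sample == reference:
            continue
        result[sample] = {
            gene: math.log2((values[gene] + pseudocount) / (ref[gene] + pseudocount))
            for gene in ref
        }
    return result

def classify(log2_fc, threshold=1.0):
    """Label a gene UP/DOWN if it changes at least 2-fold (|log2 FC| >= 1 by default)."""
    if log2_fc >= threshold:
        return "UP"
    if log2_fc <= -threshold:
        return "DOWN"
    return "unchanged"

# Hypothetical FPKM values, for illustration only.
fpkm_by_sample = {
    "patient13": {"geneA": 7.0, "geneB": 15.0, "geneC": 0.0},
    "patient01": {"geneA": 31.0, "geneB": 3.0, "geneC": 0.0},
}

lfc = log2_fold_changes(fpkm_by_sample, reference="patient13")
for gene, value in lfc["patient01"].items():
    print(gene, round(value, 2), classify(value))
```

Working on the log2 scale keeps up and down regulation symmetric around zero, which is why the cut-off can be written as a single absolute threshold.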
First of all, I am not sure whether this experimental approach is workable. It seems a bit flawed to me, as you will have one control against the rest. If it were a time-point RNA-Seq experiment, it would have made some sense. There are caveats in the approach. You can compare patient 13 with the rest, but the statistical power will not be enough to give you any significant up/down genes. Tools like limma, edgeR and DESeq2 will not yield fruitful results when you compare 1 sample against the others. Most tools that work on raw counts rely on comparing one group against another, and even if the design is not paired there should be a minimum of 2-3 replicates per group to give statistical significance.
You might average the FPKM of the remaining patients per gene to get one FPKM value and compare that with patient 13, but that introduces biases and the result will not be very significant either. In that case the fold change would be calculated as patient 13 vs. the rest (avg. FPKM), with a p-value associated to test the significance. But this entire approach will be flawed.
Usually FPKM is a normalized expression value used for visualization and downstream comparison of gene expression, but for differential expression you must have raw counts. As a crude approach, you can convert the FPKM values back to raw read counts, round them to the nearest integer, and then try any of the DE tools mentioned above, but I am sure the result will not be very significant. So it is better to get more details on the entire experimental approach. The approach for FC and FDR mentioned by @iarun is fine, but it is really not a proper way to do the analysis. In fact, @b.nota pointed that out correctly.
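For what it is worth, the crude back-conversion from FPKM to counts can be sketched from the FPKM definition as below. The function name is hypothetical. It assumes you know each gene's length and each sample's total number of mapped fragments, neither of which is in an FPKM table by itself. The result is only an approximation of the original counts, because the pipeline that produced the FPKM values may have used effective lengths or its own scaling.

```python
def approx_counts_from_fpkm(fpkm, length_bp, total_mapped_fragments):
    """Invert FPKM = fragments / (length in kb * mapped fragments in millions).

    Returns an integer estimate of the fragment count. length_bp and
    total_mapped_fragments must come from the annotation and the alignment
    statistics; they are assumptions of this sketch.
    """
    length_kb = length_bp / 1000
    total_millions = total_mapped_fragments / 1e6
    return round(fpkm * length_kb * total_millions)

# Hypothetical example: a 2 kb gene with FPKM 50 in a library of 20 million mapped fragments.
estimated = approx_counts_from_fpkm(50.0, 2000, 20_000_000)
print(estimated)
```

If the original count matrix still exists anywhere in the pipeline output, using it directly is preferable to this reconstruction.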
So here is what you should do,
Read RNA-Seq papers (methods, tools)
Read analyses of RNA-Seq data (patient-specific, experimental conditions)
Read tool comparison papers
Read about the normalization approaches used in RNA-Seq, and sit down with your supervisor to explain the caveats in the approach; then you might be able to design an analysis protocol.
Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
Also, by Harold Pimentel: What the FPKM? A review of RNA-Seq expression units
Still, we see papers publishing results with FPKM normalization. Alas!
Yep, even saw RPKM being used in a recent Nature Genetics publication. Obviously Nature Publishing Group needs to hire some statisticians.