I have 30 tumor and 3 normal samples. The MDS plot looks like this: [image: Tumor vs Normal]. I used edgeR and selected differentially expressed genes based on fold change greater than 1.2 and FDR < 0.05. The differential analysis between tumor and normal gave only two upregulated genes.
So I am thinking of applying random selection of samples: selecting random samples from the tumor condition, doing the differential analysis with them, and repeating the process n times. This gives a different set of differentially expressed genes in each analysis.
But I am not sure how to select the final differentially expressed genes, because the same gene can be differentially expressed in different analyses with different fold change and FDR values.
1) Do you think applying subsampling here is the right choice? If not, when can subsampling be applied?
2) As I get only 2 upregulated genes with FC > 1.2 and FDR < 0.05, can I increase the FDR cutoff to 0.5 or 0.1 to get more upregulated genes? Is selecting genes based on FDR < 0.5 or 0.1 the right choice?
It looks like your tumor samples are widely different from each other, so the result of just two genes is correct. Those are the two that are consistently different between the normal and tumor conditions. Any other gene may not be distinct in some subset of your tumor samples.
Yes, of course I know that only two genes are consistently different between the normal and tumor conditions. But what I asked is: can I increase the FDR cutoff from 0.05 to 0.1 to get more differentially expressed genes? Or should I apply a subsampling method?
Yes, of course increasing the FDR cutoff from 0.05 to 0.50 will give you lots more differentially expressed genes, but at FDR < 0.50 you should expect up to half of them to be false positives. I think the random sub-sampling is going to give uninterpretable results. Maybe try hold-one-out, where you run 29 vs 3 with each tumor sample held out in turn, creating 30 different result sets, and see how many genes are commonly found. Probably just your same two will show up consistently, but with this data you could talk about N genes found in 90% of "29-selections".
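A minimal sketch of the bookkeeping for the hold-one-out idea, written in plain Python rather than R. It assumes each run has already been exported from edgeR as a mapping from gene to (log2 fold change, FDR). The function name `stable_genes`, the data layout, and the toy genes and numbers are all hypothetical and only there for illustration.

```python
import math
from collections import Counter

def stable_genes(result_sets, fc=1.2, fdr=0.05, min_fraction=0.9):
    """Return {gene: fraction of runs} for genes passing the cutoffs
    in at least min_fraction of the hold-one-out runs.

    result_sets: list of dicts, one per run, mapping
    gene -> (log2_fold_change, fdr). This layout is an assumption.
    """
    min_lfc = math.log2(fc)  # fold change 1.2 is a log2FC of about 0.26
    hits = Counter()
    for run in result_sets:
        for gene, (lfc, q) in run.items():
            if lfc > min_lfc and q < fdr:  # upregulated and significant
                hits[gene] += 1
    n_runs = len(result_sets)
    return {g: c / n_runs for g, c in hits.items() if c / n_runs >= min_fraction}

# Toy data (made up): 10 runs instead of 30 to keep the example short.
runs = []
for i in range(10):
    runs.append({
        "GENE_A": (2.0, 0.001),                        # passes in every run
        "GENE_B": (1.5, 0.2 if i == 0 else 0.01),      # fails in one run
        "GENE_C": (0.8, 0.01 if i % 2 == 0 else 0.3),  # passes in half the runs
    })

stable = stable_genes(runs)
print(sorted(stable))
```

With real data the list would hold 30 dicts, one per held-out tumor sample, and `min_fraction=0.9` corresponds to "found in 90% of the 29-selections".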
If I try this hold-one-out procedure I may get common genes, and those common genes may have different FDR and fold change values in different result sets, right? From that, how can I select the right values? For example, see the following:
In one analysis
In another analysis the same gene is found differentially expressed with different values
So from these two analyses I can take that gene as commonly found, but which values should I consider?
They're both true in different ways. "The LFC is >6"
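Since each result set gives its own estimate, one option is to report the spread across runs instead of picking a single value. This is a suggestion, not something edgeR prescribes. A minimal sketch in plain Python, assuming each run is a mapping from gene to (log2 fold change, FDR); `summarise_gene`, the layout, and the numbers are hypothetical.

```python
from statistics import median

def summarise_gene(gene, result_sets):
    """Summarise one gene's log2FC and FDR over all runs that report it."""
    values = [run[gene] for run in result_sets if gene in run]
    if not values:
        raise KeyError(f"{gene} not found in any result set")
    lfcs = [lfc for lfc, _ in values]
    fdrs = [q for _, q in values]
    return {
        "n_runs": len(values),
        "lfc_min": min(lfcs),
        "lfc_median": median(lfcs),
        "lfc_max": max(lfcs),
        "fdr_max": max(fdrs),  # least significant value seen across runs
    }

# Toy data (made up): the same gene in three different result sets.
runs = [
    {"GENE_A": (6.2, 0.001)},
    {"GENE_A": (6.8, 0.004)},
    {"GENE_A": (6.5, 0.002)},
]

summary = summarise_gene("GENE_A", runs)
print(summary)
```

A statement such as "the LFC is above the smallest value seen in every run" can then be read directly from `lfc_min`.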
So I should select common genes from the different result sets based on a cutoff? And may I know when the subsampling procedure can be used?