I grouped samples from approximately 1000 individuals into p53 wild-type and p53 mutated. I have gene expression data (logFC) for every individual in both the mutated and non-mutated groups. My aim is to identify the genes that are strongly upregulated under p53 mutation, i.e. to analyze the link between the mutation and the expression of each gene:
What are the appropriate statistical tests for analyzing such a relationship?
Should I perform a grouped analysis (all p53 wild-type vs. all p53 mutated), or a pairwise analysis (single mutated case vs. single non-mutated case) and then average the significance values of the pairs to find the associated genes?
Please note that my mutation data is in binary format (-1: mutation, 0: wild-type) and my gene expression data is logFC. Rows represent genes and columns represent samples.
Any advice or pointers would be greatly appreciated.
Thanks in advance.
In a similar situation I would do a DE analysis between wild-type and mutant samples. There is a lot to consider, such as how to handle samples with mutations in genes that co-occur with, or are mutually exclusive with, TP53 mutations.
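As a rough illustration of the grouped comparison (all mutated vs. all wild-type, one test per gene), here is a minimal Python sketch using only the standard library. It is not a substitute for a dedicated DE package. The function names and the dictionary layout are made up for the example, the normal approximation to the t distribution is an assumption that only makes sense with large groups (as with ~1000 samples), and the -1/0 coding follows the question.

```python
import math
from statistics import NormalDist, mean, variance


def welch_upregulation_p(mut, wt):
    """One-sided p-value that the mutant mean exceeds the wild-type mean.

    Uses the Welch statistic with a normal approximation instead of the
    t distribution (an assumption; reasonable only for large groups).
    Fails with ZeroDivisionError if both groups have zero variance.
    """
    se = math.sqrt(variance(mut) / len(mut) + variance(wt) / len(wt))
    stat = (mean(mut) - mean(wt)) / se
    return 1.0 - NormalDist().cdf(stat)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def upregulated_genes(expr, status, alpha=0.05):
    """expr: {gene: [logFC per sample]}; status: -1 (mutated) or 0 (wild-type)."""
    genes = list(expr)
    pvals = []
    for gene in genes:
        mut = [v for v, s in zip(expr[gene], status) if s == -1]
        wt = [v for v, s in zip(expr[gene], status) if s == 0]
        pvals.append(welch_upregulation_p(mut, wt))
    adjusted = benjamini_hochberg(pvals)
    return [g for g, q in zip(genes, adjusted) if q <= alpha]
```

The multiple-testing correction matters here because one test is run per gene. Testing the two groups once per gene also avoids averaging significance values over arbitrary sample pairs.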
Update: 2021-10-13
See (this and this). They used the limma package with a design matrix that accounts for all variables of interest, such as mutations, to assess the effect of mutations on expression profiles. Can you say which type of data you have?
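To make the design-matrix idea concrete, here is a small Python sketch of a per-gene linear model with a TP53 column plus one additional covariate. This is plain ordinary least squares, not limma (limma is an R package and additionally moderates the variances across genes). The second covariate, `other_mutation`, is a hypothetical stand-in for something like a co-occurring mutation, and all function names are invented for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting.

    Raises ZeroDivisionError if the system is singular, e.g. when two
    design columns are perfectly collinear.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def build_design(tp53_status, other_mutation):
    """One row per sample: intercept, TP53 mutated (1/0), extra covariate (1/0).

    tp53_status uses the question's coding: -1 mutated, 0 wild-type.
    other_mutation is a hypothetical second covariate, coded 1/0.
    """
    return [[1.0, 1.0 if s == -1 else 0.0, float(o)]
            for s, o in zip(tp53_status, other_mutation)]


def fit_gene(design, y):
    """Least-squares coefficients for one gene via the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)
```

The second coefficient returned by `fit_gene` estimates the expression difference associated with TP53 mutation after adjusting for the other covariate; repeating the fit for each row of the expression matrix gives one such estimate per gene.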
This is what I do not really understand.