Question

Statistical technique for mutation to gene expression link

1

3.2 years ago

SUMIT ▴ 30

I grouped approximately 1000 individuals into p53 wildtype and p53 mutated samples. I have gene expression data (logFC) for each individual in both the mutated and non-mutated groups. My aim is to identify the genes that are strongly upregulated under p53 mutation, so I want to analyze the link between the mutation and the expression of each gene.

What are the appropriate statistical tests for analyzing such a relationship?
Should I perform a grouped analysis (all p53 wildtype vs all p53 mutated), or a pairwise analysis (a single mutated case vs a single non-mutated case) and then average the significance values across pairs to find the associated genes?

Please note that my mutation data is in binary format (-1: mutation, 0: wildtype) and my gene expression data is logFC. Rows represent genes and columns represent samples.

Any advice or pointers would be greatly appreciated.

Thanks in advance.

expression gene statistics mutation association tumor • 1.2k views

ADD COMMENT • link updated 3.1 years ago by Hamid Ghaedi 3.3k • written 3.2 years ago by SUMIT ▴ 30

0

In a similar situation I would do a DE analysis between wild-type and mutant samples. There is a lot to consider, such as how to handle samples with mutations in genes that co-occur with, or are mutually exclusive with, TP53 mutations.

Update : 2021-10-13

See (this and this). They used the limma package with a design matrix that accounted for all variables of interest, such as mutations, to assess the effect of mutations on expression profiles.
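To make the design-matrix idea concrete, here is a minimal sketch in plain Python. It is not limma: it fits an ordinary least-squares model for a single gene, without limma's moderated statistics, and every value in it is made up. The second mutation (called KRAS here) is a hypothetical covariate chosen only for illustration. Mutation status is coded as 1 = mutated, 0 = wildtype; with the -1/0 coding from the question the sign of the coefficient would flip.

```python
# Sketch only: ordinary least squares for one gene, standing in for the
# linear model that limma fits per gene. All values below are invented.


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_gene(design, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical annotations for six samples (1 = mutated, 0 = wildtype).
tp53 = [1, 1, 1, 0, 0, 0]
kras = [1, 0, 1, 0, 1, 0]  # illustrative second mutation, not from the thread

# Design matrix columns: intercept, TP53 status, KRAS status.
design = [[1.0, float(t), float(k)] for t, k in zip(tp53, kras)]

# Toy expression values for one gene, constructed as 2 + 3*tp53 + 1*kras.
expression = [6.0, 5.0, 6.0, 2.0, 3.0, 2.0]

intercept, tp53_effect, kras_effect = fit_gene(design, expression)
print(round(tp53_effect, 6), round(kras_effect, 6))
```

The TP53 coefficient is the expression difference attributable to TP53 status after adjusting for the other column, which is the reason for putting co-occurring mutations into the design rather than ignoring them.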

ADD REPLY • link 3.1 years ago by Hamid Ghaedi 3.3k

0

Can you say which type of data you have?

I have gene expression data (logFC) of each individual present in both mutated and non-mutated groups

This is what I do not really understand.

ADD REPLY • link 3.2 years ago by ATpoint 85k

score 0 · Answer 1 · 2021-10-04

Differential expression analysis with Bioconductor tools such as limma, DESeq2, and edgeR would directly answer the question you are asking. DESeq2 and edgeR are designed for RNA-seq count data, while limma was originally developed for microarray data (it can also handle RNA-seq through its voom transformation).

You do not have paired samples (where paired means arising from the same individual), so you cannot use a paired-sample analysis.
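As a rough illustration of the grouped (unpaired) comparison, the sketch below tests each gene for higher expression in the mutated group and then corrects for multiple testing. It is a stand-in, not what limma, DESeq2, or edgeR do internally: it uses a simple permutation test on the difference of group means plus a Benjamini-Hochberg adjustment. The gene names, expression values, and the 0.05 cutoff are assumptions made for the example. The status vector uses the coding from the question (-1: mutation, 0: wildtype).

```python
import random
from statistics import mean


def mean_difference(values, is_mutated):
    """Mean expression in mutated samples minus mean in wildtype samples."""
    mut = [v for v, m in zip(values, is_mutated) if m]
    wt = [v for v, m in zip(values, is_mutated) if not m]
    return mean(mut) - mean(wt)


def permutation_pvalue(values, is_mutated, n_perm=2000, seed=0):
    """One-sided p-value for higher expression in the mutated group."""
    rng = random.Random(seed)
    observed = mean_difference(values, is_mutated)
    labels = list(is_mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between label and expression
        if mean_difference(values, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Mutation status per sample, coded as in the question.
status = [-1, -1, -1, -1, -1, 0, 0, 0, 0, 0]
is_mutated = [s == -1 for s in status]

# Invented expression matrix: one row per gene, one column per sample.
genes = ["GENE_A", "GENE_B"]
expression = [
    [3.1, 2.8, 3.4, 2.9, 3.2, 0.1, -0.2, 0.3, 0.0, -0.1],  # higher in mutated
    [0.2, -0.1, 0.0, 0.3, -0.3, 0.1, -0.2, 0.3, 0.0, 0.1],  # no clear shift
]

pvalues = [permutation_pvalue(row, is_mutated) for row in expression]
adjusted = benjamini_hochberg(pvalues)
upregulated = [g for g, q in zip(genes, adjusted) if q < 0.05]
print(upregulated)
```

Every sample contributes to the test for every gene, so there is no need to form arbitrary mutated/wildtype pairs and average their results.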