Question

Differential expression analysis over treatment time

3.1 years ago

ShowPow ▴ 20

Hi everybody, I'm currently working on RNA-seq data and want to study how gene expression changes over the course of treatment. This is my first time doing this, so I would like to check that my steps are correct and then ask a final question. I took 8 tumor samples plus samples from healthy people (AC = 100). Each of the tumor samples was followed up at 4 time points, so in total I have 32 tumor samples (8 x 4 = 32). This is what I have done:

#Factor with 5 levels AC,GBM0,GBM1,GBM2,GBM3
group <- factor(samples$class) 

# fit data into a DGEList data class 
y <- DGEList(counts = counts, group = group)

# normalize counts using trimmed mean of M-values method
y <- calcNormFactors(y,method = "TMM")

# build the design matrix based on the group
design <- model.matrix(~ 0 + group)

# re-assign column names
colnames(design) <- levels(group)

# compute common, trend, tagwise dispersion
y <- estimateDisp(y,design = design ,robust = TRUE)

# fit the negative binomial GLM for each tag 
fit <- glmFit(y, design=design)

# build the contrast 
contrast <- makeContrasts(
  GBM0VsAC = GBM0-AC,
  GBM1VsAC = GBM1-AC,
  GBM2VsAC = GBM2-AC,
  GBM3VsAC = GBM3-AC,
  levels=colnames(design))

# perform likelihood ratio test for differential expression; because the
# contrast matrix has four columns, this tests for a difference in any of them
lrt <- glmLRT(fit, contrast = contrast)
lrt.top <- topTags(lrt, n=nrow(lrt$table))$table

head(lrt.top)



    GeneENSEMBL logFC.GBM0VsAC logFC.GBM1VsAC logFC.GBM2VsAC logFC.GBM3VsAC   logCPM        LR       PValue          FDR
    ENSG00000103175       5.958471      2.4203915      1.6676587      1.9970883 3.357118 208.36483 5.971650e-44 2.679479e-40
    ENSG00000100060       1.054112      1.0377118      0.6542040      0.8685774 7.277566  94.56561 1.409715e-19 3.162696e-16
    ENSG00000114439       1.435828      0.4227201     -0.1761401     -0.0960050 6.858234  88.95804 2.191882e-18 3.278325e-15
    ENSG00000213672       1.785063      1.5803324      1.2577956      1.4061236 6.391914  84.90323 1.590392e-17 1.784022e-14
    ENSG00000167100       2.102274      1.9095356      1.2437504      1.4472331 7.075963  80.22773 1.558682e-16 1.398761e-13
    ENSG00000145425      -2.324358     -2.3476719     -1.9487018     -1.6620392 9.519217  78.86459 3.030404e-16 2.266237e-13

Question: I actually have 59 unique samples, some with expression data only for time 0, some for times 0 and 1, and some for times 0, 1, 2 and 3 (141 in total). So far I have been selecting only the unique samples that have expression data for all 4 treatment times. Of the 59, only 8 have expression reported for times 0 to 3 (8 unique samples x 4 time points = 32 samples in total). But I am wondering whether I need to be this strict for what I want to achieve. Would it be OK to use all the samples and fit them as I did above? The treatment times would then have different numbers of samples, because some time points have more samples than others. Would this be wrong?

R GEO dce • 1.3k views

ADD COMMENT • link 3.1 years ago by ShowPow ▴ 20

score 2 · Answer 1 · 2022-08-01

3.1 years ago

i.sudbery 22k

Having different numbers of samples in your different time points is not a problem.
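As a minimal sketch of why unequal group sizes are not a problem, the base R snippet below builds the same kind of design matrix as in the question from unbalanced groups. The sample counts are hypothetical and chosen only for illustration; they are not the real data.

```r
# Hypothetical, unbalanced sample counts per group (not the real data)
n_per_group <- c(AC = 5, GBM0 = 6, GBM1 = 4, GBM2 = 3, GBM3 = 2)
group <- factor(rep(names(n_per_group), times = n_per_group),
                levels = names(n_per_group))

# Same design as in the question: one column per group, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Each column sums to the number of samples in that group
group_sizes <- colSums(design)

# The matrix is still full rank, so every group mean can be estimated
full_rank <- qr(design)$rank == ncol(design)
```

Groups with fewer samples simply get less precisely estimated means; the model itself is still well defined.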

I am more interested in the purpose of comparing each time point to normal. If you are interested in the effects of treatment on the tumour cells, why include the normal patients?

ADD COMMENT • link 3.1 years ago by i.sudbery 22k

I want to study how much information the platelets retain about tumor development (all of these expression profiles were obtained from platelets). Is my approach wrong? If so, what would you suggest? Thanks for the comment, by the way.

ADD REPLY • link 3.1 years ago by ShowPow ▴ 20

How are you planning to interpret the results? What results would look like the platelets are retaining information, and what results would look like they weren't?

ADD REPLY • link 3.1 years ago by i.sudbery 22k

Lower density in their respective differential co-expression networks. If the network is more disconnected, with lower node degree, then I would assume that platelets can give an overview of the success or partial success of a treatment. If this does not happen, it means that platelets could not be used for this purpose.

ADD REPLY • link 3.1 years ago by ShowPow ▴ 20

I'm not particularly familiar with the details of differential co-expression network analysis, but I wasn't aware that it required differential gene expression analysis?

Either way, your design above will identify a list of genes that differ between time points or differ between normal and cancer. If that's what you need, then the design is fine. Usually I would encode something like this with separate formula terms for cancer vs normal and for time point, and separately identify genes that differ between cancer and normal or between treatment time points, but that really depends on the research hypothesis.
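One possible way to write such a design with separate terms is sketched below in base R. The sample sheet is hypothetical, and coding the healthy controls as time "0" is an assumption made here only because they have no treatment time; it is not something stated in the thread.

```r
# Hypothetical sample sheet (not the real data): 3 healthy controls and
# 9 tumour samples spread unevenly over four treatment time points
samples <- data.frame(
  condition = c(rep("AC", 3), rep("GBM", 9)),
  time      = c(rep("0", 3),
                rep(c("0", "1", "2", "3"), times = c(3, 3, 2, 1)))
)
samples$condition <- factor(samples$condition, levels = c("AC", "GBM"))
samples$time      <- factor(samples$time, levels = c("0", "1", "2", "3"))

# Assumption: controls are coded as time "0" so the model stays full rank
design <- model.matrix(~ condition + time, data = samples)
design_terms <- colnames(design)
```

Under this coding the `conditionGBM` coefficient is tumour at time 0 versus healthy, and `time1` to `time3` are changes relative to time 0 within the tumour samples. With a fit from `glmFit` on this design, `glmLRT(fit, coef = 2)` would test the first and `glmLRT(fit, coef = 3:5)` the time coefficients jointly.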

ADD REPLY • link 3.1 years ago by i.sudbery 22k

Thank you for the answer, that is pretty clear. Is this possible with edgeR (I mean the approach you mentioned at the end)? If so, I will study the PDF user guide more closely.

ADD REPLY • link 3.1 years ago by ShowPow ▴ 20