Hi everybody, I'm currently working on RNA-seq data and want to study how gene expression changes over treatment time. This is the first time I've done this, so I'd like to check that my steps are correct and then ask a final question. I have 8 tumor patients plus samples from healthy people (AC = 100). Each tumor patient was sampled at 4 follow-up time points, so in total I have 32 tumor samples (8 x 4 = 32). This is what I've done so far:
library(edgeR)
# factor with 5 levels: AC, GBM0, GBM1, GBM2, GBM3
group <- factor(samples$class)
# fit data into a DGEList data class
y <- DGEList(counts = counts, group = group)
# normalize counts using trimmed mean of M-values method
y <- calcNormFactors(y,method = "TMM")
# build the design matrix based on the group
design <- model.matrix(~ 0 + group)
# re-assign column names
colnames(design) <- levels(group)
# compute common, trend, tagwise dispersion
y <- estimateDisp(y,design = design ,robust = TRUE)
# fit the negative binomial GLM for each tag
fit <- glmFit(y, design=design)
# build the contrast
contrast <- makeContrasts(
GBM0VsAC = GBM0-AC,
GBM1VsAC = GBM1-AC,
GBM2VsAC = GBM2-AC,
GBM3VsAC = GBM3-AC,
levels=colnames(design))
# perform likelihood ratio test for differential expression
# (a multi-column contrast matrix tests all four contrasts jointly)
lrt <- glmLRT(fit, contrast = contrast)
lrt.top <- topTags(lrt, n=nrow(lrt$table))$table
head(lrt.top)
GeneENSEMBL logFC.GBM0VsAC logFC.GBM1VsAC logFC.GBM2VsAC logFC.GBM3VsAC logCPM LR PValue FDR
ENSG00000103175 5.958471 2.4203915 1.6676587 1.9970883 3.357118 208.36483 5.971650e-44 2.679479e-40
ENSG00000100060 1.054112 1.0377118 0.6542040 0.8685774 7.277566 94.56561 1.409715e-19 3.162696e-16
ENSG00000114439 1.435828 0.4227201 -0.1761401 -0.0960050 6.858234 88.95804 2.191882e-18 3.278325e-15
ENSG00000213672 1.785063 1.5803324 1.2577956 1.4061236 6.391914 84.90323 1.590392e-17 1.784022e-14
ENSG00000167100 2.102274 1.9095356 1.2437504 1.4472331 7.075963 80.22773 1.558682e-16 1.398761e-13
ENSG00000145425 -2.324358 -2.3476719 -1.9487018 -1.6620392 9.519217 78.86459 3.030404e-16 2.266237e-13
Question: I actually have 59 unique tumor patients. Some have expression data only for time 0, some for times 0 and 1, and some for times 0, 1, 2 and 3 (141 samples in total). So far I'm selecting only the patients that have expression data for all 4 treatment times; of the 59, only 8 do (8 patients x 4 time points = 32 samples). But I'm wondering whether I need to be this strict for what I want to achieve. Would it be OK to use all the samples and fit them as I did above? The time points would then have different numbers of samples, because some times have more samples than others. Would this be wrong?
I want to study how much information the platelets retain about tumor development (all of these expression profiles were obtained from platelets). Is my approach wrong? If so, what would you suggest? Thanks for the comment, btw.
How are you planning to interpret the results? What results would look like the platelets are retaining information, and what results would look like they weren't?
A lower density in the respective differential co-expression networks. If the network is more disconnected, with a lower degree of connection, then I would assume that platelets can give an overview of the success or partial success of a treatment. If this does not happen, it means that platelets could not be used for this purpose.
I'm not particularly familiar with the details of differential co-expression network analysis, but I wasn't aware that it required differential gene expression analysis?
Either way, your design above will identify a list of genes that differ between time points or differ between normal and cancer. If that's what you need, then the design is fine. Usually I would encode something like this with separate formula terms for cancer vs normal and for time point, and then separately identify genes that differ between cancer and normal and genes that differ between treatment time points, but that really depends on the research hypothesis.
Thank you for the answer, that's pretty clear. Is this possible using edgeR (I mean the approach you mentioned at the end)? If so, I will study the PDF user guide more carefully.
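For illustration, here is a minimal sketch of what separate formula terms could look like in edgeR. Everything about the data is an assumption: the sample sheet is hypothetical (6 controls, tumor samples with unequal numbers per time point), the controls are coded as time "0", and the counts are simulated toy values only so the calls can run. With this coding, the conditionGBM coefficient compares tumor at time 0 with controls, and the time coefficients compare each later time point with time 0 within the tumor samples.

```r
# Hypothetical sample sheet: 6 healthy controls (AC) and 14 tumor samples (GBM)
# with unequal numbers per time point. Controls are coded as time "0".
samples <- data.frame(
  condition = factor(c(rep("AC", 6), rep("GBM", 14)), levels = c("AC", "GBM")),
  time      = factor(c(rep("0", 6),
                       rep(c("0", "1", "2", "3"), times = c(5, 4, 3, 2))))
)

# Separate terms for cancer vs normal and for time point.
design <- model.matrix(~ condition + time, data = samples)
print(colnames(design))

# The coefficients are still estimable with unequal group sizes.
stopifnot(qr(design)$rank == ncol(design))

if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)
  # Simulated toy counts, NOT real data.
  set.seed(1)
  counts <- matrix(rnbinom(500 * nrow(samples), mu = 50, size = 5), nrow = 500,
                   dimnames = list(paste0("gene", 1:500), NULL))
  y <- DGEList(counts = counts)
  y <- calcNormFactors(y, method = "TMM")
  y <- estimateDisp(y, design = design, robust = TRUE)
  fit <- glmFit(y, design = design)
  # Tumor (time 0) vs control: column 2 of the design (conditionGBM).
  lrt.tumor <- glmLRT(fit, coef = 2)
  # Any change across treatment time: joint test of columns 3 to 5.
  lrt.time <- glmLRT(fit, coef = 3:5)
  print(topTags(lrt.time))
}
```

Because the controls only exist at one time point, this is a reparametrization of the five-group model above rather than a different model. What changes is which comparisons the coefficients express directly, so whether it is preferable depends on the question being asked.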