Differential analysis using cell-line data with replicate information
5.7 years ago
newbie ▴ 130

I have a total of 8 samples: 4 controls and 4 samples overexpressing the FOXCUT gene.

The column data for all 8 samples, with replicate and cell-line information, look like this:

Samples             TYPE                 Replicate   Cell-lines
Cell1_HA1         Control                  1             1
Cell1_HA2         Control                  2             1
Cell1_foxcut11  FOXCUT_OverExpression      1             1
Cell1_foxcut12  FOXCUT_OverExpression      2             1
Cell2_HA1         Control                  3             2
Cell2_HA2         Control                  4             2
Cell2_foxcut11  FOXCUT_OverExpression      3             2
Cell2_foxcut12  FOXCUT_OverExpression      4             2

I have count data for all 8 samples after STAR alignment, and I'm using the edgeR package for differential analysis. This is the first time I'm doing differential analysis on cell-line data with replicate information, and I don't know how to create the design matrix and contrast matrix for comparing the different sample groups.

I want to compare the following samples:

Cell1_foxcut samples vs Cell1_HA samples
Cell2_foxcut samples vs Cell2_HA samples

Can anyone please help me with how to group the samples, how to create the design matrix, and how to specify coef for the differential analysis between these groups?

RNA-Seq r edger celllines differentialanalysis • 2.3k views

What does a PCA plot of the whole dataset look like? If your cell lines are considerably different (which is very likely), you are better off performing a separate analysis for each cell line.


Please check this:

### Differential Analysis
library(edgeR)
group <- factor(paste0(coldata$TYPE))
y <- DGEList(data,group = group)
y$samples 

## Filtering (the sample threshold should be based on the size of the smallest group)
keep <- rowSums(cpm(y) > 0.5) >= 1
table(keep)
summary(keep)

y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y,method = "TMM") ##Normalization

# MDS Plot
#The RNA samples can be clustered in two dimensions using multi-dimensional scaling (MDS) plots
pch <- c(0,1,2,15,16,17)
colors <- rep(c("darkgreen", "red", "blue"), 2)
plotMDS(y, col=colors[group], pch=pch[group], labels = colnames(y))
legend("bottomleft", legend=levels(group), pch=pch, col=colors, ncol=2)

The plot looks like this: [MDS plot image]


Your samples cluster mainly by cell line and not by treatment, which is what I would expect for cell lines. Therefore, compare treatments only within the same cell line and not across cell lines, as the confounding cell-line effect is most likely too dominant.


Yes, the differential analysis needs to be done within the same cell line. I edited my question. Could you please tell me the syntax for the group, design matrix and contrasts in edgeR? Thank you.


@ATpoint Hi, could you please tell me how to create the design matrix for the differential analysis within the same cell line?

Do you think the code below is right?

library(edgeR)
group <- factor(paste0(coldata$TYPE))
y <- DGEList(data,group = group)
y$samples 

## Filtering 
keep <- rowSums(cpm(y) > 0.5) >= 1

y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y,method = "TMM") ##Normalization

## Create design matrix
design2 <- model.matrix(~ 0 + group + coldata$Replicate + coldata$Cell-lines)

I would simply make two separate experiments (y) and then use ~ TYPE. As the cell lines are probably quite different from each other, having them in one y might screw up the normalization factors.
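A minimal sketch of the "two separate experiments" idea, assuming the count table and sample sheet from the question. The helper name `run_within_cell_line` and the column name `Cell_line` (a syntactically valid stand-in for "Cell-lines") are hypothetical, and the quasi-likelihood test is one common edgeR choice rather than something prescribed in this thread. The 10-gene excerpt posted further down is too small for dispersion estimation, so run this on the full count table.

```r
library(edgeR)

# Hypothetical helper: Control vs FOXCUT_OverExpression within ONE cell line.
# counts  : gene x sample count table (matrix or data.frame)
# coldata : sample sheet with columns Samples, TYPE and Cell_line
#           (Cell_line is an assumed rename of "Cell-lines")
run_within_cell_line <- function(counts, coldata, cell_line) {
  # Keep only the samples that belong to the requested cell line
  cd <- coldata[coldata$Cell_line == cell_line, ]
  sub_counts <- counts[, as.character(cd$Samples), drop = FALSE]

  # Control as reference level, so logFC reads as FOXCUT vs Control
  type <- relevel(factor(cd$TYPE), ref = "Control")

  y <- DGEList(counts = sub_counts, group = type)

  # Same CPM filter as in the thread, but requiring 2 samples
  # (the smallest group size within one cell line)
  keep <- rowSums(cpm(y) > 0.5) >= 2
  y <- y[keep, , keep.lib.sizes = FALSE]

  # Normalization factors are computed within this cell line only
  y <- calcNormFactors(y, method = "TMM")

  # ~ TYPE: column 1 is the Control baseline, column 2 the FOXCUT effect
  design <- model.matrix(~ type)
  y <- estimateDisp(y, design)
  fit <- glmQLFit(y, design)
  qlf <- glmQLFTest(fit, coef = 2)

  # Full results table sorted by p-value
  topTags(qlf, n = Inf)$table
}

# Usage on the full data (not run here):
# res_cell1 <- run_within_cell_line(data, coldata, cell_line = 1)
# res_cell2 <- run_within_cell_line(data, coldata, cell_line = 2)
```

Because each call builds its own DGEList, filtering and TMM factors are never influenced by the other cell line, which is the concern raised above.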


Could you please show me how this can be done? I haven't seen this type of analysis anywhere, so I don't know how to do it.


Instead of importing all 8 samples into R, simply import the first 4 as one object and the second 4 as a second object. Can you show the code that imported the data into R?


Rather than showing a table, here are the count data for all samples for a few genes.

data <- structure(list(Cell1_foxcut12 = c(4L, 8L, 3L, 4L, 7318L, 25317L, 
41L, 0L, 0L, 0L), Cell2_foxcut11 = c(9L, 11L, 2L, 6L, 4959L, 
2621L, 38L, 0L, 0L, 0L), Cell1_foxcut11 = c(0L, 3L, 2L, 0L, 4163L, 
23581L, 33L, 0L, 0L, 0L), Cell2_foxcut12 = c(16L, 13L, 5L, 4L, 
6554L, 3220L, 68L, 12L, 0L, 0L), Cell2_HA1 = c(4L, 17L, 2L, 0L, 
3981L, 2395L, 44L, 0L, 0L, 0L), Cell1_HA1 = c(0L, 9L, 3L, 0L, 
5234L, 25810L, 18L, 0L, 0L, 0L), Cell2_HA2 = c(7L, 11L, 0L, 2L, 
3803L, 2695L, 30L, 0L, 0L, 0L), Cell1_HA2 = c(9L, 9L, 2L, 7L, 
6524L, 25617L, 40L, 0L, 0L, 0L)), row.names = c("5S_rRNA", "7SK", 
"A1BG", "A1BG-AS1", "A1CF", "A2M", "A2M-AS1", "A2ML1", "A2ML1-AS1", 
"A2ML1-AS2"), class = "data.frame")

colnames(data) %in% coldata$Samples
coldata <- coldata[match(colnames(data), coldata$Samples),]
table(coldata$TYPE)

library(edgeR)
group <- factor(paste0(coldata$TYPE))
y <- DGEList(data,group = group)
y$samples 

## Filtering 
keep <- rowSums(cpm(y) > 0.5) >= 1

y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y,method = "TMM") ##Normalization

## Create design matrix
design2 <- model.matrix(~ 0 + group + coldata$Replicate + coldata$Cell-lines)

This is the code I used.


@ATpoint Could you please tell me what is wrong with my code above?
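Two things in the `design2` line may be worth checking. This is a base-R sketch under assumptions, not a confirmed diagnosis. The sample sheet is rebuilt from the table in the question, and `Cell_line` is an assumed, valid replacement for the "Cell-lines" column name. First, R parses `coldata$Cell-lines` as `coldata$Cell` followed by `- lines`, not as a single column. Second, if Replicate and cell line are both treated as factors, Replicate is nested within cell line (replicates 1 and 2 occur only in cell line 1, replicates 3 and 4 only in cell line 2), so a design containing both is not of full rank.

```r
# Sample sheet rebuilt from the table in the question; Cell_line is an
# assumed, syntactically valid replacement for the "Cell-lines" column.
coldata <- data.frame(
  Samples   = c("Cell1_HA1", "Cell1_HA2", "Cell1_foxcut11", "Cell1_foxcut12",
                "Cell2_HA1", "Cell2_HA2", "Cell2_foxcut11", "Cell2_foxcut12"),
  TYPE      = rep(rep(c("Control", "FOXCUT_OverExpression"), each = 2), times = 2),
  Replicate = c(1, 2, 1, 2, 3, 4, 3, 4),
  Cell_line = rep(c(1, 2), each = 4)
)

# 1) The hyphen is read as an operator: the outermost call in the
#    expression is "-", applied to coldata$Cell and lines
expr <- quote(coldata$Cell-lines)
top_level_call <- as.character(expr[[1]])

# 2) With Replicate and Cell_line as factors, the Cell_line column is a
#    linear combination of the Replicate columns, so the design matrix
#    has more columns than its rank
TYPE      <- factor(coldata$TYPE)
Replicate <- factor(coldata$Replicate)
Cell_line <- factor(coldata$Cell_line)
design_all <- model.matrix(~ 0 + TYPE + Replicate + Cell_line)
n_cols      <- ncol(design_all)
design_rank <- qr(design_all)$rank

print(top_level_call)
print(c(columns = n_cols, rank = design_rank))
```

Comparing `n_cols` with `design_rank` is a quick way to spot a design whose coefficients cannot all be estimated before passing it to edgeR.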

5.7 years ago

Cell1_foxcut11 vs Cell1_HA1

You want to compare one sample to one sample?

Without replicates, you don't really need and can't use sophisticated software. The fancy software takes into account the variance between replicates, but you don't have any.

I don't think you'll be able to do much but look at the very largest differences between your two samples and say "Yeah, those are probably real".


Sorry, my mistake. It should be something like this:

Cell1_foxcut samples vs Cell1_HA samples
Cell2_foxcut samples vs Cell2_HA samples

You need to do what everyone else does their first time. Work through tutorial examples.
