RNA Seq Analysis in R
2
0
Entering edit mode
4.5 years ago
anasjamshed ▴ 140

I choose microarray data GSE75693 of 30 patients with stable kidney transplantation and 15 with BKVN to identify differentially expressed genes (DEGs). I performed this in GEO2R and find R script there and Runs R script Successfully on R studio as well. The R script is :

#   Differential expression analysis with limma
library(Biobase)
library(GEOquery)
library(limma)

# load series and platform data from GEO

gset <- getGEO("GSE75693", GSEMatrix =TRUE, AnnotGPL=TRUE)
if (length(gset) > 1) idx <- grep("GPL570", attr(gset, "names")) else idx <- 1
gset <- gset[[idx]]

# make proper column names to match toptable 
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
gsms <- paste0("000000000000000000000000000000XXXXXXXXXXXXXXX11111",
        "1111111111XXXXXXXXXXXXXXXXXXX")
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }

# eliminate samples marked as "X"
sel <- which(sml != "X")
sml <- sml[sel]
gset <- gset[ ,sel]

# log2 transform
exprs(gset) <- log2(exprs(gset))

# set up the data and proceed with analysis
sml <- paste("G", sml, sep="")    # set group names
fl <- as.factor(sml)
gset$description <- fl
design <- model.matrix(~ description + 0, gset)
colnames(design) <- levels(fl)
fit <- lmFit(gset, design)
cont.matrix <- makeContrasts(G1-G0, levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2, 0.01)
tT <- topTable(fit2, adjust="fdr", sort.by="B", number=1250)

tT <- subset(tT, select=c("ID","adj.P.Val","P.Value","t","B","logFC","Gene.symbol","Gene.title"))
write.table(tT, file=stdout(), row.names=F, sep="\t")

But I want to select DEGs based on the threshold P < 0.01 and fold change >2.0. How can I do this through R script?

RNA-Seq R MicroArray DEGs • 2.8k views
ADD COMMENT
0
Entering edit mode

Hello anasjamshed1994!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/14070/deg-analysis-in-r-through-bioconductor

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY
0
Entering edit mode

Your cross-posting on a different site, especially after someone has already started helping you here, is disrespectful and unprofessional.

ADD REPLY
0
Entering edit mode

and Kevin is my brother and I don't want to disrespect him at any cost

ADD REPLY
0
Entering edit mode

brother, what can I do if I don't find a solution from the last 10 days? I already contacted authors

ADD REPLY
0
Entering edit mode

In that case, there is nothing you can do but wait. You could send the authors a second email, but beyond that, there's nothing more you can do.

ADD REPLY
3
Entering edit mode
4.1 years ago
DareDevil ★ 4.3k

Hi, First of all this is microarray data not RNAseq data. As per the code the data it takes data from here which is already normalized.

Then I don't know why do they perform normalization on the same data which is already normalized.

# log2 transform
exprs(gset) <- log2(exprs(gset))

It is always better to download the raw data and preprocess it using affy and limma packages.

A sample of the analyis can be:

#set the working directory where you have the cel files are stored
setwd(path/to/directory/)
#load required packges
library(affy)
library(limma)
library(oligo)
library(pd.hg.u133.plus.2)

#list the cel files
list.celfiles()
names = list.celfiles()

#create array with sample names
array = read.celfiles(names)

#perform rma algorithm to normalize the data
eset = rma(array)
write.exprs(eset, file = "data_normalized.txt") #this will be your normalized data by rma

#Load the target files which the information about the sample and their corresponding group
targets<-read.delim(file="targets.txt", header=T)
#create design and fit the design
design <- model.matrix(~0+ conditions)
fit <- lmFit(eset, design)
#create contrast
contrast.matrix <- makeContrasts(group1=condition1-control, group2=condition2-control, levels = design)

#fit the contrast
fit.cont <- contrasts.fit(fit, contrast.matrix)
fit.cont<- eBayes(fit.cont)

results<-decideTests(fit.cont,adjust.method="fdr",p=0.05)
summary(results)
ADD COMMENT
0
Entering edit mode

how can we make graphs(heatmaps,histogram, bar ) after this analysis?

ADD REPLY
2
Entering edit mode

you need the normalized data which you will get from the command

write.exprs(eset, file = "data_normalized.txt")

you can use pheatmap for making the heatmap from the matrix, which is quite simple and easy

boxplot and histogram are inbuilt in R

boxplot(expression_matrix, main="Signal Normalized",
        xlab="Sample",
        ylab="Log2 Intensity",cex=0.8, las=2)

for volcanoplot you can use EnhancedVolcano which is developed by Kevin

ADD REPLY
0
Entering edit mode

I'd recommend ComplexHeatmap over pheatmap - it is a LOT better in terms of how the entire concept is structured.

ADD REPLY
0
Entering edit mode

what is expression matrix?

ADD REPLY
2
Entering edit mode

You can load the expression_matrix from the data_normalized.txt.

expression_matrix = read.delim("data_normalized.txt", sep="\t", header=TRUE)
ADD REPLY
0
Entering edit mode

That would be your dataset. You should know what it is if you're working on Differential Expression.

ADD REPLY
0
Entering edit mode

Use ggplot2, ComplexHeatmap, etc. Invest some effort in learning these tools, they'll be really useful.

ADD REPLY
1
Entering edit mode
4.5 years ago

Hi Anas, please take a look at the output of ?topTable and you will be able to figure this out.

Kevin

ADD COMMENT
0
Entering edit mode

no brother I can't find any method to select threshold P < 0.01 and fold change >2.0. plz help me

ADD REPLY
0
Entering edit mode

I am pretty sure that the parameters that you need are p.value and lfc.

If you cannot filter directly with toptable(), then just do:

subset(tT, adj.P.Val < 0.05 & abs(logFC) > 2)
ADD REPLY
0
Entering edit mode

after runnning this subset(tT, adj.P.Val < 0.05 & abs(logFC) > 2) no genes are found . plz help me

ADD REPLY
1
Entering edit mode

Then nothing is statistically significantly, and there is nothing that I can do to help in relation to this. Try to reduce these thresholds (0.05 and absolute 2)

ADD REPLY
0
Entering edit mode

DEGs were selected based on the threshold P < 0.01 and fold change >2.0. This is written in the paper so I want to select like this and output should show 504 genes

ADD REPLY
0
Entering edit mode

Email the authors or re-read the paper and make sure you're classifying samples accurately.

ADD REPLY
3
Entering edit mode

I already did all the things that's why I am asking here AND Ram if you don't want to help others then plz not criticize other

ADD REPLY

Login before adding your answer.

Traffic: 2794 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6