Entering edit mode
7.1 years ago
Leite
★
1.3k
Hi everyone, this is an R code I've been studying based on this tutorial.
This tutorial of mine has helped a lot, but I want to learn more about expression analysis using R.
My doubts are regarding these codes:
- nrow(topTable(fit.eb, coef=1, number=10000, lfc=2))
What the lfc=2 means and how to use it right?
- contrast.matrix <- makeContrasts(disease-control, levels=design)
In this part, I've seen it just like that,disease-control, , but also so: disease-control, control-disease
What would be the difference between the two ways of using it?
- How do I adjust the p-value and FDR in this code? Can I do this manually in the end?
- Some genes get the symbol "NA", how to solve this problem?
- How to make this code better? I appreciate all the tips
Best regards, Leite
#Install the core bioconductor packages, if not already installed
>source("http://bioconductor.org/biocLite.R")
>
>biocLite()
#Install additional bioconductor libraries, if not already installed
>biocLite("GEOquery")
>biocLite("affy")
>biocLite("gcrma")
>biocLite("affyPLM")
>biocLite("RColorBrewer")
>biocLite("limma")
>biocLite("hgu133plus2.db")
>biocLite("annotate")
>biocLite("simpleaffy")
#Load the necessary libraries
>library(GEOquery)
>library(affy)
>library(gcrma)
>library(affyPLM)
>library(RColorBrewer)
>library(hgu133plus2.db)
>library(annotate)
>library(simpleaffy)
#Downloading the file
>getGEOSuppFiles("**MyData**")
#Before starting to work, you have to unzip the files.
>untar("**MyData**/**MyData**_RAW.tar", exdir="data")
>cels <- list.files("data/", pattern = "[gz]")
>sapply(paste("data", cels, sep="/"), gunzip)
>cels
#Loading Files
#Phenodata must have names have to be the same as file names
>celfiles <- read.affy(covdesc="phenodata.txt", path="data")
#Normalising the data
>celfiles.rma <- rma(celfiles)
#Checking the quality
#Load colour libraries
>library(RColorBrewer)
#Set colour palette
>cols <- brewer.pal(8, "Set1")
#Plot a boxplot of unnormalised intensity values
>boxplot(celfiles, col=cols)
#Now a normalised boxplot
>boxplot(celfiles.rma, col=cols)
#Plot a density vs log intensity histogram for the unnormalised data
>hist(celfiles, col=cols)
#Plot a density vs log intensity histogram for the normalised data
>hist(celfiles.rma, col=cols)
#The first step of the analysis is to filter non-informative data
>celfiles.filtered <- nsFilter(celfiles.rma, require.entrez=FALSE, remove.dupEntrez=FALSE)
#Find differentially expressed probes
>samples <- celfiles.rma$Target
#Convert into factors
>samples <- as.factor(samples)
# Check factors have been assigned
>samples
#Set up the experimental design
>design <- model.matrix(~0 + samples)
>colnames(design) <- c("disease", "control")
#Inspect the experiment design
>design
#At this point we have normalized filtered data and a description of the data and the samples and the experimental design.
#Fit the linear model to the filtered expression set
>fit <- lmFit(exprs(celfiles.filtered$eset), design)
#Set up a contrast matrix to compare disease v control
>contrast.matrix <- makeContrasts(**disease-control,** levels=design)
#Now, the contrast matrix is combined with the linear fit model by set of probes.
>fit.con <- contrasts.fit(fit, contrast.matrix)
>fit.eb <- eBayes(fit.con)
>nrow(topTable(fit.eb, coef=1, number=50000, **lfc=2**))
>probeset.list <- topTable(fit.eb, coef=1, number=10000, lfc=2)
#Annotating the results with associated gene symbols
>gene.symbols <- getSYMBOL(rownames(probeset.list), "hgu133plus2")
>results <- cbind(probeset.list, gene.symbols)
>head(results)
#To save results
> write.table(results, "results.txt", sep="\t", quote=FALSE)
Dear Kevin and Dear Devon, thank you very much for the information, this helped me a lot to understand the code.