I posted "Tutorial: Importance of Array Quality Control - arrayQualityMetrics (PART I)". I have been analyzing the Rembrandt data (brain tumor) to find new insights into glioma, and this post is part of that analysis. In conclusion, clustering analysis of the Rembrandt data showed a gender-specific gene expression pattern based on 43 genes selected through non-specific gene filtering.
# Log in to the Linux server
# Go to the folder where the Rembrandt glioma data are saved
$ cd Rembrandt_Glioma # 580 microarrays: astrocytoma, oligodendroglioma, normal, GBM, and unknown
$ R # start R
# Import the Rembrandt data into R
library(affy)
mydata <- ReadAffy()
# Multiple-array normalization
mydata_rma <- rma(mydata)
# Array Quality Control through arrayQualityMetrics
library(arrayQualityMetrics)
# QC report for the raw data (mydata)
arrayQualityMetrics(expressionset = mydata, outdir = "Report_for_Rembrandt_RMA", force = TRUE, do.logtransform = TRUE)
# QC report for the RMA-normalized data (mydata_rma)
arrayQualityMetrics(expressionset = mydata_rma, outdir = "Report_for_nRembrandt_RMA", force = TRUE)
# Save the normalized expression matrix (probes in rows, samples in columns) as a tab-delimited file
write.table(exprs(mydata_rma), file = "Rembrandt_RMA_QC.txt", sep = "\t", quote = FALSE, row.names = TRUE, col.names = TRUE)
Based on the arrayQualityMetrics reports, I removed 31 outlier samples out of 580 by editing the exported file in Excel. After editing, I saved the file as a tab-delimited file.
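The same sample removal can be done in R instead of Excel. Below is a minimal sketch using only base R on a made-up toy matrix; the object names (`expr`, `outliers`, `expr_qc`) and the sample names are hypothetical and are not part of the original analysis. With the real data, the same column indexing should apply to `exprs(mydata_rma)`, with the outlier names taken from the arrayQualityMetrics report.

```r
# Toy expression matrix: 5 probes (rows) x 6 samples (columns); values are made up
set.seed(1)
expr <- matrix(rnorm(30, mean = 8), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("sample", 1:6)))

# Hypothetical outlier list, e.g. copied by hand from the QC report
outliers <- c("sample2", "sample5")

# Keep only the columns whose names are not in the outlier list
expr_qc <- expr[, !(colnames(expr) %in% outliers), drop = FALSE]

dim(expr_qc)       # number of probes and remaining samples
colnames(expr_qc)  # names of the samples that were kept
```

Keeping the outlier names in a script (or a target file) makes the removal reproducible, which is harder to guarantee with hand edits.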
# Next, I filtered the genes with the genefilter package and saved the result on my local computer.
mydata <- read.table(file = "Rembrandt_RMA_QC.txt", sep = "\t", row.names = 1, header = TRUE)
# Gene filtering by standard deviation
library(genefilter)
rsd <- rowSds(as.matrix(mydata)) # standard deviation of each row (feature)
i <- rsd >= 2 # keep features with a standard deviation of at least 2
mydata_filtered <- mydata[i, ] # 43 genes were selected
write.table(mydata_filtered,file="Rembrandt_RMA_QC_filtered.txt",sep="\t", quote=FALSE, row.names=TRUE, col.names=TRUE)
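To illustrate what the standard-deviation filter does, here is a small base-R sketch on made-up data. It uses `apply(..., 1, sd)` instead of `rowSds`, so it runs without genefilter; the toy matrix and the names `expr`, `row_sd`, and `expr_filtered` are assumptions for illustration only.

```r
# Toy matrix: 4 probes (rows) x 10 samples (columns) with low variability
set.seed(2)
expr <- matrix(rnorm(40, mean = 8, sd = 0.3), nrow = 4,
               dimnames = list(paste0("probe", 1:4), paste0("sample", 1:10)))

# Make one probe strongly variable: low in the first 5 samples, high in the last 5
expr["probe3", ] <- rep(c(4, 12), each = 5)

# Per-probe standard deviation across samples
row_sd <- apply(expr, 1, sd)

# Non-specific filter: keep probes with a standard deviation of at least 2
keep <- row_sd >= 2
expr_filtered <- expr[keep, , drop = FALSE]

rownames(expr_filtered)  # only the highly variable probe passes the filter
```

The filter is called non-specific because it ignores the sample labels: a probe is kept only because it varies a lot, for whatever reason.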
# Next, I assessed the clustering tendency of the filtered dataset. (A clustering tendency assessment determines whether a given dataset contains meaningful clusters (1).)
install.packages("clustertend")
library(clustertend)
set.seed(12345)
hopkins(mydata_filtered, n = nrow(mydata_filtered) - 1, byrow = TRUE, header = TRUE) # mydata_filtered: the genes are the objects and the samples are the variables
$H value: 0.2712307 (If the value of the Hopkins statistic is close to zero, we can reject the null hypothesis and conclude that dataset D is significantly clusterable (1).)
mydata_filtered_1 <- t(mydata_filtered)
hopkins(mydata_filtered_1, n = nrow(mydata_filtered_1) - 1, byrow = TRUE, header = TRUE) # mydata_filtered_1: the samples are the objects and the genes are the variables
$H value: 0.288575 (Again, a value close to zero indicates that dataset D is significantly clusterable (1).)
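To make the Hopkins statistic less of a black box, here is a minimal base-R sketch of the idea. `hopkins_sketch` is a hypothetical helper written for illustration, not the clustertend implementation, and the toy data are made up. It follows the convention quoted above: nearest-neighbour distances of real points are compared with those of uniformly random points, so a value near 0 suggests clusters and a value near 0.5 suggests uniformly scattered data.

```r
hopkins_sketch <- function(X, n) {
  X <- as.matrix(X)
  stopifnot(n < nrow(X))
  # Distance from one point to the nearest row of a matrix
  nn_dist <- function(pt, M) min(sqrt(rowSums(sweep(M, 2, pt)^2)))
  # w: for n sampled real points, distance to the nearest OTHER real point
  idx <- sample(nrow(X), n)
  w <- sapply(idx, function(i) nn_dist(X[i, ], X[-i, , drop = FALSE]))
  # u: for n uniform random points in the data's bounding box,
  #    distance to the nearest real point
  lo <- apply(X, 2, min)
  hi <- apply(X, 2, max)
  U <- matrix(sapply(seq_len(ncol(X)), function(j) runif(n, lo[j], hi[j])),
              nrow = n)
  u <- apply(U, 1, nn_dist, M = X)
  # Small w relative to u means the real points sit in tight groups
  sum(w) / (sum(u) + sum(w))
}

set.seed(1)
# Two tight, well-separated groups of 50 points each (objects in rows)
clustered <- rbind(matrix(rnorm(100, mean = 0, sd = 0.1), ncol = 2),
                   matrix(rnorm(100, mean = 5, sd = 0.1), ncol = 2))
# 100 points scattered uniformly in the unit square
uniform <- matrix(runif(200), ncol = 2)

h_clustered <- hopkins_sketch(clustered, 20)
h_uniform <- hopkins_sketch(uniform, 20)
h_clustered  # expected to be much closer to 0 than h_uniform
h_uniform
```

Note that some references define the statistic the other way round (values near 1 indicate clusters), so it is worth checking which convention a given package uses before interpreting the value.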
Reference
(1) Assessing Clustering Tendency: A vital issue - Unsupervised Machine Learning (http://www.sthda.com)
Where's the bit about allosomes? And why hand-filter in Excel? That is quite the opposite of good practice.
Your comment is a good one. Samples can be removed in R using a target file, and that method is easy and reliable. For small sample sets I use Excel, but I will post the R procedure for sample removal in place of the Excel one.
I've put the code in code blocks, though this could use a bit more tidying up.
Shouldn't this be changed to the one I put, since it is the array QC of the normalized data? In the code
Corrected to
Thank you^_^!! Your comment is correct.