6.7 years ago
landscape95
Hi all, after downloading the METABRIC dataset, I applied background correction and between-array normalization with the limma package and produced the histogram and boxplot overviews below. At first glance, am I on the right track for quality control of the expression data? Is there any established criterion or method for quality control of microarray expression data?
Your help is really appreciated! Thank you very much!
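One simple, commonly used heuristic (not a formal standard) is to summarise each array's log2 intensity distribution by its median and interquartile range and flag arrays that sit far from the rest of the cohort. The sketch below assumes a log2-scale matrix with probes in rows and arrays in columns. The function name flag_outlier_arrays and the cut-off of 3 MADs are illustrative choices, not part of limma. The check is only informative before quantile normalisation: afterwards every array shares the same distribution, the MAD becomes zero and the flags turn into NA.

```r
# Hypothetical helper: flag arrays whose median or IQR is unusual.
# Assumes `expr` is a log2-scale matrix, probes in rows, arrays in columns.
flag_outlier_arrays <- function(expr, n_mads = 3) {
  med <- apply(expr, 2, median, na.rm = TRUE)  # per-array centre
  iqr <- apply(expr, 2, IQR, na.rm = TRUE)     # per-array spread
  # robust z-score: distance from the cohort median in units of MAD
  robust_z <- function(x) (x - median(x)) / mad(x)
  data.frame(array   = colnames(expr),
             median  = med,
             iqr     = iqr,
             flagged = abs(robust_z(med)) > n_mads | abs(robust_z(iqr)) > n_mads,
             row.names = NULL)
}

# Usage on simulated data: 20 arrays, the last one shifted upwards by 3 log2 units
set.seed(1)
expr <- matrix(rnorm(1000 * 20, mean = 8, sd = 2), nrow = 1000,
               dimnames = list(NULL, paste0("array", 1:20)))
expr[, 20] <- expr[, 20] + 3
qc <- flag_outlier_arrays(expr)
qc[qc$flagged, ]
```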
Here is my code; I plotted only the first 150 samples:
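library(limma)
# normexp background correction, then quantile normalisation (still on the unlogged scale)
MB_miRNA_processed <- backgroundCorrect(MB_miRNA_processed, method = "normexp", verbose = FALSE)
MB_miRNA_processed <- normalizeBetweenArrays(MB_miRNA_processed, method = "quantile")
# histogram of all intensities, then per-array boxplots of the first 150 arrays
hist(as.matrix(MB_miRNA_processed), main = "MB_miRNA_hist")
boxplot(MB_miRNA_processed[, 1:150], main = "MB_miRNA_boxplot_150samples")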
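Quantile normalisation (what method = "quantile" does) replaces each array's sorted values with the average sorted profile, so afterwards every array should have the same distribution and the boxplots should look identical. A minimal sketch for confirming this numerically rather than by eye is below. The simulated matrix stands in for MB_miRNA_processed, and same_distribution is a hypothetical helper, not a limma function.

```r
library(limma)  # Bioconductor package providing normalizeBetweenArrays()

# Hypothetical helper: TRUE if every column has the same sorted values,
# i.e. the same empirical distribution (up to numerical tolerance)
same_distribution <- function(mat, tol = 1e-8) {
  sorted <- apply(mat, 2, sort)   # sort each array independently
  ref <- sorted[, 1]              # first array as reference
  all(abs(sorted - ref) < tol)    # ref is recycled down each column
}

set.seed(2)
raw <- matrix(2^rnorm(500 * 6, mean = 8, sd = 2), nrow = 500)  # simulated raw intensities
expr_norm <- normalizeBetweenArrays(log2(raw), method = "quantile")

same_distribution(log2(raw))  # FALSE: arrays differ before normalisation
same_distribution(expr_norm)  # TRUE: identical distributions afterwards
```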
And this is the boxplot with outliers hidden (outline = FALSE):
boxplot(MB_miRNA_processed[, 1:150], main = "MB_miRNA_boxplot_150samples", outline = FALSE)
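Right-skewed, unlogged intensities naturally put many points beyond the boxplot whiskers, so a crowd of "outliers" on a raw-scale boxplot is not by itself a sign of bad arrays. The sketch below uses simulated log-normal values (an assumption standing in for the real intensities) and counts, with boxplot.stats(), how many points boxplot() would draw as outliers on each scale.

```r
# Simulated raw-scale intensities: log-normal, i.e. normal after log2
set.seed(3)
raw <- 2^rnorm(10000, mean = 8, sd = 2)

# boxplot.stats()$out holds the points drawn beyond the whiskers
n_out_raw  <- length(boxplot.stats(raw)$out)
n_out_log2 <- length(boxplot.stats(log2(raw))$out)
c(raw = n_out_raw, log2 = n_out_log2)  # far fewer outliers on the log2 scale

# outline = FALSE only hides these points; the data are unchanged
op <- par(mfrow = c(1, 2))
boxplot(raw, main = "raw scale", outline = FALSE)
boxplot(log2(raw), main = "log2 scale", outline = FALSE)
par(op)
```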
After log2 transformation:
MB_miRNA_processed <- backgroundCorrect(MB_miRNA_processed, method = "normexp", verbose = FALSE)
# log2-transform the background-corrected values before quantile normalisation
MB_miRNA_processed <- normalizeBetweenArrays(log2(MB_miRNA_processed), method = "quantile")
hist(as.matrix(MB_miRNA_processed), main = "MB_miRNA_hist")
boxplot(MB_miRNA_processed[, 1:150], main = "MB_miRNA_boxplot_150samples", outline = FALSE)
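Taking log2 after background correction is only safe when every corrected value is positive: log2 of zero gives -Inf and log2 of a negative number gives NaN. The normexp method is designed to return positive values. The sketch below runs the same two steps on simulated foreground intensities (an assumption: normal background plus exponential signal, with no separate background matrix) and checks positivity before transforming. The offset argument of backgroundCorrect() can additionally shift values away from zero.

```r
library(limma)  # provides backgroundCorrect() and normalizeBetweenArrays()

# Simulated foreground intensities: normal background plus exponential signal
set.seed(4)
n_probes <- 2000; n_arrays <- 4
E <- matrix(rnorm(n_probes * n_arrays, mean = 100, sd = 15) +
              rexp(n_probes * n_arrays, rate = 1 / 300),
            nrow = n_probes)

bc <- backgroundCorrect(E, method = "normexp", verbose = FALSE)

# log2() is only safe if no value is zero or negative
stopifnot(all(bc > 0, na.rm = TRUE))
expr_norm <- normalizeBetweenArrays(log2(bc), method = "quantile")
summary(as.vector(expr_norm))
```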
Hi landscape95,
As far as I know, METABRIC is a microarray dataset, not RNA-seq. The plots are not very clear, but what you describe seems OK to me. Have you log2-transformed your data?
Yes, it is a microarray expression dataset. I haven't log2-transformed my data. What's your opinion?
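If it is unclear whether a matrix is already on the log2 scale, a rough heuristic (similar in spirit to the check in scripts generated by NCBI's GEO2R) is to look at the upper quantiles: log2 microarray intensities rarely go much above 16, whereas raw intensities typically run into the thousands. The function name and the thresholds below are illustrative assumptions, not a definitive test.

```r
# Hypothetical helper: guess whether expression values still need log2 transformation
looks_unlogged <- function(x) {
  qx <- quantile(x, c(0, 0.25, 0.5, 0.75, 0.99, 1), na.rm = TRUE)
  # a large 99th percentile, or a very wide range with a positive
  # lower quartile, suggests raw (unlogged) intensities
  qx[5] > 100 || (qx[6] - qx[1] > 50 && qx[2] > 0)
}

set.seed(5)
raw <- matrix(2^rnorm(1000 * 3, mean = 8, sd = 2), nrow = 1000)
looks_unlogged(raw)        # TRUE: raw intensities
looks_unlogged(log2(raw))  # FALSE: already on the log2 scale
```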
I think Kevin is right: sharing the commands you used would be useful. I would plot the log2-normalised expression in the boxplot, and also check how it looks before and after normalisation.
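A side-by-side comparison of log2 intensities before and after quantile normalisation could look like the sketch below. The simulated matrix, with arrays deliberately shifted to different overall levels, is an assumption standing in for MB_miRNA_processed; with the real data you would subset columns, e.g. [, 1:150], as in the original code.

```r
library(limma)  # provides normalizeBetweenArrays()

# Simulated log2 intensities: 10 arrays with different overall levels
set.seed(6)
log_expr <- sapply(1:10, function(j) rnorm(1000, mean = 7 + j / 5, sd = 2))

expr_norm <- normalizeBetweenArrays(log_expr, method = "quantile")

# Before vs after, outliers hidden to keep the boxes readable
op <- par(mfrow = c(1, 2))
boxplot(log_expr, main = "log2, before normalisation", outline = FALSE)
boxplot(expr_norm, main = "log2, after quantile normalisation", outline = FALSE)
par(op)

# After quantile normalisation every array has the same median
apply(expr_norm, 2, median)
```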
METABRIC, as in the breast cancer cohort? Can you confirm the array type and the commands you used?
It does and does not look normalised. There are tonnes of outliers in your box-and-whisker plot on the right, but I don't know if that's just because you are using a large point size. You can avoid plotting outliers by using outline=FALSE in the boxplot() function. This would just help to improve visualisation for checking everything.
Hi @Kevin, thank you! I have updated the information above.
Going by your variable name, this is the METABRIC microRNA data, right, and not the full set of mRNA species? The profile still looks odd. I don't know what Matina thinks.
Can you confirm the exact source (website)?
Hi Matina, thank you, I have updated the information above.