This should be done on a 'case by case' basis, and summarising the expression should be justified.
limma provides easy functionality for this, as follows:
data_summarised <- limma::avereps(
data,
ID = gene)
Here, gene is a vector of genes that correspond to the rownames of data. This will summarise by mean, for each sample (column), across common values in the vector gene.
This function was initially developed to summarise across replicate probes.
Reproducible example:
a <- matrix(rexp(200, rate=.1), ncol=20)
rownames(a) <- c(rep("a", 5), rep("g", 5))
limma::avereps(a, ID = rownames(a))
      [,1]     [,2]     [,3]     [,4]     [,5]      [,6]     [,7]     [,8]
a 4.086404 2.436660 8.220130 10.36580 11.46436 17.689969 14.42429 13.53203
g 8.271113 7.593843 9.395702 14.56003 13.07174  9.928446 18.92534 14.84183
      [,9]     [,10]    [,11]    [,12]     [,13]     [,14]     [,15]     [,16]
a 10.93435 11.245397 14.44341 11.09513 15.632908  6.982594  5.212455 11.748156
g 19.31578  5.836391 11.37889 10.89469  4.175368 14.668283 10.516478  7.597563
      [,17]    [,18]     [,19]     [,20]
a  5.840073 2.898039 11.821731 12.896772
g 15.031233 6.441335  3.950631  3.877679
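To make clear what the averaging is doing, here is a minimal base R sketch of the same per-sample mean across rows that share an ID. It is not how limma implements avereps; the object names (gene, sums, manual_means) and the seed are assumptions for illustration. For data with no missing values, I would expect it to give the same numbers as limma::avereps(a, ID = gene).

```r
# Toy data in the same shape as the example above: 10 probes x 20 samples
set.seed(42)  # arbitrary seed, only so the toy data are repeatable
a <- matrix(rexp(200, rate = 0.1), ncol = 20)
gene <- c(rep("a", 5), rep("g", 5))

# Sum the rows within each gene, keeping genes in order of first appearance
sums <- rowsum(a, group = gene, reorder = FALSE)

# Number of probes (rows) per gene, in the same order as the rows of 'sums'
n <- as.vector(table(gene)[rownames(sums)])

# Divide each gene's row of sums by its probe count to get per-sample means
manual_means <- sums / n
```

Each row of manual_means is one gene and each column is one sample, exactly as in the avereps output.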
-------------------------------------------------
There is also another function, aggregate(), from base R, which can summarise using any function that you supply (for example, median instead of mean).
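As a rough sketch of the aggregate() route, assuming you want the median per gene rather than the mean: the object names (expr, gene, agg, expr_median), the sample names, and the seed are all made up for illustration.

```r
# Toy data: 10 probes x 4 samples (hypothetical sample names)
set.seed(1)  # arbitrary seed, only so the toy data are repeatable
expr <- matrix(rexp(40, rate = 0.1), ncol = 4,
               dimnames = list(NULL, paste0("sample", 1:4)))
gene <- c(rep("a", 5), rep("g", 5))

# Summarise each sample (column) by the median within each gene;
# aggregate() returns a data frame whose first column is the grouping variable
agg <- aggregate(expr, by = list(gene = gene), FUN = median)

# Convert back to a numeric matrix with genes as rownames
expr_median <- as.matrix(agg[, -1])
rownames(expr_median) <- agg$gene
```

Swapping FUN for mean, max, or your own function changes the summary. Whichever one you choose should still be justified for your data.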
Kevin
Thanks for the answer. Could you refer me to a research paper that uses the same method, i.e., combining probe expression into gene expression using the mean or aggregate()? I want to use this in my academic work, so I need a paper to cite. Also, is this method reliable?
Your reference is Gordon Smyth of limma. That supersedes everything else in bioinformatics :)
No, seriously, it is a standard procedure in microarray and gene expression analysis. Look in the limma manual, at least for justification of the procedure in microarrays. If you look at published works, you may or may not see it mentioned in the methods, depending on whether the analyst writing the methods chose to mention it.
Of course, there are other ways of summarising data when transcript isoforms come into question. For that, I will refer you to the tximport, DESeq2, and edgeR manuals.