I read the paper from the Quake lab on using single-cell RNA-seq to find new cell lineage markers in lung development. Their method uses PCA (principal component analysis) to select genes for unsupervised hierarchical clustering (HC). They describe it as follows: "Genes with highest loadings in the first four components were analysed by unsupervised hierarchical clustering as well as PCA". I think a loading is essentially the same concept as an entry of an eigenvector. So, to do the analysis, they would have generated an m x 4 matrix (m = number of genes; the loading matrix?). My question is: how do we choose the genes with the highest loadings?
- Select the genes with the largest sum of weights (that is, sum each row to get an m x 1 vector, then rank the genes by it); or
- Select the genes that have one of the largest weights in any of the four columns.
Is the answer the first option or the second? Or have I misunderstood the concept of PCA?
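To make the two options concrete, here is a minimal sketch in plain Python on a made-up loading matrix. The gene names, the values, and the helper names `top_by_row_sum` and `top_per_component` are all hypothetical, and this is not the paper's code. One assumption to flag: both helpers rank by absolute loading, because the sign of a principal component is arbitrary and raw signed weights can cancel when summed across components.

```python
# Toy loading matrix: one row per gene, one column per principal component.
# Gene names and values are made up for illustration only.
loadings = {
    "geneA": [0.60, -0.05, 0.02, 0.01],
    "geneB": [0.20, 0.20, 0.20, 0.20],
    "geneC": [-0.03, 0.02, -0.70, 0.04],
    "geneD": [0.05, -0.55, 0.03, 0.02],
    "geneE": [0.25, -0.25, 0.01, -0.01],
}


def top_by_row_sum(loadings, n):
    """Option 1: rank genes by the sum of absolute loadings across all PCs."""
    # Assumption: absolute values, so opposite signs do not cancel out.
    score = {gene: sum(abs(w) for w in row) for gene, row in loadings.items()}
    return sorted(score, key=score.get, reverse=True)[:n]


def top_per_component(loadings, n_per_pc):
    """Option 2: union of the top genes (by absolute loading) in each PC."""
    n_pcs = len(next(iter(loadings.values())))
    selected = []
    for pc in range(n_pcs):
        ranked = sorted(loadings, key=lambda g: abs(loadings[g][pc]), reverse=True)
        for gene in ranked[:n_per_pc]:
            if gene not in selected:  # keep each gene once, in order found
                selected.append(gene)
    return selected


by_sum = top_by_row_sum(loadings, 3)
per_pc = top_per_component(loadings, 1)
print(by_sum)
print(per_pc)
```

On this toy matrix the two rules disagree: the row-sum rule favours geneB, which has a moderate weight in every component, while the per-component rule picks up geneD, which dominates a single component. With signed sums, geneE's weights would cancel to zero, which is why the sketch uses absolute values.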
There is a similar post here, but I think it describes an n x 1 loading matrix.
BTW, is there another way to infer new cell lineages or classify groups of cells? Is there an evaluation report comparing those methods? TIA
I have the same questions and was wondering whether you have made any progress on this.
No. Someone suggested that the first way is OK (add the weights together and then rank the genes by the sum), but I could not find this explanation in a textbook. I wrote to the authors to ask for the source code, but got no response. Anyway, if you find the answer, do let me know.