Hello All,
I am stuck on how to start the following kind of analysis.
I have identified a list of marker genes from RNA-Seq that are very specific to a particular case. I would like to use these markers to classify new microarray or RNA-Seq expression profiles.
The idea is to estimate the performance of the markers (in combinations) on an unrelated experiment involving similar cases (say, a microarray study) and to classify its samples into the appropriate groups. (I should also mention that this is not pairwise: I have more than two cases.) Eventually I would like to identify a potential combination of markers for each case.
Can anyone suggest how to take this idea further? Are there any supervised classification algorithms (R packages) that perform well on both microarray and RNA-Seq data?
Any code snippets for classification and visualization methods, or references, would be really appreciated. I would also like to hear some comments on the approach. I assume it isn't novel (I have come across papers on cancer where markers were used for supervised classification of patients into cancer subtypes or treatment categories...).
Thanks!
Maybe start with PCA on your marker profiles to see how similar the samples are? Some transformation will probably be needed before combining microarray and RNA-Seq data.
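A minimal sketch of that idea in Python with NumPy, in case it helps make it concrete. Everything here is an assumption for illustration: the data are simulated, the function names `zscore_per_gene` and `pca_scores` are made up, and per-gene z-scoring within each platform is just one simple choice of transformation, not a recommended standard. It also assumes both matrices are already on a log scale, with samples in rows and the same marker genes in columns.

```python
import numpy as np

def zscore_per_gene(expr):
    """Standardize each gene (column) to mean 0 and SD 1 within one dataset."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0)
    sd = expr.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant genes stay at zero instead of dividing by 0
    return (expr - mean) / sd

def pca_scores(x, n_components=2):
    """Project samples (rows) onto the top principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return scores, explained[:n_components]

# Simulated stand-in data (hypothetical): 3 cases, 6 marker genes, 2 platforms.
rng = np.random.default_rng(0)
n_genes = 6
case_profiles = rng.normal(0, 2, size=(3, n_genes))
labels_rna = np.repeat([0, 1, 2], 5)    # 15 RNA-Seq samples
labels_array = np.repeat([0, 1, 2], 4)  # 12 microarray samples

# The two platforms share the case signal but differ in scale and offset.
rna = case_profiles[labels_rna] + rng.normal(0, 0.5, size=(15, n_genes))
array = 0.5 * case_profiles[labels_array] + 8 + rng.normal(0, 0.3, size=(12, n_genes))

# Transform each platform separately, then stack and run one PCA on all samples.
combined = np.vstack([zscore_per_gene(rna), zscore_per_gene(array)])
scores, explained = pca_scores(combined)
print(scores.shape, explained.round(2))
```

Plotting the two columns of `scores`, colored by case and shaped by platform, would show whether samples group by case or by platform. Note that standardizing each gene within each platform implicitly assumes the two datasets contain a comparable mix of cases; if they do not, a different normalization would be needed.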
Would I need an associated profile of each marker across all cases to do this? I think I am missing the implied message here. Would you mind explaining a little more, please? Thanks in advance!
I read the description of your problem again, and now I think Linear Discriminant Analysis (LDA) might be a good start. I have never used it in practice myself, but I know people use it to classify cancer types. Please see this summary: www.stat.cmu.edu/~jiashun/Research/software/.../papers/dudoit.pdf They considered several classic classifiers and did some real data analysis with LDA.
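To make the LDA suggestion concrete, here is a rough sketch in Python with NumPy of multi-class LDA with a pooled covariance matrix, trained on one dataset and scored on an independent one, followed by a scan over marker pairs. It is an illustration under stated assumptions, not the method from the linked paper: the class `SimpleLDA`, the helpers `accuracy` and `simulate`, the small ridge term, and the simulated data are all hypothetical.

```python
from itertools import combinations

import numpy as np

class SimpleLDA:
    """Multi-class LDA with one covariance matrix shared by all classes."""

    def fit(self, x, y, ridge=1e-3):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n, p = x.shape
        self.means_ = np.array([x[y == c].mean(axis=0) for c in self.classes_])
        self.log_priors_ = np.log([(y == c).mean() for c in self.classes_])
        # Pooled within-class covariance from residuals around class means.
        resid = x - self.means_[np.searchsorted(self.classes_, y)]
        cov = resid.T @ resid / (n - len(self.classes_))
        # Small ridge term (an assumption) keeps the matrix invertible.
        self.cov_inv_ = np.linalg.inv(cov + ridge * np.eye(p))
        return self

    def decision_scores(self, x):
        # Linear discriminant: x S^-1 mu_k - 0.5 mu_k S^-1 mu_k + log prior_k
        w = self.means_ @ self.cov_inv_
        b = -0.5 * np.sum(w * self.means_, axis=1) + self.log_priors_
        return np.asarray(x, dtype=float) @ w.T + b

    def predict(self, x):
        return self.classes_[np.argmax(self.decision_scores(x), axis=1)]

def accuracy(model, x, y):
    """Fraction of samples assigned to their true case."""
    return float(np.mean(model.predict(x) == y))

# Simulated data (hypothetical): 3 cases, 5 markers; markers 3 and 4 are noise.
rng = np.random.default_rng(1)
class_means = np.array([[2.0, 0, 0, 0, 0],
                        [0, 2.0, 0, 0, 0],
                        [0, 0, 2.0, 0, 0]])

def simulate(n_per_class):
    y = np.repeat(np.arange(3), n_per_class)
    x = class_means[y] + rng.normal(0, 0.5, size=(y.size, 5))
    return x, y

x_train, y_train = simulate(30)  # stands in for the discovery dataset
x_test, y_test = simulate(20)    # stands in for the unrelated experiment

full_acc = accuracy(SimpleLDA().fit(x_train, y_train), x_test, y_test)

# Score every pair of markers on the held-out data.
pair_acc = {}
for pair in combinations(range(5), 2):
    cols = list(pair)
    model = SimpleLDA().fit(x_train[:, cols], y_train)
    pair_acc[pair] = accuracy(model, x_test[:, cols], y_test)

best_pair = max(pair_acc, key=pair_acc.get)
print("all markers:", full_acc)
print("best pair:", best_pair, pair_acc[best_pair])
```

For real data it is probably better to use an established implementation, such as `lda` in the R package MASS or `LinearDiscriminantAnalysis` in scikit-learn, rather than this hand-rolled version. Also, choosing the best marker combination on the same test set that is used to report accuracy gives an optimistic estimate; cross-validation within the training data would be a safer way to pick the combination.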