Hi all, I want to perform unsupervised clustering of segmented copy number variation (CNV) data, such as data derived from SNP arrays, and then visualize the result. The output should look like the following figure (Figure 1A), in which samples are clustered by their CNV profiles.
Clustering of copy number (Figure 1A)
I know how to draw a clustered heatmap in R from data in matrix form. However, segmented copy number data has a quite different structure. The only tool I know that can visualize this kind of data is IGV, but IGV does not provide any clustering options. Can anybody give me some instructions for doing this? Any help will be greatly appreciated.
Isn't that described in the methods section of the paper? (If you gave the link to the paper, we could read it.) The key is to get a vector representation of the samples that captures the relevant information. From the figure, each sample appears to be represented by a vector in which each element corresponds to a section of a chromosome and each value is the copy gain or loss of that section.
Thanks for your answer. This is the original paper: Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma. The authors did mention how they performed the analysis in the supplementary material (page 11). However, the description was very brief and did not clearly explain how to do the clustering with copy number data. Thanks again.
As I read it, they represented each tumor with a vector of regions identified by the GISTIC2.0 software as having copy number variations, where each value in the vector is the log2 of the copy number of the corresponding region. Then they did clustering with:
Could you please elaborate a little on "The key is to get a vector representation of the samples" and "they represented each tumor with a vector"? Thanks.
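To make the "vector representation" idea concrete, here is a minimal sketch of one way to turn segment records into a sample-by-region matrix. It is not the paper's method: the paper used regions reported by GISTIC2.0, whereas this sketch swaps in fixed-width genomic bins. The function name `seg_to_matrix`, the tuple layout `(sample, chrom, start, end, log2_ratio)`, the half-open 0-based coordinates, the bin size, and the choice to fill uncovered bins with 0 (treated as copy neutral) are all assumptions made for illustration.

```python
from collections import defaultdict


def seg_to_matrix(segments, bin_size):
    """Convert segment records into one fixed-length vector per sample.

    segments: list of (sample, chrom, start, end, log2_ratio) tuples,
              with half-open 0-based coordinates (an assumption).
    Returns (bins, matrix), where matrix maps sample -> list of values,
    one value per bin.
    """
    # Estimate each chromosome's extent from the segments themselves.
    chrom_end = defaultdict(int)
    for _, chrom, _, end, _ in segments:
        chrom_end[chrom] = max(chrom_end[chrom], end)

    # Build fixed-width bins shared by all samples.
    bins = []
    for chrom in sorted(chrom_end):
        for b_start in range(0, chrom_end[chrom], bin_size):
            bins.append((chrom, b_start, b_start + bin_size))
    bin_index = {b: i for i, b in enumerate(bins)}

    # Accumulate a length-weighted sum of log2 ratios per sample and bin.
    totals = defaultdict(lambda: [0.0] * len(bins))
    covered = defaultdict(lambda: [0] * len(bins))
    for sample, chrom, start, end, value in segments:
        first = (start // bin_size) * bin_size
        for b_start in range(first, end, bin_size):
            b_end = b_start + bin_size
            overlap = min(end, b_end) - max(start, b_start)
            if overlap <= 0:
                continue
            i = bin_index[(chrom, b_start, b_end)]
            totals[sample][i] += value * overlap
            covered[sample][i] += overlap

    # Weighted mean per bin; bins with no segment get 0.0
    # (assumed copy neutral, which is an illustrative choice).
    matrix = {}
    for sample in totals:
        matrix[sample] = [
            t / c if c else 0.0
            for t, c in zip(totals[sample], covered[sample])
        ]
    return bins, matrix


# Toy data, invented purely for illustration.
segments = [
    ("S1", "chr1", 0, 200, 0.5),
    ("S1", "chr1", 200, 300, -1.0),
    ("S2", "chr1", 0, 150, 0.0),
    ("S2", "chr1", 150, 300, 1.0),
]
bins, matrix = seg_to_matrix(segments, bin_size=100)
for sample, vector in sorted(matrix.items()):
    print(sample, vector)
```

Each sample ends up as a vector of the same length, so stacking the vectors as rows gives an ordinary numeric matrix. That matrix could then be passed to whatever clustered-heatmap routine you already use for matrix data. Note that a sample with no segment records at all would not appear in the output and would have to be added separately.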
I wish to perform a clustering analysis on the long-insert whole genome sequencing assay CNV data from the Multiple Myeloma database. Only the .seg file is made available as part of their download. I believe the GISTIC2.0 software also requires a markers file.
1) Is the GISTIC2.0 tool appropriate for CNV analysis of whole genome sequencing assay data? If not, what tools could I use? 2) How should I account for samples that have no copy gain or copy loss, or that are copy neutral?
Please post this as a new question. Then come back and delete this post.