Dear Biostars,
After searching the internet for quite a while, I have yet to find a simple way to cluster human genomic coordinates. This post asked the same question a couple of years ago, but no answer explained how to cluster a BED file and then display the result in IGV (or your favorite genome browser) so that it looks like this figure.
Here's the breakdown of the problem at hand:
Data type: Human CNV data detected by both array and sequencing. The output of these analyses is a .bed file with the CNV positions, similar to this:
chr start end cnv_id sample_name sample_category
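For anyone answering, here is a minimal sketch of how records with the six columns above could be read in Python. It assumes the file is tab-separated, that the header row is literally `chr start end ...`, and that coordinates follow the usual BED convention (0-based, half-open). The `CNV` class and `read_cnv_bed` function are hypothetical names, not part of any existing tool.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    chrom: str
    start: int  # assumed 0-based, half-open (BED convention)
    end: int
    cnv_id: str
    sample_name: str
    sample_category: str


def read_cnv_bed(handle):
    """Parse tab-separated CNV records with the six columns shown above.

    Skips blank lines, '#' comments, 'track'/'browser' lines and the
    'chr start end ...' header row (hypothetical helper).
    """
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        if row[:2] == ["chr", "start"]:  # header line as shown in the post
            continue
        chrom, start, end, cnv_id, sample, category = row[:6]
        records.append(CNV(chrom, int(start), int(end), cnv_id, sample, category))
    return records


if __name__ == "__main__":
    # Made-up example input, for illustration only.
    example = io.StringIO(
        "chr\tstart\tend\tcnv_id\tsample_name\tsample_category\n"
        "chr1\t100\t200\tcnv1\tS1\tsick\n"
    )
    for rec in read_cnv_bed(example):
        print(rec)
```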
Clustering type: anything goes, from k-means to any other unsupervised method.
Question: Are there samples that preferentially cluster together because they share very similar CNV positions? Is this clustering of CNVs meaningful given the sample category (i.e., sick vs. normal)?
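One possible way to frame the question, sketched here under my own assumptions rather than as an established method: treat each sample as the set of base pairs covered by its CNVs, measure similarity between two samples as the Jaccard index (overlapping bp divided by the union of covered bp), and run average-linkage hierarchical clustering on the distance 1 − Jaccard. This ignores CNV type (gain vs. loss) and copy number, and all function names are hypothetical. The merge list it returns could then be compared against `sample_category`.

```python
from collections import defaultdict
from itertools import combinations


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def intersect_bp(a, b):
    """Overlapping bp between two sorted, merged interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def sample_coverage(records):
    """records: iterable of (chrom, start, end, sample) -> {sample: {chrom: merged intervals}}."""
    by_sample = defaultdict(lambda: defaultdict(list))
    for chrom, start, end, sample in records:
        by_sample[sample][chrom].append((start, end))
    return {s: {c: merge_intervals(iv) for c, iv in chroms.items()}
            for s, chroms in by_sample.items()}


def jaccard(cov_a, cov_b):
    """Jaccard index of the bp covered by two samples' CNVs."""
    inter = sum(intersect_bp(cov_a[c], cov_b[c]) for c in cov_a.keys() & cov_b.keys())
    size_a = sum(e - s for iv in cov_a.values() for s, e in iv)
    size_b = sum(e - s for iv in cov_b.values() for s, e in iv)
    union = size_a + size_b - inter
    return inter / union if union else 0.0


def distance_matrix(records):
    """Return sorted sample names and a dict-of-dicts of 1 - Jaccard distances."""
    cov = sample_coverage(records)
    samples = sorted(cov)
    dist = {a: {b: 1.0 - jaccard(cov[a], cov[b]) for b in samples} for a in samples}
    return samples, dist


def average_linkage(samples, dist):
    """Naive agglomerative clustering; returns (cluster_a, cluster_b, distance) per merge."""
    clusters = [(s,) for s in samples]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [dist[x][y] for x in clusters[a] for y in clusters[b]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        new = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [new]
    return merges
```

The naive linkage loop is fine for tens of samples; for larger cohorts the same distance matrix could be passed to an existing hierarchical clustering implementation instead.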
I have read about CNVTools, which, as I understand it, needs probe intensities; I could never get iCluster to work; IGVTools has no clustering function; I'm not sure whether seqMINER or any other TSS/ChIP clustering algorithm would work with longer stretches of DNA sequence; and everything I have read about clustering methods in R deals with single genes or values rather than genomic coordinates.
That is why I am appealing to the wisdom of Biostars once more. I'd be grateful if someone could recommend a solution to this problem.
Thanks!
Sakti
What data are you trying to cluster? What is the assay and what is the question you want to answer? Are you dealing with copy number data, or something else? Sequence-based, or array?
Hi Sean, thanks for commenting. I have updated the post with the answers to your questions.