Question

Unsupervised Analysis Methods For DNA Sequencing Data

2

11.6 years ago

Rainer ▴ 130

Dear all,

We have received several exome sequences from our collaborators for patients with a polygenic disease. Control samples for a supervised analysis are supposed to follow later, but the collaborators have asked whether we can already provide some unsupervised analyses of the patient samples in the meantime. The only idea I have so far is to cluster the samples using pre-filtered variants at different quality cut-offs, to see whether a robust hierarchical or partition-based clustering can be obtained, or to look for cluster patterns at the pathway level. However, I expect these clusterings to be less informative than standard clustering of microarray transcriptomics data, and I wonder whether there are other unsupervised analyses of sequencing data (beyond simple quality checks) that could give more insightful results than a sample clustering alone.

Any suggestions would be greatly appreciated.
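To make the clustering idea above concrete, here is a minimal sketch of average-linkage hierarchical clustering on per-sample genotypes. Everything in it is an assumption for illustration: the sample names, the toy genotype vectors (alt-allele counts 0/1/2, with `None` for a missing call), the distance measure, and the helper names `genotype_distance` and `average_linkage`. For real exome data a dedicated library would be the more sensible choice.

```python
from itertools import combinations

# Hypothetical toy data: alt-allele counts (0/1/2) per variant site, None = no call.
genotypes = {
    "P1": [0, 1, 2, 0, 1, 0],
    "P2": [0, 1, 2, 0, 1, 1],
    "P3": [2, 0, 0, 1, 0, 2],
    "P4": [2, 0, 0, 1, None, 2],
}


def genotype_distance(a, b):
    """Mean absolute difference in alt-allele counts over sites called in both samples."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("samples share no called sites")
    return sum(abs(x - y) for x, y in shared) / len(shared)


def average_linkage(samples):
    """Agglomerative clustering; returns the merge history as (cluster_a, cluster_b, distance)."""
    names = list(samples)
    dist = {
        frozenset((p, q)): genotype_distance(samples[p], samples[q])
        for p, q in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


merges = average_linkage(genotypes)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 3))
```

Repeating this for variant sets filtered at different quality cut-offs and comparing the merge histories is one rough way to judge whether a grouping is robust.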

clustering sequencing dna analysis statistics • 2.5k views

ADD COMMENT • link updated 11.6 years ago by Michael 55k • written 11.6 years ago by Rainer ▴ 130

0

Perhaps they are asking for summary statistics on important features of the covariates and the sequencing results. Clustering is of course one option. What about metrics such as coverage and other NGS statistics? Sorry, the request is too broad to give a precise answer, but I hope this helps.
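As a rough illustration of the coverage metrics mentioned here, the following sketch summarises per-target read depth for each sample. The depth values, the sample names, the 20x threshold, and the helper name `coverage_summary` are all assumptions made up for the example; real depths would come from whatever coverage tool is already in the pipeline.

```python
from statistics import mean, median

# Hypothetical per-target read depths for two samples.
depths_by_sample = {
    "P1": [35, 42, 18, 60, 25],
    "P2": [12, 15, 30, 8, 22],
}


def coverage_summary(depths, min_depth=20):
    """Summarise one sample: mean and median depth, fraction of targets at or above min_depth."""
    return {
        "mean_depth": mean(depths),
        "median_depth": median(depths),
        "frac_at_min_depth": sum(d >= min_depth for d in depths) / len(depths),
    }


summaries = {name: coverage_summary(d) for name, d in depths_by_sample.items()}
for name, stats in summaries.items():
    print(name, stats)
```

A table like this per sample makes it easier to spot samples whose variant calls may be unreliable before they enter any clustering.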

ADD REPLY • link 11.6 years ago by Raygozak ★ 1.4k

score 0 · Answer 1 · 2013-06-11

0

11.6 years ago

Michael 55k

I have seen PCA applied to SNP and GWAS data, for example in the paper by Paschou et al.; there is also the R package SNPRelate. I am not entirely sure this is what you need, but it may at least provide a starting point for further searches.
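To show what PCA on a genotype matrix amounts to, here is a minimal sketch that extracts the first principal component by power iteration on the sample-by-sample Gram matrix. It is not the method of the cited paper or of any package: the toy genotypes, the sample names, and the helper names `center_columns` and `top_principal_component` are assumptions, and the sketch assumes there are no missing calls.

```python
import math

# Hypothetical toy data: rows are samples, columns are alt-allele counts (0/1/2) per site.
sample_names = ["P1", "P2", "P3", "P4"]
genotype_matrix = [
    [0, 1, 2, 0, 1, 0],
    [0, 1, 2, 0, 1, 1],
    [2, 0, 0, 1, 0, 2],
    [2, 0, 0, 1, 0, 2],
]


def center_columns(matrix):
    """Subtract each site's mean genotype so every column has mean zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def top_principal_component(matrix, iterations=200):
    """Return (per-sample PC1 scores, fraction of variance explained by PC1)."""
    centered = center_columns(matrix)
    n = len(centered)
    # Gram matrix X X^T: its eigenvectors give the sample coordinates.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered] for r1 in centered]
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            raise ValueError("no variation left after centering")
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the top eigenvalue for the unit-length vector.
    eigenvalue = sum(v * sum(g * w for g, w in zip(row, vec)) for v, row in zip(vec, gram))
    total_variance = sum(gram[i][i] for i in range(n))
    scores = [v * math.sqrt(eigenvalue) for v in vec]
    return scores, eigenvalue / total_variance


scores, explained = top_principal_component(genotype_matrix)
for name, score in zip(sample_names, scores):
    print(name, round(score, 3))
print("fraction of variance on PC1:", round(explained, 3))
```

The sign of the scores is arbitrary; what matters is which samples fall on the same side of the axis and how far apart they are.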
