We recently performed targeted single-cell DNA sequencing on a 50:50 mixture of two tumour samples from two different patients. I end up with a multi-sample VCF and I'm now trying to "demultiplex" these cells into two groups representing the two patients.
My first thought was to run something simple on the genotypes, such as a PCA or an identity-by-state (IBS) analysis (as implemented in SNPRelate). After running this I do end up with two approximately equally sized clusters of cells, which could represent the two patients. However, I'm wondering whether this is a viable approach, or whether I'm overlooking some systematic bias that could produce this clustering? Thanks!
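For readers who want to see what the IBS-plus-PCA idea amounts to, here is a minimal Python sketch. It is not SNPRelate's implementation. It assumes the VCF has already been converted to a hypothetical cells x variants matrix `G` of alternate-allele counts (0/1/2, `np.nan` for missing calls); the function names are made up for illustration.

```python
import numpy as np

def ibs_matrix(G):
    """Pairwise identity-by-state similarity between cells.

    G: (cells x variants) array of alt-allele counts 0/1/2, np.nan = missing.
    Only sites called in both cells contribute to a pair's score.
    """
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    ibs = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(G[i]) & ~np.isnan(G[j])
            if both.any():
                # 1.0 = identical genotypes, 0.0 = opposite homozygotes everywhere
                s = 1.0 - np.abs(G[i, both] - G[j, both]).mean() / 2.0
            else:
                s = np.nan  # no shared sites, similarity undefined
            ibs[i, j] = ibs[j, i] = s
    return ibs

def split_two_groups(ibs):
    """Split cells in two by the sign of the leading principal coordinate."""
    S = np.nan_to_num(ibs, nan=np.nanmean(ibs))  # crude fill for undefined pairs
    n = S.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # double-centring matrix
    vals, vecs = np.linalg.eigh(J @ S @ J)       # eigenvalues in ascending order
    pc1 = vecs[:, -1]
    return (pc1 > 0).astype(int), pc1

def call_rate(G):
    """Fraction of variants with a genotype call, per cell."""
    return 1.0 - np.isnan(np.asarray(G, dtype=float)).mean(axis=1)
```

One possible check for a technical rather than biological split is to compare the first component with the per-cell call rate, for example `np.corrcoef(pc1, call_rate(G))[0, 1]`. A strong correlation would suggest the clusters track missingness (for instance from allelic dropout or coverage differences) rather than patient identity.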
Given what we know about tumour evolution and clonality, this approach seems plausible for mutation data: many mutations will be clonal (present in all tumour cells of a patient), so they should separate the two patients. However, this only helps if the clonal mutations differ between the two patients. Sub-clonal mutations, which should be present in fewer cells, will have less influence on the clustering.
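To see which variants are actually driving a two-way split, one option is to compare how often each variant is carried in the two clusters. The sketch below is only an illustration under stated assumptions: `G` is a hypothetical cells x variants matrix of alternate-allele counts (0/1/2, `np.nan` for missing) and `labels` is a 0/1 cluster assignment per cell; neither name comes from the original post.

```python
import numpy as np

def carrier_frequency_by_group(G, labels):
    """Per-variant fraction of cells carrying the alt allele in each cluster.

    Returns (f0, f1, diff), where diff = |f0 - f1|. A patient-specific clonal
    variant should give diff near 1; a sub-clonal variant gives a smaller diff;
    a variant shared by both patients gives diff near 0.
    """
    G = np.asarray(G, dtype=float)
    labels = np.asarray(labels)
    # 1.0 if the cell carries the variant, 0.0 if not, NaN if uncalled
    carrier = np.where(np.isnan(G), np.nan, (G > 0).astype(float))
    f0 = np.nanmean(carrier[labels == 0], axis=0)
    f1 = np.nanmean(carrier[labels == 1], axis=0)
    return f0, f1, np.abs(f0 - f1)
```

If the clusters really are the two patients, a reasonable expectation is a set of variants with `diff` close to 1 in each direction. If no such variants exist, the split may instead reflect sub-clones or a technical effect.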