11 months ago
zau saa
Hi!
I want to compute the distance between cells based on their variants, but I have no idea how to do it.
For example, here are the mitochondrial variants from two cells:
mttype_1 A73G,A210G,230delA
mttype_2 A73G,220delT
What's the distance between these two mttypes?
Can you please explain the problem behind this task in more detail? How large is the real data (how many samples, how many variants), how were the variants detected, and why are they in this format rather than, for example, VCF? With only two samples there is no way to establish correlation (or linkage) between these variants, so which do you actually want: distance or correlation? If you are taking some sort of phylogenetic approach, a distance matrix is not doing much good either. So what is the problem you are trying to solve? (Obviously, the edit distance between the two sequences is 3 if there are no other differences: A73G is shared, while A210G, 230delA and 220delT each occur in only one of the two types.)
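To make the count of 3 concrete, here is a minimal Python sketch that treats each mttype as a set of variant labels and counts the variants present in only one of the two. The function names `parse_variants` and `variant_distance` are made up for illustration, and the sketch assumes every variant is an independent event listed as a comma-separated label, exactly as in the question.

```python
def parse_variants(text):
    """Split a comma-separated variant string into a set of variant labels."""
    return {v.strip() for v in text.split(",") if v.strip()}


def variant_distance(a, b):
    """Count variants found in exactly one of the two sets (symmetric difference)."""
    return len(a ^ b)


# The two example mttypes from the question.
mttype_1 = parse_variants("A73G,A210G,230delA")
mttype_2 = parse_variants("A73G,220delT")

print(mttype_1 & mttype_2)                   # shared variants
print(mttype_1 ^ mttype_2)                   # variants unique to one type
print(variant_distance(mttype_1, mttype_2))  # 3
```

This simple count treats substitutions and deletions as equal and ignores position, so it is only a starting point, not a substitute for a proper evolutionary model.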
Thanks for your reply! I previously identified about 30 mitochondrial DNA types from the scDNA-seq data of one sample, and I wonder whether these types are related (e.g. whether they have the same mutation direction).
Why not do a standard phylogenetic approach first? Add an outgroup and see how they form clades (or not).
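One way to prepare such data for a tree-building tool is to encode each mttype as a presence/absence string over all observed variants and to add an outgroup row. The sketch below is only an illustration under stated assumptions: the outgroup is taken to be a sequence carrying none of the variants (e.g. the reference the variants were called against), the helper name `build_character_matrix` is hypothetical, and the FASTA-like 0/1 output may need converting to whatever format your phylogenetics software expects.

```python
def build_character_matrix(mttypes):
    """Encode each mttype as a 0/1 string over the sorted list of all variants.

    An 'outgroup' row of all zeros is added, assuming the outgroup carries
    none of the observed variants (an assumption, not given in the thread).
    """
    sites = sorted({v for variants in mttypes.values() for v in variants})
    matrix = {"outgroup": "0" * len(sites)}
    for name, variants in mttypes.items():
        matrix[name] = "".join("1" if s in variants else "0" for s in sites)
    return sites, matrix


# The two example mttypes from the question; real data would have ~30.
mttypes = {
    "mttype_1": {"A73G", "A210G", "230delA"},
    "mttype_2": {"A73G", "220delT"},
}

sites, matrix = build_character_matrix(mttypes)
print("# columns:", ",".join(sites))
for name, row in matrix.items():
    print(f">{name}\n{row}")
```

Each column is one variant, so shared 1s between rows are the shared derived states a tree method would use to group types into clades.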
Much appreciated!