I have two genomic samples from the same source, and I want to determine whether slight variations in the extraction treatment applied to them had a significant effect.
Assuming I can align these sequences to a reference, what is the accepted method for judging whether samples A and B are more or less similar than samples A and C, or B and C?
I could count the number of identically mapped positions, but I would like to temper that with some measure of sample size and diversity.
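One way to temper a raw count of shared positions is to report two scores per pair: the Jaccard index (shared covered positions divided by all covered positions, so it is not inflated by one sample simply having more reads) and a Bray-Curtis similarity on depth-normalised coverage (a standard abundance-aware measure, so positions with many reads count more). This is a minimal sketch, assuming you have already reduced each alignment to per-position read counts (e.g. from a pileup); the function names and toy counts are hypothetical, not an established tool.

```python
from itertools import combinations


def jaccard(covered_a, covered_b):
    """Presence/absence overlap: |A & B| / |A | B| (1.0 if both are empty)."""
    union = covered_a | covered_b
    return len(covered_a & covered_b) / len(union) if union else 1.0


def proportions(counts):
    """Divide each position's read count by the sample total so depth differences cancel out."""
    total = sum(counts.values())
    return {pos: c / total for pos, c in counts.items()} if total else {}


def bray_curtis_similarity(counts_a, counts_b):
    """1 - Bray-Curtis dissimilarity on depth-normalised coverage (1.0 = identical profiles)."""
    p, q = proportions(counts_a), proportions(counts_b)
    # For profiles that each sum to 1, 1 - BC equals the summed per-position minimum.
    return sum(min(p[pos], q[pos]) for pos in p.keys() & q.keys())


def pairwise_similarity(samples):
    """Score every pair of samples; `samples` maps a sample name to {position: read_count}."""
    results = {}
    for (name_a, counts_a), (name_b, counts_b) in combinations(samples.items(), 2):
        covered_a = {pos for pos, c in counts_a.items() if c > 0}
        covered_b = {pos for pos, c in counts_b.items() if c > 0}
        results[(name_a, name_b)] = {
            "jaccard": jaccard(covered_a, covered_b),
            "bray_curtis_similarity": bray_curtis_similarity(counts_a, counts_b),
        }
    return results


# Toy per-position read counts (hypothetical; in practice derive them from your alignments).
samples = {
    "A": {1: 10, 2: 10, 3: 20},
    "B": {1: 5, 2: 5, 3: 10},  # same profile as A at half the sequencing depth
    "C": {3: 4, 4: 4},
}
for pair, scores in pairwise_similarity(samples).items():
    print(pair, scores)
```

Here B is A sequenced at half the depth, so both scores for (A, B) come out as 1.0, whereas (A, C) gets a Jaccard of 0.25 and a Bray-Curtis similarity of 0.5.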
Are all locations of equal weight when you measure "significant"? Depending on the bias you expect from your extraction or your downstream analysis, you may want to weight your loci according to whether they fall in (for example) a known exon, a UTR, an intron, an evolutionarily conserved region, or a CpG island.
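Weighting loci by region can be folded into an overlap score by letting each position contribute its region's weight instead of 1. A minimal sketch, assuming each position has already been assigned a single region label (e.g. by intersecting with an annotation file); the weight values, labels, and `weighted_jaccard` helper are hypothetical and should be chosen to reflect the bias you actually care about.

```python
# Hypothetical weights per region class; pick values that reflect the bias you care about.
REGION_WEIGHTS = {"exon": 3.0, "cpg_island": 2.0, "conserved": 2.0, "utr": 1.5, "intron": 1.0}
DEFAULT_WEIGHT = 1.0  # used for positions with no annotation (e.g. intergenic)


def weighted_jaccard(covered_a, covered_b, annotation, weights=REGION_WEIGHTS):
    """Jaccard index in which each position counts with its region weight instead of 1.

    covered_a / covered_b: sets of positions with reads in each sample.
    annotation: maps a position to one region label (overlapping features
    would need a priority rule, which this sketch does not implement).
    """
    def weight(pos):
        return weights.get(annotation.get(pos), DEFAULT_WEIGHT)

    union = covered_a | covered_b
    total = sum(weight(pos) for pos in union)
    shared = sum(weight(pos) for pos in covered_a & covered_b)
    return shared / total if total else 1.0


annotation = {1: "exon", 2: "intron", 3: "exon", 4: "intron"}
covered_a, covered_b = {1, 2, 3}, {1, 3, 4}
print(weighted_jaccard(covered_a, covered_b, annotation))              # → 0.75
print(weighted_jaccard(covered_a, covered_b, annotation, weights={}))  # → 0.5 (unweighted)
```

In the toy data the two samples agree on both exonic positions but disagree on the intronic ones, so weighting exons more heavily raises the score from 0.5 to 0.75.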
From a machine learning perspective, you could look into sequence composition or other features derived from the nucleotide sequence to perform the comparison (A+T content, GC content, dinucleotide frequency, trinucleotide frequency, etc.).
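Composition features like these can be turned into one numeric vector per sample, which can then be compared with a simple distance or fed to a clustering or classification method. A minimal sketch, assuming each sample is available as a list of read strings; the helper names and the choice of Euclidean distance are illustrative, not a standard pipeline. The vector holds GC content followed by mono-, di-, and trinucleotide frequencies over overlapping k-mers, skipping any k-mer that contains a non-ACGT base such as N.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(reads, k):
    """Relative frequency of every overlapping ACGT k-mer; k-mers containing N etc. are skipped."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def composition_vector(reads, ks=(1, 2, 3)):
    """GC content followed by the k-mer frequencies for each k in `ks`, in a fixed order."""
    freqs_by_k = {k: kmer_frequencies(reads, k) for k in set(ks) | {1}}
    mono = freqs_by_k[1]
    features = [mono["G"] + mono["C"]]  # GC content; A+T content is 1 minus this
    for k in ks:
        features.extend(freqs_by_k[k][kmer] for kmer in sorted(freqs_by_k[k]))
    return features


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy reads (hypothetical).
reads_a = ["ACGTACGTTA", "GGATCCNNAT"]
reads_b = ["GCGCGGCCTA", "CCGGATGCGC"]
print(euclidean(composition_vector(reads_a), composition_vector(reads_b)))
```

With the default `ks` the vector has 1 + 4 + 16 + 64 = 85 entries. Frequencies rather than raw counts are used so that samples with different numbers of reads remain comparable.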
You are almost certainly going to want biological and/or technical replicates for each of the two extraction methods. Even with the same extraction method it's fairly typical to see some variation in the results, and you'll need to account for that.
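Once replicates exist, one way to account for that variation is to ask whether replicates from different methods are further apart than replicates from the same method, and to judge that against every possible relabelling of the replicates. The sketch below is a generic exact permutation test on "mean between-method distance minus mean within-method distance" (similar in spirit to PERMANOVA/ANOSIM, but not an implementation of either). It assumes at least two replicates per method and any per-replicate distance you like (for instance one minus an overlap score of mapped positions, or a distance between composition profiles); the function names and toy numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean


def between_minus_within(group_a, group_b, distance):
    """Mean distance across methods minus mean distance among replicates of the same method."""
    between = [distance(x, y) for x in group_a for y in group_b]
    within = [distance(x, y) for group in (group_a, group_b) for x, y in combinations(group, 2)]
    return mean(between) - mean(within)


def permutation_test(group_a, group_b, distance):
    """Exact permutation test over every way of relabelling the pooled replicates.

    Needs at least two replicates per method so the within-method term is defined.
    Returns (observed statistic, one-sided p-value).
    """
    observed = between_minus_within(group_a, group_b, distance)
    pooled = list(group_a) + list(group_b)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        perm_a = [pooled[i] for i in idx]
        perm_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so labellings that tie with the observed one are not lost to rounding.
        if between_minus_within(perm_a, perm_b, distance) >= observed - 1e-12:
            hits += 1
    return observed, hits / total


# Toy example: one summary value per replicate (hypothetical numbers).
method_1 = [1.0, 1.1, 0.9, 1.05]
method_2 = [2.0, 2.1, 1.9, 2.05]
stat, p_value = permutation_test(method_1, method_2, lambda x, y: abs(x - y))
print(stat, p_value)
```

When both methods have the same number of replicates, swapping the two groups gives the same statistic, so the smallest attainable p-value is 2 divided by the number of relabellings: 2/70 (about 0.03) with four replicates per method, but only 2/20 = 0.1 with three. That is one more reason to plan the replicates before running the comparison.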