Hello,
We have RNA-seq data from a sample collected a few years ago that looked odd (possibly mislabeled), so we re-sequenced several candidates that could have been that sample.
Is there any way to find out which of the re-sequenced samples the original sample is closest to? I could run PCA, but the original and the re-sequenced candidates would likely show a batch effect, which might interfere with how the samples separate on a PCA plot. Is there any way I could account for that?
I'd appreciate any pointers. Thank you!
Maybe try a Pearson correlation analysis between your old, possibly mislabeled sample and the new ones?
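A minimal sketch of that correlation idea, assuming you have a gene-level count vector for the old sample and one per candidate, all in the same gene order. The CPM normalisation, log2(x + 1) transform and the `min_mean` low-expression filter are illustrative choices, not a prescribed method, and `rank_candidates` is a hypothetical helper name. Because every candidate comes from the same new batch, the old-vs-new batch difference should affect each comparison roughly equally, so the ranking is more informative than the absolute correlation values.

```python
import numpy as np


def rank_candidates(old_counts, candidate_counts, min_mean=10.0):
    """Rank candidates by Pearson correlation with the old sample.

    old_counts: 1-D array of gene-level counts for the old sample.
    candidate_counts: dict of candidate name -> 1-D count array
        (same genes, same order as old_counts).
    min_mean: hypothetical filter; genes whose mean raw count across
        all samples is below this value are dropped as noisy.
    """
    names = list(candidate_counts)
    mat = np.vstack([old_counts] + [candidate_counts[n] for n in names]).astype(float)

    # Counts per million, so sequencing depth differences do not dominate.
    cpm = mat / mat.sum(axis=1, keepdims=True) * 1e6

    # Keep genes with enough expression, then log-transform.
    keep = mat.mean(axis=0) >= min_mean
    logged = np.log2(cpm[:, keep] + 1)

    # Row 0 is the old sample; rows 1.. are the candidates.
    scores = {
        name: float(np.corrcoef(logged[0], logged[i])[0, 1])
        for i, name in enumerate(names, start=1)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Usage: `rank_candidates(old, {"cand1": c1, "cand2": c2})` returns `(name, r)` pairs, best match first. Spearman correlation is a common alternative if a few very highly expressed genes dominate Pearson.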
Thank you, Benn! I'll run a Pearson correlation first thing.
You can call SNPs on all samples and compare them. Ideally you would have an independent set (e.g. genotyping plates) run purely to check sample identity via SNPs, but calls from the RNA-seq data would be a good substitute.
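One simple way to compare the calls is genotype concordance: among sites called in both samples, the fraction with the same genotype. A true match should be near the top, while unrelated individuals agree much less often. The sketch below assumes genotypes have already been extracted (e.g. from a VCF) into dicts keyed by a hypothetical `(chrom, pos)` tuple with alt-allele counts 0/1/2; the function names are illustrative. With RNA-seq, only expressed genes are covered and allele-specific expression can make heterozygous sites look homozygous, so restricting to well-covered sites before comparing is advisable.

```python
def genotype_concordance(gt_a, gt_b, min_sites=1):
    """Return (fraction of shared sites with identical genotypes, n shared).

    gt_a, gt_b: dicts mapping a site key such as ("chr1", 12345) to an
        alt-allele count (0 = hom ref, 1 = het, 2 = hom alt).
    Returns (None, n) when fewer than min_sites sites are shared.
    """
    shared = gt_a.keys() & gt_b.keys()
    if len(shared) < min_sites:
        return None, len(shared)
    agree = sum(1 for site in shared if gt_a[site] == gt_b[site])
    return agree / len(shared), len(shared)


def best_match(old_gt, candidates, min_sites=1):
    """Rank candidates by concordance with the old sample, best first."""
    results = {
        name: genotype_concordance(old_gt, gt, min_sites)
        for name, gt in candidates.items()
    }
    # Candidates with too few shared sites (None) sort to the end.
    return sorted(
        results.items(),
        key=lambda kv: -1.0 if kv[1][0] is None else kv[1][0],
        reverse=True,
    )
```

Reporting the number of shared sites alongside the concordance helps avoid trusting a high score computed from only a handful of positions.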
How would I compare SNPs? Compare pairwise conservation stats?