I would like to run a PCA on SNPs identified within mtDNA for 7 fish (non-model). The SNPs were generated via the Illumina platform, resulting in separate .vcf files for each fish. I believe that I need to merge the .vcf files, but have run into a few problems, and have read that this can create bias. I am hoping someone can help me out by either recommending a way to run a PCA with separate files or a way to merge them.
Notes on .vcf files: Each .vcf file covers a single fish and identifies SNPs in the mtDNA after alignment to the mitochondrial genome of one of two different reference species. So, for 7 fish, I have 14 files in total, with the number of SNPs per file ranging from 0 to over 1,000. These files do not have chromosome numbers; the CHROM column lists the reference sequence code instead.
Can someone please recommend an approach? I have browsed the forums looking for an answer, but most of the threads I found deal with human nuclear data rather than mtDNA. Any help would be appreciated.
Thanks!
Hi both, I'm running into the same problem and I have a question: when converting the VCF to a 0/1 matrix, what do you do with heterozygous calls (0/1)? Do you code them as 0 or as 1? I guess they are probably heteroplasmies, right? Thanks in advance!
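One possible way to handle this, sketched below, is to avoid forcing 0/1 calls into either class and code them as 0.5 instead, i.e. as the fraction of non-reference alleles in the GT field. This is only a minimal sketch under several assumptions: the sample names, the reference name `refA`, and the helper names (`vcf_record`, `parse_vcf`, `build_matrix`) are made up for illustration; each VCF is assumed to hold exactly one sample with a GT field; and a site that is absent from a fish's file is assumed to match the reference, which cannot be told apart from a site with no coverage and is one likely source of the merging bias mentioned in the question. Positions from the two reference species are not comparable, so a separate matrix per reference would probably be needed.

```python
# Hypothetical single-sample VCF text, built inline so the sketch is self-contained.
HEADER = ("##fileformat=VCFv4.2\n"
          "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n")


def vcf_record(chrom, pos, ref, alt, gt):
    """Build one tab-separated VCF data line carrying only a GT field."""
    return "\t".join([chrom, str(pos), ".", ref, alt, "50", "PASS", ".", "GT", gt]) + "\n"


def parse_vcf(text):
    """Return {(chrom, pos, ref, alt): dosage} for one single-sample VCF."""
    calls = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        gt_index = fields[8].split(":").index("GT")
        alleles = fields[9].split(":")[gt_index].replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing genotype: leave the site out for this sample
        # Fraction of non-reference alleles: 0/0 -> 0.0, 0/1 -> 0.5, 1/1 or haploid 1 -> 1.0
        calls[(chrom, pos, ref, alt)] = sum(a != "0" for a in alleles) / len(alleles)
    return calls


def build_matrix(samples):
    """Combine per-sample calls into a samples x sites matrix.

    ASSUMPTION: a site missing from a sample's file is coded 0.0 (reference).
    """
    sites = sorted({site for calls in samples.values() for site in calls})
    names = sorted(samples)
    matrix = [[samples[name].get(site, 0.0) for site in sites] for name in names]
    return names, sites, matrix


# Two made-up fish aligned to the same made-up reference "refA".
fish1 = HEADER + vcf_record("refA", 100, "A", "G", "1/1") + vcf_record("refA", 250, "C", "T", "0/1")
fish2 = HEADER + vcf_record("refA", 100, "A", "G", "1") + vcf_record("refA", 300, "G", "A", "1/1")

names, sites, matrix = build_matrix({"fish1": parse_vcf(fish1), "fish2": parse_vcf(fish2)})
for name, row in zip(names, matrix):
    print(name, row)
```

The resulting rows (one per fish, one column per variant site) could then be passed to whatever PCA routine you prefer. Whether 0.5 is a sensible value for a suspected heteroplasmy depends on the data; if the VCFs carry allele depths, the observed alternate-allele fraction might be a better choice than a fixed 0.5.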