Merging .vcf files for mitochondrial SNPs
7.2 years ago
jmw ▴ 20

I would like to run a PCA on SNPs identified within mtDNA for 7 fish (non-model). The SNPs were generated via the Illumina platform, resulting in separate .vcf files for each fish. I believe that I need to merge the .vcf files, but have run into a few problems, and have read that this can create bias. I am hoping someone can help me out by either recommending a way to run a PCA with separate files or a way to merge them.

Notes on the .vcf files: I have a separate file for each individual fish, identifying SNPs in the mtDNA when that fish was aligned to the mitochondrial genomes of two different species. So, for 7 fish, I have 14 files in total, with the number of SNPs per file ranging from 0 to over 1k. These files do not have chromosome numbers, but instead list the reference sequence code.

Can someone please recommend an approach? I have browsed the forums looking for an answer, but quite often people doing this are not working with mtDNA and are working with humans. Any help would be appreciated.

Thanks!

next-gen sequence • 1.7k views
7.2 years ago

Just convert the GT calls in your VCFs into one big table and run your PCA on that. You just need a 0 or a 1 for every sample at every position where any sample has a variant.
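A rough sketch of that table-building step, in case it helps. It assumes single-sample VCFs with a GT field (FORMAT in column 9, the sample in column 10), treats any site missing from a fish's file as 0 (which is only reasonable if that position was actually covered), and counts a call as 1 whenever any ALT allele appears in GT. The function names `variant_sites`, `build_matrix` and `pca_scores` are made up for this example, and the PCA part needs NumPy. Since the two reference species have different coordinates, you would build one table per reference rather than mixing all 14 files.

```python
def variant_sites(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) sites where this sample carries an ALT allele."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        f = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = f[0], int(f[1]), f[3], f[4].split(",")
        # Assumption: single-sample VCF, FORMAT in column 9, sample in column 10.
        fmt = f[8].split(":")
        gt = f[9].split(":")[fmt.index("GT")]
        for allele in gt.replace("|", "/").split("/"):
            # "." (missing) and "0" (reference) are ignored; any ALT index counts.
            if allele.isdigit() and int(allele) > 0:
                sites.add((chrom, pos, ref, alts[int(allele) - 1]))
    return sites


def build_matrix(samples):
    """samples: dict of sample name -> iterable of VCF lines.

    Returns (names, sites, rows) where rows[i][j] is 1 if sample i
    carries the variant at site j, else 0.
    """
    per_sample = {name: variant_sites(lines) for name, lines in samples.items()}
    sites = sorted(set().union(*per_sample.values()))
    names = list(per_sample)
    rows = [[1 if s in per_sample[n] else 0 for s in sites] for n in names]
    return names, sites, rows


def pca_scores(rows, n_components=2):
    """Principal component scores via SVD of the column-centred 0/1 table."""
    import numpy as np  # imported here so the table-building part runs without NumPy

    x = np.asarray(rows, dtype=float)
    x -= x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return (u * s)[:, :n_components]


# Example usage with hypothetical file names (one reference species at a time):
# samples = {f"fish{i}": open(f"fish{i}_refA.vcf") for i in range(1, 8)}
# names, sites, rows = build_matrix(samples)
# scores = pca_scores(rows)
```

With only 7 fish you get at most 6 informative components, so the first two are usually all that is worth plotting.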


Hi both, I'm running into the same problem and have a question: when converting the VCF to a 0/1 matrix, what do you do with heterozygous calls (0/1) in the VCF? Do you assign them 0 or 1? I guess they are probably heteroplasmies, right? Thanks in advance!
