Best way to convert VCF to PLINK file format and merge chromosomes?
9.2 years ago
nchuang ▴ 260

I am trying to convert the 1000G genotypes into plink format so I can try to run a PCA.

I used PLINK 1.9 to recode all the vcf.gz files to binary bed files, and I am now using --merge-list to merge the chromosomes into one file. Should I be worried about the warnings about variants with multiple positions? If that is an issue, why was it not mentioned during the VCF-to-PLINK conversion? And how can an rsID have more than one position, unless the warning means more than one base pair, as for a structural variant? I am also not sure what the "multiple chromosomes seen" warning means, unless it is an error.
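One way to see what the merge warnings refer to is to look at the .bim files directly. The following is only a minimal sketch, assuming the usual six-column .bim layout (chromosome, variant ID, cM position, bp position, allele 1, allele 2); the function name and the example lines are made up for illustration and are not taken from 1000G. It lists every variant ID that shows up at more than one chromosome/position pair:

```python
from collections import defaultdict


def find_conflicting_ids(bim_lines):
    """Return {variant_id: sorted [(chrom, pos), ...]} for IDs seen at more than one locus."""
    loci = defaultdict(set)
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, var_id, pos = fields[0], fields[1], int(fields[3])
        loci[var_id].add((chrom, pos))
    # Keep only IDs that were recorded at two or more distinct loci.
    return {vid: sorted(places) for vid, places in loci.items() if len(places) > 1}


# Hypothetical .bim lines; in practice, read them from each per-chromosome .bim file.
example = [
    "1\trs100\t0\t1000\tA\tG",
    "1\trs200\t0\t2000\tC\tT",
    "1\trs200\t0\t2050\tC\tCT",
    "2\trs100\t0\t500\tA\tG",
]
conflicts = find_conflicting_ids(example)
print(conflicts)
```

An ID reported here with two positions on the same chromosome, or on two chromosomes, is the kind of record that a merge by variant ID cannot place unambiguously.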

I also assume that I should merge my case population with the 1kG dataset and then prune the merged data by LD. After that, can I use PLINK to make an MDS plot, or should I use GCTA?
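The merge, prune, then PCA/MDS order described above could be sketched roughly as follows. This only builds the PLINK 1.9 command lines and prints them; it does not run anything. The file prefixes, the pruning settings (50 5 0.2) and the number of components (10) are assumptions picked for illustration, not recommendations from this thread:

```python
import shlex


def build_pca_commands(ref_prefix, case_prefix, out_prefix, use_mds=False):
    """Build PLINK 1.9 argument lists: merge two filesets, LD-prune, then PCA or MDS."""
    merged = f"{out_prefix}_merged"
    pruned = f"{out_prefix}_prune"
    commands = [
        # 1. Merge the case fileset into the reference fileset.
        ["plink", "--bfile", ref_prefix, "--bmerge", case_prefix,
         "--make-bed", "--out", merged],
        # 2. LD pruning; writes <pruned>.prune.in and <pruned>.prune.out.
        ["plink", "--bfile", merged, "--indep-pairwise", "50", "5", "0.2",
         "--out", pruned],
    ]
    # 3. Restrict to the pruned variants and compute PCA (or MDS instead).
    final = ["plink", "--bfile", merged, "--extract", f"{pruned}.prune.in"]
    if use_mds:
        final += ["--cluster", "--mds-plot", "10", "--out", f"{out_prefix}_mds"]
    else:
        final += ["--pca", "10", "--out", f"{out_prefix}_pca"]
    commands.append(final)
    return commands


# Hypothetical prefixes for the merged 1kG fileset and the case fileset.
commands = build_pca_commands("all_1kg", "my_cases", "study")
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them by eye before running them in a shell.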

Just saw this: https://groups.google.com/forum/#!topic/plink2-users/RNztDLWCfB8

I guess those SNPs in 1kG are multiallelic?

plink • 12k views

Actually, going back, I saw that the VCF-to-PLINK conversion already filters for biallelic loci only, so I don't understand how I would get multiallelic sites...


For multiallelic sites, Plink 1.9 defaults to keeping only the reference allele and the most common alternate allele; any call involving a lower-frequency alt allele is treated as missing data. If you want such sites to be entirely skipped, you need to add the --biallelic-only flag.
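To make that default concrete, here is a toy model of the behavior described above for a single site. It is only a sketch of the idea, not PLINK's actual code: the function name, the genotype encoding (0 = REF, 1 and up = ALT index, `None` = missing) and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter


def collapse_to_biallelic(genotypes):
    """Keep REF and the most common ALT; calls with any other ALT become missing.

    genotypes: list of (allele, allele) index pairs, or None for a missing call.
    Returns (kept_alt_index, recoded_genotypes) with alleles recoded to 0/1.
    """
    alt_counts = Counter(
        a for gt in genotypes if gt is not None for a in gt if a > 0
    )
    if not alt_counts:
        return None, list(genotypes)  # no ALT observed, nothing to collapse
    # Assumption: ties are broken in favor of the lowest ALT index.
    kept_alt = max(sorted(alt_counts), key=lambda a: alt_counts[a])
    recoded = []
    for gt in genotypes:
        if gt is None or any(a not in (0, kept_alt) for a in gt):
            recoded.append(None)  # call involves a dropped ALT: set to missing
        else:
            recoded.append(tuple(0 if a == 0 else 1 for a in gt))
    return kept_alt, recoded


# Five hypothetical samples at a site with two ALT alleles.
calls = [(0, 1), (1, 1), (0, 2), (0, 0), (1, 2)]
kept_alt, recoded = collapse_to_biallelic(calls)
print(kept_alt, recoded)
```

In this model the site still appears in the output as a biallelic variant, just with extra missing calls, whereas skipping the site removes it completely.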


I see. So if I understand you correctly, even though it says it is filtering for biallelic loci, it is really just setting calls with a third allele to missing? And if I use the --biallelic-only flag, it will skip that SNV entirely?

By browsing around Google and your threads, I found the genetics for fun blog, which has exactly what I needed. It was not easy to find despite the obvious title, so I'll post it here for future reference:

http://apol1.blogspot.com/2014/11/best-practice-for-converting-vcf-files.html


Also, what are your thoughts on using GATK's VariantsToBinaryPed for VCF-to-PLINK conversion?

6.2 years ago

Follow my tutorial here for best practices on doing this: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2

Kevin
