7.5 years ago
thecspence
After converting a VCF file to the corresponding .ped and .map files with vcftools, I consistently get an error when filtering them with plink saying that genotype data is missing for every individual. I have tried several other conversion tools, but the result is always the same. Why is this happening, and how do I prevent it?
It depends on what analysis you are doing. The VCF format allows partially or completely missing information in the genotype field, which is very important for multisample VCF files. With a whole-exome sequencing (WES) trio, for example, some samples may lack coverage on a few exons that belong to a gene and are somewhat relevant, though not critical. Those sites still have to be written to the VCF, and genotypes like ./. and ./0 handle exactly that case.

PLINK's data formats and tools were originally designed with genotyping-array-like data in mind, where the alleles are known for all samples. Some of the formats, especially the binary PLINK files (.bed/.bim/.fam), do support missing genotypes, but in general missing calls and multiallelic (triallelic or higher) variants can still cause trouble, because some PLINK subcommands do not handle them properly.

If you have to use PLINK, there are two ways of dealing with it:

1. Remove the variants that have a dot in the genotype field from the VCF, then convert to PLINK format.
2. Run PLINK; when it ends with an error, it will usually write a .missnp file listing all the problematic markers. Remove those markers with PLINK's --exclude option and rerun your analysis on the filtered PLINK files.
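Both workarounds can be sketched in Python. This is a minimal illustration, not part of vcftools or PLINK, and it rests on a few assumptions: the VCF is plain or gzipped text with tab-separated sample columns, any "." in a sample's GT subfield (./., ./0, .|1) counts as missing, and the second approach only builds the PLINK 1.9 command instead of running it. The function names (`has_missing_genotype`, `drop_missing_sites`, `filter_vcf`, `plink_exclude_command`) and file names are hypothetical. The .missnp path is passed in explicitly because its exact name depends on the PLINK version and on the `--out` prefix of the run that failed.

```python
import gzip
from typing import Iterable, Iterator, List


def has_missing_genotype(record: str) -> bool:
    """Return True if any sample's GT subfield contains a '.' (e.g. ./., ./0, .|1)."""
    fields = record.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # sites-only record: no sample columns to check
    fmt_keys = fields[8].split(":")
    if "GT" not in fmt_keys:
        return False  # no genotype field in this record
    gt_index = fmt_keys.index("GT")
    for sample in fields[9:]:
        values = sample.split(":")
        # Assumption: a truncated sample column without a GT value is treated as missing
        gt = values[gt_index] if gt_index < len(values) else "."
        if "." in gt:
            return True
    return False


def drop_missing_sites(lines: Iterable[str]) -> Iterator[str]:
    """Approach 1: keep header lines and only the records where every sample is called."""
    for line in lines:
        if line.startswith("#") or not has_missing_genotype(line):
            yield line


def filter_vcf(in_path: str, out_path: str) -> None:
    """Write a copy of the VCF without records that contain missing genotypes."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(drop_missing_sites(src))


def plink_exclude_command(prefix: str, missnp_path: str, out: str) -> List[str]:
    """Approach 2: PLINK 1.9 arguments that drop the markers listed in a .missnp file."""
    return [
        "plink",
        "--bfile", prefix,
        "--exclude", missnp_path,
        "--make-bed",
        "--out", out,
    ]
```

For example, `filter_vcf("trio.vcf.gz", "trio.complete.vcf")` writes a VCF that can then be converted to PLINK format, while `plink_exclude_command("data", "data.missnp", "data_clean")` returns an argument list that could be passed to `subprocess.run` after a failed PLINK run.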
If I use vcftools to filter for only biallelic sites, would that also deal with the problem?