5.8 years ago
shawn
Hi everyone,
I am learning to do GWAS analysis. I converted the genomic data "1001genomes_snp-short-indel_only_ACGTN.vcf.gz", downloaded from here, to PLINK ped format:
plink --vcf 1001genomes_snp-short-indel_only_ACGTN.vcf.gz --make-bed --out 1001genomes_snp-short-indel_only_ACGTN.vcf.gz
I found many 0s in the ped file, like this:
88 88 0 0 0 -9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C C 0 0 T T 0 0 G G C C T T 0 0 0 0 T T G G 0 0 T T T T A A A A T T 0 0 T T 0 0 G G C C A A T T A A C C C C C C A A T T C C T T G G G G T T 0 0 C C G G G G T T T T T T A A T T C C G G G G G G C C C C G G G G G G G G C C C C G G C C T T T T G G A A C C T T A A G G 0 0 G G A A T T A A 0 0 0 0 C C C C T T G G G G G G A A T T 0 0 0 0 A A G G T T T T G G 0 0 C C T T C C 0 0 A A C C C C G G G G A A G G C C C C G G C C G G C C C C C C G G G G G G G G A A 0 0 C C C C A A C C C C C C G G C C C C C C C C C C C C C C G G T T C C C C C C C C A A 0 0 A A T T 0 0 T T T T T T A A G G T T G G G G T T C C G G G G C C G G C C 0 0 C C C C T T T T T T A A T T T T G G A A G G C C C C G G 0 0 G G C C G G T T T T C C 0 0 G G A A 0 0 C C G G C C T T 0 0 T T C C A A G G 0 0 C C A A A A G G C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
And when I run the quality control step:
plink --bfile 1001genomes_snp-short-indel_only_ACGTN --maf 0.05 --geno 0.02 --mind 0.02 --hwe 1e-6 --make-bed --out snp
PLINK reported "Error: All people removed due to missing genotype data (--mind)". Does anyone know the reason? Did I choose the wrong dataset, or did I make a mistake somewhere? Thanks a lot.
Please use the formatting bar (10101) to highlight code and data examples.
I agree with ATpoint: this would make your example more readable. It would also help if you posted the corresponding line of the VCF so that we can see whether there was a problem in the conversion.
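One way to look at the VCF side of the conversion is to count missing calls per sample directly. The following is only a rough Python sketch, assuming the standard VCF layout (nine fixed columns followed by one column per sample, GT as the first FORMAT field, missing calls written with `.`). The function name `vcf_missing_per_sample` and the `max_records` cap are made up for this illustration.

```python
import gzip
from collections import Counter


def vcf_missing_per_sample(path, max_records=10000):
    """Return ({sample: missing GT count}, number of records read).

    Hypothetical helper: assumes GT is the first FORMAT field and that a
    missing call contains '.' (for example './.' or '.').
    """
    opener = gzip.open if path.endswith(".gz") else open
    samples, missing, n_records = [], Counter(), 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                samples = fields[9:]  # sample IDs follow the 9 fixed columns
                continue
            if n_records >= max_records:
                break  # only sample the top of a large file
            n_records += 1
            for sample, call in zip(samples, fields[9:]):
                genotype = call.split(":", 1)[0]  # GT comes first in FORMAT
                if "." in genotype:
                    missing[sample] += 1
    return {s: missing[s] for s in samples}, n_records
```

Dividing each count by the number of records read gives an approximate per-sample missing rate for the part of the file that was scanned. If those rates are already high, the zeros in the ped file come from the VCF itself rather than from the conversion.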
Hi Fabio, I have adjusted the format. Thanks for your suggestion. Do you know the reason for my problem? Thank you very much.
Shawn
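The reason is that you have too many missing genotypes (presumably all the zeros, since 0 is the missing-allele code in ped files). How much missing data is there in the VCF? How much missing data does your plink command tolerate? With --mind 0.02, every sample with more than 2% missing genotypes is removed. The problem may lie in the conversion, or the VCF itself may contain a lot of missing data.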
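To put a number on this, the per-sample missing rate can be estimated straight from the ped file and compared with the `--mind` threshold. This is a minimal Python sketch, assuming the default ped layout (six leading columns, then two allele columns per variant, with `0` as the missing code). The names `ped_missing_rates` and `fails_mind` are illustrative and not part of PLINK.

```python
def ped_missing_rates(path):
    """Yield (sample ID, fraction of missing genotypes) for each ped row.

    Assumes six leading columns (FID, IID, father, mother, sex, phenotype)
    followed by two allele columns per variant, with '0' meaning missing.
    """
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 8:
                continue  # blank line or row without a full genotype pair
            alleles = fields[6:]
            pairs = list(zip(alleles[0::2], alleles[1::2]))
            n_missing = sum(a == "0" or b == "0" for a, b in pairs)
            yield fields[1], n_missing / len(pairs)


def fails_mind(rate, threshold=0.02):
    """True if a sample with this missing rate exceeds the --mind threshold."""
    return rate > threshold
```

If most rows come back above 0.02, the `--mind 0.02` error is the expected outcome. PLINK's own `--missing` report (the `.imiss` and `.lmiss` files) should show the same picture and is the more reliable check on the full dataset.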