5.8 years ago
shawn
Hi everyone,
I am learning to do GWAS analysis in Arabidopsis. For a GWAS experiment I used some accessions from the 1135 list of the 1001 Genomes Project, and I have some questions about the genotype data. There are several different genome datasets, in both VCF and HDF5 format. I selected the one named “1001_SNP_MATRIX.tar.gz”. Is this the right genotype data for GWAS analysis? I am also having trouble converting the HDF5 format to PLINK format. Does anybody know how to do this? I look forward to your replies.
Thanks.
You need to decide which dataset you want to work with. If it is a VCF file, for example this one: https://1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz , then you can use plink directly without any conversion, since plink can read VCF files (with the --vcf flag).
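If you would rather stay with the HDF5 SNP matrix than switch to the VCF, one possible route is to read the arrays out of the HDF5 file yourself (for example with h5py, after listing which datasets the file actually contains, since their names are not given here) and write them out as PLINK's text PED/MAP files. The minimal sketch below assumes you already have the genotypes as a SNP-by-accession matrix of 0/1 codes, plus per-SNP chromosome and position lists and a list of accession IDs. The function name `write_ped_map`, the 0 = reference / 1 = alternative coding, and the use of `1`/`2` as allele labels are all assumptions for illustration, not the documented layout of the 1001 Genomes file.

```python
from typing import Sequence

# ASSUMPTION: the matrix stores 0 = reference and 1 = alternative allele.
# Because Arabidopsis accessions are inbred, each call is written as a
# homozygous pair, using "1" and "2" as allele labels.
ALLELE_CODE = {0: "1 1", 1: "2 2"}
MISSING = "0 0"  # PLINK's default missing-genotype code


def write_ped_map(
    snps: Sequence[Sequence[int]],  # snps[i][j] = call for SNP i in accession j
    chroms: Sequence[str],
    positions: Sequence[int],
    accessions: Sequence[str],
    out_prefix: str,
) -> None:
    """Write PLINK 1 text files <out_prefix>.map and <out_prefix>.ped."""
    n_snps = len(positions)
    if len(snps) != n_snps or len(chroms) != n_snps:
        raise ValueError("snps, chroms and positions need one entry per SNP")

    # .map: chromosome, variant ID, genetic distance (0 = unknown), bp position
    with open(f"{out_prefix}.map", "w") as fh:
        for chrom, pos in zip(chroms, positions):
            fh.write(f"{chrom}\t{chrom}_{pos}\t0\t{pos}\n")

    # .ped: one line per accession. FID and IID are both the accession ID,
    # no parents, sex 0 (unknown), phenotype -9 (missing), then genotypes.
    # Any code other than 0 or 1 is written as missing.
    with open(f"{out_prefix}.ped", "w") as fh:
        for j, acc in enumerate(accessions):
            calls = [ALLELE_CODE.get(snps[i][j], MISSING) for i in range(n_snps)]
            fh.write(" ".join([acc, acc, "0", "0", "0", "-9"] + calls) + "\n")
```

After calling, for example, `write_ped_map(snps, chroms, positions, accessions, "my_gwas")`, the text files can be turned into a binary fileset with `plink --file my_gwas --make-bed --out my_gwas`. With millions of SNPs the PED file gets very large, so subsetting to your own accessions before writing it is worth considering.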
Thanks for your reply. I am not sure which dataset is the right one for the 1001 Genomes Project. I tried this VCF dataset: "1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz". When I run quality control with plink, "plink --bfile 387snp --maf 0.01 --geno 0.05 --mind 0.05 --hwe 1e-5 --make-bed --out snp2", it shows "error, all the individual removed as -maf -- maf max". So maybe this is not the right dataset.
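Before concluding the dataset is wrong, it may be worth checking the filters themselves (a hedged suggestion, not a confirmed diagnosis of this error). Arabidopsis accessions are largely inbred, so almost every genotype call is homozygous, while `--hwe` tests against Hardy-Weinberg proportions expected in a randomly mating population. The sketch below compares the expected and observed heterozygote counts for one SNP; the function name and the genotype counts are hypothetical, and PLINK itself uses an exact test rather than this simple comparison.

```python
from typing import Tuple


def expected_vs_observed_het(n_hom_ref: int, n_het: int, n_hom_alt: int) -> Tuple[float, int]:
    """Return (expected, observed) heterozygote counts under Hardy-Weinberg."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    return 2 * p * (1 - p) * n, n_het


# Hypothetical SNP in 387 fully homozygous accessions:
# about 134.9 heterozygotes expected, 0 observed.
expected, observed = expected_vs_observed_het(300, 0, 87)
```

A deficit that large will make Hardy-Weinberg tests reject most polymorphic SNPs in inbred material, so rerunning the QC without `--hwe`, or adding the filters one at a time to see which one empties the dataset, would be a reasonable next step.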
Hi, did you find out which data from the 1001 Genomes Project is suitable for GWAS? I have the same problem. Please help.