Question

1001 Arabidopsis SNP

5.9 years ago

shawn ▴ 20

Hi everyone,

I am learning to do GWAS analysis in Arabidopsis. I used some accessions from the 1135 list (1001 Genomes Project) for a GWAS experiment, and I have some questions about the genotype data. I found that there are several different genome datasets, in both VCF and HDF5 formats. I selected the one named “1001_SNP_MATRIX.tar.gz”. Is this the right genotype data for GWAS analysis? I also have a problem converting the HDF5 format to PLINK format. Does anybody know how to do this? I look forward to your replies.

Thanks.

https://1001genomes.org/data/GMI-MPI/releases/v3.1/
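For the HDF5-to-PLINK part of the question, one possible route is to stream the SNP matrix row by row into PLINK's transposed text format (.tped/.tfam), because a .tped file has one line per variant, just like the rows of a SNPs x accessions matrix. The sketch below is hypothetical: the dataset names (accessions, positions, snps) and the chrs/chr_regions attributes are assumptions about the file layout, so inspect your file with h5py first and adjust them. It also assumes genotypes are coded 0/1 with no heterozygotes; the alleles are written as placeholder codes 1/2 because a 0/1 matrix does not record which nucleotide each code stands for.

```python
# Hypothetical sketch: stream an HDF5 SNP matrix into PLINK .tped/.tfam.
#
# ASSUMED file layout (check it with h5py before relying on this):
#   f["accessions"]                     accession IDs, one per column of snps
#   f["positions"]                      bp positions, one per row of snps
#   f["positions"].attrs["chrs"]        chromosome names
#   f["positions"].attrs["chr_regions"] [start, end) row range per chromosome
#   f["snps"]                           n_snps x n_accessions matrix coded 0/1


def as_str(value):
    # HDF5 string data is often returned as bytes.
    return value.decode() if isinstance(value, bytes) else str(value)


def plink_chrom(name):
    # Turn names such as "Chr1" into plain numeric codes for PLINK.
    name = as_str(name)
    return name[3:] if name.lower().startswith("chr") else name


def tfam_line(accession):
    # FID IID father mother sex phenotype (0 = unknown, -9 = missing).
    return f"{accession} {accession} 0 0 0 -9\n"


def tped_line(chrom, pos, genotypes):
    # Accessions are treated as homozygous inbred lines: 0 -> "1 1",
    # 1 -> "2 2", and any other value -> missing "0 0".
    calls = {0: "1 1", 1: "2 2"}
    gts = " ".join(calls.get(int(g), "0 0") for g in genotypes)
    return f"{chrom} {chrom}_{pos} 0 {pos} {gts}\n"


def convert_hdf5_to_tped(h5_path, prefix, keep=None, chunk=10_000):
    import h5py  # imported here so the helpers above work without h5py

    with h5py.File(h5_path, "r") as f:
        accessions = [as_str(a) for a in f["accessions"][:]]
        cols = list(range(len(accessions)))
        if keep is not None:
            # Optionally restrict to the accessions used in the experiment.
            wanted = {str(k) for k in keep}
            cols = [i for i, acc in enumerate(accessions) if acc in wanted]
        with open(prefix + ".tfam", "w") as fam:
            fam.writelines(tfam_line(accessions[i]) for i in cols)

        positions = f["positions"]
        chrs = [plink_chrom(c) for c in positions.attrs["chrs"]]
        regions = positions.attrs["chr_regions"]
        snps = f["snps"]
        with open(prefix + ".tped", "w") as tped:
            for chrom, (start, end) in zip(chrs, regions):
                start, end = int(start), int(end)
                # Read a block of rows at a time instead of the whole matrix.
                for lo in range(start, end, chunk):
                    hi = min(lo + chunk, end)
                    block = snps[lo:hi][:, cols]
                    for pos, row in zip(positions[lo:hi], block):
                        tped.write(tped_line(chrom, int(pos), row))
```

After running something like convert_hdf5_to_tped("snps.hdf5", "my_accessions", keep=my_ids) (file name and ID list are hypothetical), the text files can be turned into binary PLINK files with plink --tfile my_accessions --make-bed --out my_accessions. Reading in row blocks keeps memory bounded, since the full matrix for all accessions may not fit in RAM.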

SNP plink vcf

updated 3.3 years ago by Nikwan • written 5.9 years ago by shawn ▴ 20

You need to figure out which dataset you need to work with. If it is a VCF file, for example this file: https://1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz , then you can use plink directly without any conversion, because plink can read VCF files (its --vcf option).
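Following the suggestion above, the PLINK 1.9 import can be sketched as a small Python helper that builds the command line. It uses only the standard PLINK flags --vcf, --double-id (sets both FID and IID to the VCF sample ID), --keep, --make-bed and --out; the output prefix and the keep-file name in the usage note are hypothetical.

```python
def plink_vcf_import_args(vcf_path, out_prefix, keep_file=None):
    # Build a PLINK 1.9 command that reads a (gzipped) VCF directly and
    # writes binary .bed/.bim/.fam files; no separate conversion step.
    args = ["plink", "--vcf", vcf_path, "--double-id"]
    if keep_file is not None:
        # Optional two-column FID/IID list of accessions to keep; with
        # --double-id both columns are simply the VCF sample ID.
        args += ["--keep", keep_file]
    args += ["--make-bed", "--out", out_prefix]
    return args
```

For example, plink_vcf_import_args("1001genomes_snp-short-indel_only_ACGTN.vcf.gz", "my_accessions", keep_file="keep.txt") can be printed with shlex.join or run with subprocess.run(..., check=True), assuming plink is on your PATH. Subsetting with --keep at import time means the later QC filters only see the accessions in your experiment.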

5.9 years ago by zx8754 12k

Thanks for your reply. I am not sure which dataset is the right one from the 1001 Genomes Project. I tried to use this VCF dataset: "1001genomes.org/data/GMI-MPI/releases/v3.1/1001genomes_snp-short-indel_only_ACGTN.vcf.gz". When I used plink to do quality control (" plink --bfile 387snp --maf 0.01 --geno 0.05 --mind 0.05 --hwe 1e-5 --make-bed --out snp2"), it showed "error, all the individual removed as -maf -- maf max ". So maybe this is not the right dataset.
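If it is the --mind 0.05 filter that removes every accession, this can be checked before filtering: plink --bfile 387snp --missing --out miss writes per-sample missing rates to miss.imiss. A minimal sketch, assuming the standard .imiss columns (FID IID MISS_PHENO N_MISS N_GENO F_MISS), that lists the samples whose missing rate exceeds the --mind threshold:

```python
def samples_failing_mind(imiss_lines, threshold=0.05):
    # Return the IIDs whose F_MISS exceeds the --mind threshold in a
    # PLINK .imiss report (header row first, whitespace-delimited).
    rows = iter(imiss_lines)
    header = next(rows).split()
    iid_col = header.index("IID")
    fmiss_col = header.index("F_MISS")
    failing = []
    for line in rows:
        fields = line.split()
        if fields and float(fields[fmiss_col]) > threshold:
            failing.append(fields[iid_col])
    return failing
```

Usage (file name hypothetical): with open("miss.imiss") as fh: print(len(samples_failing_mind(fh))). If that count equals the number of accessions, the --mind threshold rather than the choice of dataset is what empties the output.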

5.9 years ago by shawn ▴ 20

Hi, did you find out which data from the 1001 Genomes Project is suitable for GWAS? I have the same problem. Please help.

3.3 years ago by Nikwan