I am using PLINK to run GWAS analyses on a non-model organism whose assembly has 4000+ super-contigs. I've successfully created a map file from a VCF file using vcftools and a chromosome map that I supplied. Here is what the map file looks like:
head full.map
supercont1.1004_pilon 10 0 3307
supercont1.1004_pilon 10 0 3310
supercont1.1004_pilon 10 0 3313
supercont1.1004_pilon 10 0 3330
supercont1.1004_pilon 10 0 3361
supercont1.1004_pilon 10 0 3362
supercont1.1004_pilon 10 0 3400
supercont1.1004_pilon 10 0 3416
supercont1.1004_pilon 10 0 3417
supercont1.1007_pilon 18 0 414
However, when I run association tests on the data in PLINK, the chromosome is listed as 0 for every SNP in the result files:
head full.qassoc.adjusted
CHR SNP UNADJ GC BONF HOLM SIDAK_SS SIDAK_SD FDR_BH FDR_BY
0 20271 2.451e-22 1.239e-21 1.154e-17 1.154e-17 INF INF 5.771e-18 6.542e-17
0 20271 2.451e-22 1.239e-21 1.154e-17 1.154e-17 INF INF 5.771e-18 6.542e-17
0 16411 8.034e-15 2.365e-14 3.784e-10 3.783e-10 3.765e-10 3.765e-10 1.261e-10 1.43e-09
0 35808 4.862e-14 1.43e-13 2.29e-09 2.29e-09 2.29e-09 2.29e-09 4.58e-10 5.192e-09
0 35808 4.862e-14 1.43e-13 2.29e-09 2.29e-09 2.29e-09 2.29e-09 4.58e-10 5.192e-09
0 79625 4.19e-12 9.941e-12 1.973e-07 1.973e-07 1.973e-07 1.973e-07 3.289e-08 3.729e-07
0 101661 7.266e-08 2.566e-07 0.003422 0.003422 0.003416 0.003416 0.0004889 0.005542
0 94063 6.586e-07 7.75e-06 0.03102 0.03102 0.03054 0.03054 0.003878 0.04396
0 79929 1.263e-06 3.647e-06 0.05947 0.05946 0.05774 0.05773 0.00585 0.06632
Is this because PLINK does not accept more than 22 chromosomes? Or do I need to reformat the contig names? Is there a way to work around this?
I also noticed that multiple SNPs share the same ID. I don't know whether this is normal (grouping SNPs that are physically close to one another) or a problem with vcftools, which I used to make the files. Unfortunately, this means I cannot tell exactly which SNPs/base-pair positions have the significant p-values in my association tests. Is it possible to recode the map file so that each SNP gets a random but unique identifier?
Thanks in advance!
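One way to check and then fix the duplicated IDs is sketched below in Python, assuming the standard four-column PLINK .map layout (chromosome, variant ID, genetic distance in cM, base-pair position). `duplicated_ids` counts how often each ID in column 2 occurs; in the lines shown above, column 2 is the same for every variant on a contig (10 for supercont1.1004_pilon, 18 for supercont1.1007_pilon), which would account for the repeated IDs. `recode_map_lines` then replaces each ID with `contig:position` rather than a random string, so a significant hit points straight at its location. All function names here are hypothetical helpers, not part of PLINK or vcftools.

```python
from collections import Counter


def parse_map_lines(lines):
    """Parse PLINK .map lines into (chrom, snp_id, cm, bp) tuples.

    chrom, snp_id and cm stay strings; bp is converted to int.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        chrom, snp_id, cm, bp = fields[:4]
        rows.append((chrom, snp_id, cm, int(bp)))
    return rows


def duplicated_ids(rows):
    """Return {snp_id: count} for every ID used by more than one variant."""
    counts = Counter(snp_id for _chrom, snp_id, _cm, _bp in rows)
    return {snp_id: n for snp_id, n in counts.items() if n > 1}


def recode_map_lines(rows):
    """Yield tab-separated .map lines with IDs rewritten to contig:position.

    Line order is preserved. If the same contig:position appears more than
    once (e.g. several records at one site), _2, _3, ... is appended so
    every ID stays unique.
    """
    seen = Counter()
    for chrom, _old_id, cm, bp in rows:
        base = f"{chrom}:{bp}"
        seen[base] += 1
        new_id = base if seen[base] == 1 else f"{base}_{seen[base]}"
        yield "\t".join((chrom, new_id, cm, str(bp)))


def recode_map_file(src, dst):
    """Read the .map file at src and write the recoded version to dst."""
    with open(src) as fin:
        rows = parse_map_lines(fin)
    with open(dst, "w") as fout:
        for line in recode_map_lines(rows):
            fout.write(line + "\n")
```

For example, `recode_map_file("full.map", "full.recoded.map")` (output name hypothetical) writes a new map file. A .ped file stores genotypes in the same order as the .map lines and does not contain variant IDs, so as long as the line order is unchanged the .ped should not need editing; the association test does need to be rerun so the result files pick up the new IDs.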
Are you using PLINK 1.07 or 1.9? 1.07 does not keep track of arbitrary contig names, but 1.9 should, if you add the --allow-extra-chr flag.
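If you stay on 1.07, the contig for each result row can still be recovered afterwards by joining the result file back to the .map file on the SNP column. The sketch below assumes that every ID in the .map file is unique (it raises an error otherwise) and that the result file, such as full.qassoc.adjusted, which has no BP column, starts with a header row containing CHR and SNP. It writes the contig name into CHR and appends a BP column. The function names are hypothetical.

```python
def load_positions(map_lines):
    """Map each variant ID in a PLINK .map file to (contig, bp).

    Raises ValueError if an ID occurs twice, because a result row could
    then not be traced back to a single position.
    """
    pos = {}
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        chrom, snp_id, _cm, bp = fields[:4]
        if snp_id in pos:
            raise ValueError(f"duplicate variant ID {snp_id!r}")
        pos[snp_id] = (chrom, int(bp))
    return pos


def annotate_results(result_lines, pos):
    """Yield result rows (lists of fields) with CHR set to the contig name
    and a BP column appended, looking each SNP ID up in pos.

    Assumes the first line is a header containing CHR and SNP columns.
    """
    lines = iter(result_lines)
    header = next(lines).split()
    chr_col, snp_col = header.index("CHR"), header.index("SNP")
    yield header + ["BP"]
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        contig, bp = pos[fields[snp_col]]
        fields[chr_col] = contig
        yield fields + [str(bp)]
```

With the files open, `annotate_results(result_file, load_positions(map_file))` yields rows that can be joined with spaces and written out. This only works once the IDs are unique; with the current .map, `load_positions` would stop at the first repeated ID rather than silently pick one position.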
I formatted your post for readability using the code-formatting (101010) button; try to do the same in the future to properly structure file content.
I guess you could choose more descriptive IDs in your VCF file, which would then be used here as well?
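A minimal sketch of that suggestion, assuming a plain-text (not bgzipped) VCF: it sets the ID column (third field) to CHROM:POS, leaving header lines untouched. By default only missing IDs ('.') are replaced; `overwrite=True` replaces every ID, which would be needed if the existing IDs are the repeated ones. That vcftools then copies these IDs into the .map file is taken from the comment above, not verified here, and two records at the same position would still share an ID. The function names are hypothetical; bcftools annotate also has a --set-id option for the same job.

```python
def set_vcf_ids(lines, overwrite=False):
    """Yield VCF lines whose ID column (3rd field) is set to CHROM:POS.

    Header lines (starting with '#') pass through unchanged. By default only
    missing IDs ('.') are replaced; overwrite=True replaces every ID.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if overwrite or fields[2] == ".":
            fields[2] = f"{fields[0]}:{fields[1]}"
        yield "\t".join(fields) + "\n"


def set_vcf_ids_file(src, dst, overwrite=False):
    """Copy the plain-text VCF at src to dst with IDs filled in."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(set_vcf_ids(fin, overwrite))
```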