PLINK: chromosome ID disappears after running association tests
7.9 years ago

I am using PLINK to run GWAS analyses on a non-model organism with 4000+ super-contigs. I created a map file from a VCF file using vcftools and a chromosome map that I supplied. Here is what the map file looks like:

head full.map

supercont1.1004_pilon   10      0       3307
supercont1.1004_pilon   10      0       3310
supercont1.1004_pilon   10      0       3313
supercont1.1004_pilon   10      0       3330
supercont1.1004_pilon   10      0       3361
supercont1.1004_pilon   10      0       3362
supercont1.1004_pilon   10      0       3400
supercont1.1004_pilon   10      0       3416
supercont1.1004_pilon   10      0       3417
supercont1.1007_pilon   18      0       414

However, when I run association tests on the data in plink, all the chromosomes are listed as 0 in the result files:

head full.qassoc.adjusted

 CHR     SNP      UNADJ         GC       BONF       HOLM   SIDAK_SS   SIDAK_SD     FDR_BH     FDR_BY
   0   20271  2.451e-22  1.239e-21  1.154e-17  1.154e-17        INF        INF  5.771e-18  6.542e-17 
   0   20271  2.451e-22  1.239e-21  1.154e-17  1.154e-17        INF        INF  5.771e-18  6.542e-17 
   0   16411  8.034e-15  2.365e-14  3.784e-10  3.783e-10  3.765e-10  3.765e-10  1.261e-10   1.43e-09 
   0   35808  4.862e-14  9.941e-12   2.29e-09   2.29e-09   2.29e-09   2.29e-09   4.58e-10  5.192e-09 
   0   35808  4.862e-14   1.43e-13   2.29e-09   2.29e-09   2.29e-09   2.29e-09   4.58e-10  5.192e-09 
   0   79625   4.19e-12   1.43e-13  1.973e-07  1.973e-07  1.973e-07  1.973e-07  3.289e-08  3.729e-07 
   0  101661  7.266e-08  2.566e-07   0.003422   0.003422   0.003416   0.003416  0.0004889   0.005542 
   0   94063  6.586e-07   7.75e-06    0.03102    0.03102    0.03054    0.03054   0.003878    0.04396 
   0   79929  1.263e-06  3.647e-06    0.05947    0.05946    0.05774    0.05773    0.00585    0.06632

Is this because the program does not like to have more than 22 chromosomes? Or do I need to reformat the names? Is there a way to work around this?

I also noticed that I have multiple SNPs with the same ID. I don't know whether this is normal (grouping SNPs that are physically close to one another) or a problem with vcftools, which I used to make the files. The unfortunate consequence is that I cannot tell exactly which SNPs/base-pair positions have the significant p-values in my association tests. Is it possible to recode the map file so that each SNP gets an arbitrary but unique identifier?

Thanks in advance!

SNP plink GWAS • 1.5k views

Are you using plink 1.07 or 1.9? 1.07 does not keep track of arbitrary contig names, but 1.9 should.
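If you are on 1.9, the relevant switch is `--allow-extra-chr`, which lets PLINK accept chromosome codes it does not recognise. Below is a rough Python sketch that only assembles such a command line. The file prefix `full` comes from your post; the binary name `plink`, the use of text `.ped`/`.map` input via `--file`, and the helper names are assumptions on my part, so adjust them to your setup.

```python
import shutil
import subprocess


def build_plink_command(prefix="full", binary="plink"):
    """Assemble a PLINK 1.9 association run that keeps arbitrary contig names.

    Assumes `prefix`.ped and `prefix`.map exist and that the phenotype is
    stored in the .ped file; both are guesses based on the question.
    """
    return [
        binary,
        "--file", prefix,        # read prefix.ped / prefix.map
        "--allow-extra-chr",     # accept non-standard chromosome/contig codes
        "--assoc",               # basic association test
        "--adjust",              # multiple-testing corrected p-values
        "--out", prefix,         # output file prefix
    ]


def run_if_available(cmd):
    """Run the command only if the binary is on PATH; otherwise return None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False)


cmd = build_plink_command()
print(" ".join(cmd))
```

Calling `run_if_available(cmd)` would then launch the run; I have left that out so the snippet does nothing beyond printing the command.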


I formatted your post for readability using the 101010 button; please do the same in the future so that file content is displayed properly.


I guess you could choose more descriptive IDs in your VCF file, which would then be used here as well?
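If editing the VCF is inconvenient, the IDs could also be rewritten directly in the .map file. Here is a minimal Python sketch, assuming the standard four-column PLINK .map layout (chromosome, variant ID, genetic distance, base-pair position). The function name `make_unique_ids` and the `contig:position` naming scheme are my own choices for illustration, not something PLINK or vcftools prescribes.

```python
def make_unique_ids(map_lines):
    """Return .map lines with column 2 replaced by a unique 'CHR:POS' ID."""
    seen = {}
    out = []
    for line in map_lines:
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        chrom, _old_id, cm, pos = fields
        new_id = f"{chrom}:{pos}"
        # If the same contig/position pair appears more than once,
        # append a counter so the IDs stay unique.
        count = seen.get(new_id, 0)
        seen[new_id] = count + 1
        if count:
            new_id = f"{new_id}_{count + 1}"
        out.append("\t".join([chrom, new_id, cm, pos]))
    return out


# Three rows taken from the map file shown in the question.
example = [
    "supercont1.1004_pilon   10      0       3307",
    "supercont1.1004_pilon   10      0       3310",
    "supercont1.1007_pilon   18      0       414",
]
for row in make_unique_ids(example):
    print(row)
```

Because the new ID encodes the contig and position, a significant hit in the results can be traced back to its location even if the CHR column is reported as 0. To apply it to a real file, read the lines of `full.map`, pass them through the function, and write the result to a new file rather than overwriting the original.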

7.9 years ago

The first column is CHR and the second is SNPID. Have you got these the wrong way around?

I think the chromosome has to be numeric and, because yours is not ("supercont1.1004_pilon"), PLINK is returning a null ('0') chromosome value. You are probably also correct that only chromosome values 1-22 (plus 23-26 for the sex chromosomes and MT) are accepted.
