8.2 years ago
fatima
Hi everyone,
I am working on a case-only design and am trying to impute untyped genotypes from Immunochip data. Before imputation, I tried to merge the 1000G reference panel with my cases in PLINK. I got this error:
Warning: Multiple positions seen for variant 'rs200357792'.
Warning: Multiple positions seen for variant 'rs201556956'.
Warning: Multiple positions seen for variant 'rs200991502'.
17838 markers loaded from CD_GermanyKielchr2_mod.bim.
7047141 markers to be merged from ref_b37_ph3.bim.
Of these, 7029359 are new, while 17782 are present in the base dataset.
Error: 7932 variants with 3+ alleles present.
* If you believe this is due to strand inconsistency, try --flip with
merge-merge.missnp.
(Warning: if this seems to work, strand errors involving SNPs with A/T or C/G
alleles probably remain in your data. If LD between nearby SNPs is high,
--flip-scan should detect them.)
* If you are dealing with genuine multiallelic variants, we recommend exporting
that subset of the data to VCF (via e.g. '--recode vcf'), merging with
another tool/script, and then importing the result; PLINK is not yet suited
to handling them.
I removed the multiple-position variants and the duplicate variants. In addition, I filtered out triallelic SNPs in the VCF file and flipped the variants listed in “prefix”.missnp. I then tried again to merge the cases and the reference panel.
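For anyone repeating the clean-up step, here is a minimal Python sketch of one way to list variant IDs that appear at more than one position, which is the situation behind the "Multiple positions seen for variant" warnings. It assumes the standard six-column .bim layout. The function name `multi_position_ids` is made up for illustration and is not part of PLINK.

```python
from collections import defaultdict


def multi_position_ids(bim_lines):
    """Return variant IDs seen at more than one (chromosome, bp) position.

    bim_lines: an iterable of PLINK .bim lines, e.g. the lines of the
    study .bim followed by the lines of the reference .bim.
    """
    positions = defaultdict(set)
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # .bim columns: chromosome, variant ID, cM, bp, allele 1, allele 2
        chrom, var_id, _cm, bp = fields[:4]
        positions[var_id].add((chrom, bp))
    return sorted(v for v, pos in positions.items() if len(pos) > 1)


# Hypothetical toy input: rs1 is listed at two different positions.
example = [
    "2 rs1 0 100 A G",
    "2 rs1 0 150 A G",
    "2 rs2 0 200 A G",
]
print(multi_position_ids(example))
```

The resulting IDs could be written one per line to a text file and passed to PLINK's --exclude before merging.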
I had this error again:
Error: 7916 variants with 3+ alleles present.
* If you believe this is due to strand inconsistency, try --flip with
mergefiles2-merge.missnp.
(Warning: if this seems to work, strand errors involving SNPs with A/T or C/G
alleles probably remain in your data. If LD between nearby SNPs is high,
--flip-scan should detect them.)
* If you are dealing with genuine multiallelic variants, we recommend exporting
that subset of the data to VCF (via e.g. '--recode vcf'), merging with
another tool/script, and then importing the result; PLINK is not yet suited
to handling them.
I think PLINK is not suitable for my data. Am I right? What could be the reason? Thank you in advance.
I think that PLINK is suitable - what is the source of your non-1000G data, though?
Perhaps you could try following my tutorial here: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format
You may have to remove SNPs from your non-1000G data that are called on the non-coding strand.
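To see which SNPs are affected, one option is to compare the allele codes of the shared variants in the two .bim files. The Python sketch below is only an illustration and assumes the standard six-column .bim layout. The function names (`read_bim`, `classify`, `compare`) and the category labels are hypothetical, not anything PLINK provides. It labels each shared variant as "same", "flip" (alleles match after complementing), "ambiguous" (A/T or C/G, where strand cannot be inferred from the alleles alone), "mismatch" (still inconsistent after flipping, so possibly genuinely multiallelic), or "other" (indels, missing allele codes).

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_bim(lines):
    """Map variant ID -> set of its allele codes, from PLINK .bim lines."""
    alleles = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        # .bim columns: chromosome, variant ID, cM, bp, allele 1, allele 2
        alleles[fields[1]] = frozenset((fields[4].upper(), fields[5].upper()))
    return alleles


def classify(study, ref):
    """Compare the allele sets of one variant in the study and reference data."""
    if any(a not in COMPLEMENT for a in study | ref):
        return "other"  # indels, missing allele code '0', etc.
    flipped = frozenset(COMPLEMENT[a] for a in study)
    if study == flipped:
        return "ambiguous"  # A/T or C/G SNP
    if study == ref:
        return "same"
    if flipped == ref:
        return "flip"
    return "mismatch"


def compare(study_lines, ref_lines):
    """Classify every variant ID present in both .bim inputs."""
    study, ref = read_bim(study_lines), read_bim(ref_lines)
    return {v: classify(study[v], ref[v]) for v in study if v in ref}


# Hypothetical toy data, not taken from the files in the question.
study_bim = ["2 rs1 0 100 A G", "2 rs2 0 200 T C", "2 rs3 0 300 A T", "2 rs4 0 400 A C"]
ref_bim = ["2 rs1 0 100 A G", "2 rs2 0 200 A G", "2 rs3 0 300 A T", "2 rs4 0 400 A G"]
for variant, status in compare(study_bim, ref_bim).items():
    print(variant, status)
```

Variants labelled "flip" are candidates for --flip, while "ambiguous" and "mismatch" variants are the ones you would most likely exclude from the non-1000G data before merging.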