Hi,
I would like to use the "clump" function on my association analysis results, which are based on 1000 Genomes imputed data (I have many files). I guess I need to use the 1000 Genomes genotypes as the reference file, but I am having trouble with this. I have downloaded the VCF files from the MACH website here.
Then I thought I could use PLINK 1.9, which reads VCF, and run the "clump" command directly here, but apparently there are issues with the allele names in the VCF file (I would like to keep as many variants as possible), so I followed this link to convert from VCF to PLINK format.
But I get a segfault error as soon as I run this command:
bcftools annotate -Ob -x ID chr21.test.vcf
which says:
[W::vcf_parse] INFO 'NS' is not defined in the header, assuming Type=String
Encountered error, cannot proceed. Please check the error output above.
I have tried to use bcftools reheader -h but it doesn't work. Can someone please help?
Thank you!!
Hi chrchang, thank you for your reply. I would like to keep both SNPs and INDELs in the dataset. I had also seen some discussion about this on the Google group, which is where I found the link I followed to convert from VCF to PLINK format, and which does not work for me: https://groups.google.com/forum/#!msg/plink2-users/xDYgOnAofwo/GmGFXlE4YCYJ
As you suggested I have tried this:
I am trying to convert the INDELs as they are in MACH (i.e. chr:pos:A1_A2), and the output looks good for cases like this:
But not for cases where the alleles should be flipped:
It should be like this:
21:34884877:TATTTG_T
while it comes out in the bim file like this:
21:34884877:T_TATTTG
Is there a way to fix these cases?
Thank you very much for your help!
This is outside --set-missing-var-ids's current scope, but can be done with the help of awk. One possible workflow:
1. Perform basic format conversion, no renaming yet. (If you include --make-bed in this command, you also need to include --keep-allele-order; otherwise you will probably get unwanted allele swaps.)
2. You now want to replace all '.'s in the second column of the .bim file with [first column]:[fourth column]:[fifth column]_[sixth column]. This can be done as follows:
A similar command can be used to make this change directly to the VCF, if necessary.
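For anyone following along, the renaming in step 2 could be sketched roughly like this. It is only an illustration of the rule described above, not necessarily the exact command originally posted; the function name rename_missing_ids and the file names in the comments are made up, and it assumes a standard six-column .bim file.

```shell
# Step 1 might look something like this (file names are placeholders):
#   plink --vcf chr21.test.vcf --keep-allele-order --make-bed --out chr21.test

# Hypothetical helper for step 2: fill in missing variant IDs in a .bim file.
# .bim columns: 1 = chromosome, 2 = variant ID, 3 = cM position,
#               4 = bp position, 5 = allele 1, 6 = allele 2
rename_missing_ids() {
    awk 'BEGIN { OFS = "\t" }
         # Only touch variants whose ID is the missing-value marker "."
         $2 == "." { $2 = $1 ":" $4 ":" $5 "_" $6 }
         # Rebuild every line so the output is consistently tab-delimited
         { $1 = $1; print }' "$1"
}
```

You would call it as rename_missing_ids chr21.test.bim > chr21.test.renamed.bim, check the result, and then swap it in for the original .bim. Variants that already have an ID are left unchanged.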
I see. Thank you very much!