How to preserve alleles across plink operations?
1
0
Entering edit mode
2.6 years ago
kynnjo ▴ 70

I have found that some sequences of plink operations can result in a corruption of allele information.

I came across this problem in the process of merging large numbers (20K+) of single-sample VCFs (each comprising ~1.5M sites). I describe my approach for doing this in more detail below.

The test below, however, is a radical simplification of what I am actually doing, shown here only to illustrate the problem as simply as possible. The idea behind this test is to perform the following sequence of conversions:

VCF1 -> PED -> BED -> VCF2

...and then to compare VCF1 with VCF2. Ideally, VCF1 and VCF2 should be identical; in practice, when I try the above sequence using a tiny (1 locus, 1 sample) test vcf, this is what I get:

### VCF1
#CHROM   POS        ID            REF   ALT   QUAL   FILTER   INFO   FORMAT   test
13       19020095   rs140871821   C     T     .      .        .      GT       0/0

### VCF2
#CHROM   POS        ID            REF   ALT   QUAL   FILTER   INFO   FORMAT   test
13       19020095   rs140871821   C     .     .      .        .      GT       0/0

(In this example, VCF2 differs from VCF1 only at the ALT field, but I have also run into cases where both the REF and ALT fields are different.)

Below are the operations I used to perform the various conversions:

### VCF1 -> PED
plink --recode --a2-allele <INPUTVCF> 4 3 '#' --real-ref-alleles --vcf <INPUTVCF> --out <PEDPREFIX>

### PED  -> BED
plink --make-bed --a2-allele <INPUTVCF> 4 3 '#' --real-ref-alleles --map <PEDPREFIX>.map --ped <PEDPREFIX>.ped --out <BEDPREFIX>

### BED  -> VCF2
plink --recode vcf --a2-allele <INPUTVCF> 4 3 '#' --real-ref-alleles --bfile <BEDPREFIX> --out <OUTPUTVCFPREFIX>

The version of plink I have been using is 1.90b3v.

Note, in particular, that I have added the flags

--a2-allele <INPUTVCF> 4 3 '#' --real-ref-alleles 

...to all the commands above. Here, I have been following the documentation's recommendation: "When it is important for reference alleles to be correct, you'll usually also want to include --a2-allele and --real-ref-alleles in your command." (The parameters I have given for the --a2-allele flag are also based on the documentation's recommendations. I realize that not all the plink commands I am using above may require these flags; the fact that I have included them in all these commands can be seen as an act of desperation.)

As I mentioned above, I ran into this problem while attempting to merge a large number of single-sample VCFs.

I found that the only way I could perform this merge in a reasonable time (i.e. hours rather than days or weeks) was to do the following:

  1. convert the single-sample VCFs PED files;
  2. concatenate all the individual PED files into a merged PED file;
  3. convert the merged PED file to a (merged) BED file;
  4. convert the (merged) BED file to a multi-sample VCF.

This procedure indeed takes only a few hours, but unfortunately, the first 9 columngs of the final multi-sample VCF differ from the first 9 columns of the individual single-sample VCFs. (To be clear: the first 9 columns of the individual single-sample VCFs are identical across all the samples, down to the ordering of the rows.)

How can I perform the VCF0 -> PED -> PED -> ... -> VCF transformation in a way that ensures that the first 9 columns of the resulting merged VCFs are identical to the corresponding columns of the input VCFs?

vcf plink • 966 views
ADD COMMENT
0
Entering edit mode
2.6 years ago

In this workflow, you'll also need to run --a1-allele to restore lost ALT alleles during VCF -> PED conversion.

(I also recommend updating your plink 1.9 build.)

ADD COMMENT
0
Entering edit mode

Thank you. I did try that idea already, but plink throws an error to the effect that the --a1-allele and --a2-allele flags are incompatible; i.e. they cannot both be specified in the same command.

ADD REPLY
0
Entering edit mode

You can specify it in its own command any time after PED conversion, but before the final VCF is exported.

ADD REPLY

Login before adding your answer.

Traffic: 2350 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6