Question

When converting .bed, .bim and .fam files to .pgen, .psam and .pvar in PLINK2

3.6 years ago

carlstat123 • 0

Hello,

When we convert PLINK 1.9 files to PLINK 2 format, how does the .pvar file know which allele is the alternative (ALT) and which is the reference (REF)? The .bim file has no separate REF and ALT columns; it only has two allele columns, allele 1 and allele 2. Please help me understand this.

Thanks a lot.

.pvar Plink2

score 0 · Answer 1 · 2021-04-07

3.6 years ago

chrchang523 11k

Your suspicion is correct: plink2 doesn’t know which alleles are actually REF. plink 1.x usually sets the last .bim column (allele 2) to the major allele, and that corresponds to REF more often than not, so that’s the guess that plink2 makes. But these REF alleles are explicitly marked as “provisional”.
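A minimal Python sketch of that guess, assuming a simplified .pvar layout with only #CHROM, POS, ID, REF, ALT and INFO (real plink2 output can contain other columns, such as CM). Allele 2 from the .bim becomes the provisional REF, allele 1 becomes ALT, and the INFO column carries the PR flag that plink2 uses to mark a provisional REF allele. The function names are hypothetical; this illustrates the mapping and is not plink2's own code.

```python
from typing import NamedTuple


class PvarRecord(NamedTuple):
    chrom: str
    pos: int
    vid: str
    ref: str
    alt: str
    info: str


def bim_line_to_pvar(line: str) -> PvarRecord:
    """Turn one .bim line into a simplified .pvar record.

    .bim columns: chrom, variant ID, cM position, bp position, allele 1, allele 2.
    """
    chrom, vid, _cm, bp, a1, a2 = line.split()
    # Allele 2 (usually the major allele) is taken as the provisional REF,
    # allele 1 as ALT; "PR" flags the REF allele as provisional.
    return PvarRecord(chrom, int(bp), vid, ref=a2, alt=a1, info="PR")


def format_pvar(rec: PvarRecord) -> str:
    # Tab-separated, in #CHROM POS ID REF ALT INFO order.
    return "\t".join([rec.chrom, str(rec.pos), rec.vid, rec.ref, rec.alt, rec.info])


if __name__ == "__main__":
    # prints "1  10583  rs123  G  A  PR" (tab-separated)
    print(format_pvar(bim_line_to_pvar("1\trs123\t0\t10583\tA\tG")))
```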

You need to use a command like --ref-allele or --ref-from-fa to set them properly.
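--ref-from-fa works from a reference genome FASTA (supplied with --fa, per the plink2 documentation). The underlying idea can be sketched in Python, assuming single-base alleles and one chromosome's sequence already loaded as a string: look up the reference base at the 1-based POS and either keep or swap the provisional REF/ALT pair. The function name is hypothetical, and this sketch simply raises an error when neither allele matches; see the plink2 documentation for how the real command handles that case.

```python
from typing import Tuple


def set_ref_from_sequence(ref_seq: str, pos: int, ref: str, alt: str) -> Tuple[str, str]:
    """Return (REF, ALT) after checking a provisional REF against a reference sequence.

    ref_seq: one chromosome's sequence; pos: 1-based position, as in .pvar.
    """
    genome_base = ref_seq[pos - 1].upper()
    if ref.upper() == genome_base:
        return ref, alt  # the provisional guess was already right
    if alt.upper() == genome_base:
        return alt, ref  # ALT matches the genome, so swap the pair
    raise ValueError(f"neither {ref} nor {alt} matches {genome_base} at position {pos}")


if __name__ == "__main__":
    chrom_seq = "ACGTAC"  # toy sequence; position 2 is C
    print(set_ref_from_sequence(chrom_seq, 2, "G", "C"))  # ('C', 'G')
```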

Thanks for this. However, how do we use the --ref-allele command?

plink2 --pfile myfile --ref-allele ? Is it just like this?

Thanks.

3.6 years ago by carlstat123

Please read the documentation at https://www.cog-genomics.org/plink/2.0/data#ref_allele .

3.6 years ago by chrchang523
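--ref-allele needs a file argument: a table listing, for each variant ID, the allele that should be REF. A minimal Python sketch for writing such a file, assuming you already have the correct REF alleles (e.g. from dbSNP or a reference panel) in a dictionary; the file name ref_alleles.txt and the function name are hypothetical:

```python
import csv
from pathlib import Path
from typing import Mapping


def write_ref_allele_file(ref_by_id: Mapping[str, str], path: Path) -> None:
    """Write a headerless, tab-separated file.

    Column 1 holds the variant ID and column 2 the allele to use as REF.
    """
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for vid, ref in ref_by_id.items():
            writer.writerow([vid, ref])


if __name__ == "__main__":
    # Hypothetical REF alleles for two variants.
    write_ref_allele_file({"rs123": "G", "rs456": "T"}, Path("ref_alleles.txt"))
```

The file could then be passed as plink2 --pfile myfile --ref-allele ref_alleles.txt 2 1 --make-pgen --out myfile_ref, where 2 and 1 are meant as the REF-allele and variant-ID column numbers; check that argument order against the documentation linked above before relying on it. --make-pgen is what writes the updated fileset to disk.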

Thanks. Also, can we keep the .pvar file without these two columns (the ALT and REF allele columns)?

3.6 years ago by carlstat123

No, those are among the five required columns (the others are CHROM/POS/ID).

3.6 years ago by chrchang523
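A quick way to check a .pvar header for those five required columns, as a Python sketch assuming the file has a tab-separated #CHROM header line (the function name is hypothetical):

```python
from typing import List

REQUIRED_PVAR_COLUMNS = ("#CHROM", "POS", "ID", "REF", "ALT")


def missing_pvar_columns(header_line: str) -> List[str]:
    """Return the required .pvar columns absent from a #CHROM header line."""
    present = set(header_line.rstrip("\n").split("\t"))
    return [col for col in REQUIRED_PVAR_COLUMNS if col not in present]


if __name__ == "__main__":
    print(missing_pvar_columns("#CHROM\tPOS\tID\tINFO"))  # ['REF', 'ALT']
```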

Can we fill them with a dot instead? For example, we keep both columns but set each value to ".".

3.6 years ago by carlstat123

They can’t both be “.”. Again, if you know what the two alleles are but you don’t know which is REF and which is ALT, plink2 can mark them as “provisional”, to be disambiguated later (this happens automatically when converting plink 1 binary to plink 2 binary). If you entirely erase the allele codes, the associated genotype data can no longer be interpreted.

3.6 years ago by chrchang523
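To see why the allele codes matter, here is a Python sketch that treats a hard-call genotype as a count of ALT alleles (0, 1 or 2). This is a conceptual model only, not the actual .pgen encoding, and the function name is hypothetical. Without the REF/ALT codes, the count cannot be turned back into a genotype:

```python
from typing import Optional


def decode_genotype(alt_count: Optional[int], ref: str, alt: str) -> str:
    """Turn an ALT-allele count (0, 1, 2, or None for missing) into 'REF/ALT' text."""
    if alt_count is None:
        return "./."  # missing genotype call
    if ref == "." and alt == ".":
        # Nothing tells us which alleles the count refers to.
        raise ValueError("both allele codes are missing; the genotype cannot be interpreted")
    if alt_count not in (0, 1, 2):
        raise ValueError(f"expected 0, 1 or 2 ALT alleles, got {alt_count}")
    return "/".join([ref] * (2 - alt_count) + [alt] * alt_count)


if __name__ == "__main__":
    for count in (0, 1, 2):
        # prints "0 G/G", then "1 G/A", then "2 A/A"
        print(count, decode_genotype(count, "G", "A"))
```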

Isn't it a problem for later analyses if the REF alleles stay "provisional"?

3.6 years ago by carlstat123

You just need to remember that REF/ALT actually just means A2/A1, until you use --ref-allele or --ref-from-fa to set the correct REF alleles.

3.6 years ago by chrchang523
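Until then, it helps to know which variants still carry a provisional REF (i.e. REF = A2, ALT = A1). A Python sketch that lists them by looking for the PR flag in the INFO column, assuming a tab-separated .pvar with a #CHROM header line (the function name and the toy data are hypothetical):

```python
from typing import Iterable, List


def provisional_ref_ids(pvar_lines: Iterable[str]) -> List[str]:
    """Return IDs of variants whose INFO column contains the PR flag."""
    header = None
    ids = []
    for raw in pvar_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and ## meta lines
        if line.startswith("#CHROM"):
            header = line.split("\t")
            continue
        if header is None:
            raise ValueError("no #CHROM header line before the first variant")
        fields = dict(zip(header, line.split("\t")))
        # INFO entries are semicolon-separated; a missing INFO column counts as ".".
        if "PR" in fields.get("INFO", ".").split(";"):
            ids.append(fields["ID"])
    return ids


if __name__ == "__main__":
    example = [
        "##source=toy example",
        "#CHROM\tPOS\tID\tREF\tALT\tINFO",
        "1\t10583\trs123\tG\tA\tPR",
        "1\t10611\trs456\tC\tG\t.",
    ]
    print(provisional_ref_ids(example))  # ['rs123']
```

On a real file, passing an open file handle (e.g. open("myfile.pvar")) works the same way, since the function only iterates over lines.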