Question

Merging variant replicates rather than filtering

7.0 years ago

dylkot ▴ 10

I'm analyzing some Illumina genotype data and noticing that almost 2% of variants on the array are typed more than once. I consider these measurements to be replicates if they measure the same chromosome, position, reference allele, and alternate allele as specified in the PLINK BIM file. It seems that the typical approach is to identify replicates with PLINK's --list-duplicate-vars option and then remove all but the first one in the file. Are there any tools that implement something slightly smarter than this? For example, a desired behavior might be to take a consensus vote among the calls from the replicates to decide which one to use. Thanks in advance!
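To make the replicate definition above concrete, here is a minimal sketch that groups variants in a BIM file by chromosome, position, and the two allele columns, and reports every group with more than one variant ID. The file names toy.bim and replicate_groups.txt and the rs IDs are hypothetical, and the sketch assumes the standard six-column BIM layout (chromosome, variant ID, cM position, bp position, allele 1, allele 2) with alleles compared in file order.

```shell
# Hypothetical toy BIM file: rsA and rsB are replicates; rsC shares the
# position but has a different allele, so it is not a replicate.
printf '1\trsA\t0\t1000\tA\tG\n1\trsB\t0\t1000\tA\tG\n1\trsC\t0\t1000\tA\tT\n1\trsD\t0\t2000\tC\tT\n' > toy.bim

# Key each variant on chrom:pos:allele1:allele2 and collect the IDs per key.
awk '{
       key = $1 ":" $4 ":" $5 ":" $6
       ids[key] = (key in ids) ? (ids[key] " " $2) : $2
       n[key]++
     }
     END {
       # Print only keys seen more than once, with the IDs that share them.
       for (k in n) if (n[k] > 1) print k "\t" ids[k]
     }' toy.bim | sort > replicate_groups.txt

cat replicate_groups.txt
```

This only identifies the replicate groups; it does not touch the genotype calls, so any consensus rule would still have to be applied separately.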

GWAS Plink vcftools • 1.9k views

ADD COMMENT • link updated 7.0 years ago by chrchang523 11k • written 7.0 years ago by dylkot ▴ 10

score 0 · Answer 1 · 2018-08-29

7.0 years ago

chrchang523 11k

You can use plink to merge a fileset with itself.

ADD COMMENT • link 7.0 years ago by chrchang523 11k

Thanks for the tip! Are you referring to the --merge-equal-pos option in PLINK 1.9? If so, do you know how it performs the merge? The documentation is ambiguous, stating:

If two variants have the same position, PLINK 1.9's merge commands will always notify you. If you wish to try to merge them, use --merge-equal-pos. (This will fail if any of the same-position variant pairs do not have matching allele names.) Unplaced variants (chromosome code 0) are not considered by --merge-equal-pos.

Note that you are permitted to merge a fileset with itself; doing so with --merge-equal-pos can be worthwhile when working with data containing redundant loci for quality control purposes.

There is no reference to this in PLINK 2, as far as I'm aware.

ADD REPLY • link 7.0 years ago by dylkot ▴ 10

Also, as a quick note, the command:

plink --bmerge inputdata --bfile inputdata --merge-equal-pos --out outputdata

fails when (as in my case) some variants at the same position are replicates while others are multi-allelic variants and so have different ALT alleles.

ADD REPLY • link 7.0 years ago by dylkot ▴ 10

For now, I'd use PLINK 2.0's --set-all-var-ids flag to assign brand new chrom/pos/ref/alt-based IDs to every variant, and then follow up with PLINK 1.9's merge function (since yes, merge is not yet implemented in 2.0).
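A rough, untested sketch of that two-step workflow might look like the following. The prefixes inputdata, renamed, and merged are placeholders, the ID template '@:#:$r:$a' (chromosome:position:ref:alt) is one choice of template rather than something specified above, and the run helper is a hypothetical convenience that only prints a command when the tool is not installed. The merge step deliberately omits --merge-equal-pos, on the assumption that replicates now share an ID and are merged by ID, while multi-allelic variants at the same position keep distinct IDs.

```shell
# run: execute a command if its tool is on PATH, otherwise just print it.
# (Hypothetical helper so the sketch can be read or dry-run without PLINK.)
run() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    printf 'would run: %s\n' "$*"
  fi
}

# Step 1 (PLINK 2.0): give every variant a chrom:pos:ref:alt-based ID,
# so that replicate variants end up with identical IDs.
run plink2 --bfile inputdata --set-all-var-ids '@:#:$r:$a' --make-bed --out renamed

# Step 2 (PLINK 1.9): merge the renamed fileset with itself.
run plink --bfile renamed --bmerge renamed --out merged
```

How conflicting calls between replicates are resolved depends on PLINK 1.9's --merge-mode setting, so it is worth checking the merge documentation before relying on the output.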

ADD REPLY • link 7.0 years ago by chrchang523 11k