I want to use plink's linkage disequilibrium feature to filter my VCF file. I'm new to genomics, but after reading plink's documentation, I assumed I could do this in one command:
plink \
--vcf /input/${CHROMOSOME_ID}.vcf.gz \
--indep $LD_WINDOW_SIZE_KB $LD_STEP_SIZE $VIF_THRESHOLD \
--recode vcf \
--out /output/ch${CHROMOSOME_ID} \
--allow-extra-chr
I then use the output file, e.g., ch6.vcf, for downstream analysis. I never bothered touching the .in and .out files because, according to the plink data docs:
--recode creates a new text fileset, after applying sample/variant filters and other operations.
so I assumed plink's --recode would interpret my $VIF_THRESHOLD as a variant filter operation. However, in other, older Biostars posts, I've read that you have to do the filtering using .in or .out in a separate command. Is my original command incorrect?
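For comparison, here is my understanding of the two-step approach those posts describe, as a rough sketch rather than a verified recipe. It assumes plink 1.9, where --indep only writes the .prune.in and .prune.out lists and a second run with --extract applies them. The ld_prune function name, the PLINK override variable, and the .pruned output prefix are my own placeholders, and I use --vcf on the assumption that the input really is a gzipped VCF.

```shell
# Sketch of the two-step LD-pruning approach (assumes plink 1.9 behaviour).
# PLINK is a hypothetical override so the commands can be printed instead of
# run, e.g. PLINK=echo; by default the real plink binary is used.
PLINK="${PLINK:-plink}"

# ld_prune is a hypothetical helper; arguments mirror the variables above:
#   chromosome id, LD window size, LD step size, VIF threshold
ld_prune() {
    chrom="$1"; window="$2"; step="$3"; vif="$4"

    # Step 1: compute the pruned variant lists only.
    # Writes /output/ch${chrom}.prune.in and /output/ch${chrom}.prune.out.
    "$PLINK" \
        --vcf "/input/${chrom}.vcf.gz" \
        --indep "$window" "$step" "$vif" \
        --out "/output/ch${chrom}" \
        --allow-extra-chr || return 1

    # Step 2: keep only the variants listed in .prune.in and export a VCF.
    "$PLINK" \
        --vcf "/input/${chrom}.vcf.gz" \
        --extract "/output/ch${chrom}.prune.in" \
        --recode vcf \
        --out "/output/ch${chrom}.pruned" \
        --allow-extra-chr
}
```

I would call it as ld_prune "$CHROMOSOME_ID" "$LD_WINDOW_SIZE_KB" "$LD_STEP_SIZE" "$VIF_THRESHOLD". As far as I can tell, --extract matches on variant IDs, so this presumably only works as intended if the variants in the VCF have unique IDs.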