Hey there,
I am trying to randomly select variants from a multisample vcf file which I generated by merging several individual vcf files together using bcftools merge.
This is the command that I am using:
gatk SelectVariants -R /path/to/ref.fasta -V input.vcf -O output_1pct.vcf --select-random-fraction 0.01 --exclude-non-variants true --exclude-filtered true
Unfortunately, it does not select 1 percent of the variants but instead outputs a file with only one variant which is always the same. I tried running this command without the optional arguments but it still outputs this variant:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT indivA indivB indivC indivD indivE indivF indivG indivH indivI indivJ indivK indivL indivM indivN
35 964 . A T 141.05 PASS AC=1;AN=2 GT:AO:DP:PL:QA:QR:RO ./. ./. ./. ././. 0/1:6:11:184,0,151:237:200:5 ./. ./. ./. ./. ./. ./. ./. ./.
Also, when I set --select-random-fraction to 1 (100%), it only selects 34 variants, although in reality there are 34870672!! It just picks the first 34 variants...
ValidateVariants doesn't give any error messages on the multisample vcf.
However, when I run this same command on an individual vcf it seems to work. Any idea what is going wrong?
GATK version : GATK/4.1.1.0
java version: java/sun_jdk1.8.0_151
Thanks in advance!
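One sanity check that may help narrow this down, since ValidateVariants reports nothing: count the data records in the merged file and flag any whose number of columns differs from the #CHROM header line. This is only a minimal sketch, assuming an uncompressed, tab-delimited VCF; the function name check_vcf_columns is hypothetical, and it is not part of GATK or bcftools.

```python
def check_vcf_columns(lines):
    """Count VCF data records and list those whose column count
    differs from the #CHROM header line.

    `lines` is any iterable of text lines (e.g. an open file handle).
    Returns (expected_columns, n_records, mismatches), where mismatches
    is a list of (record_number, found_columns) tuples.
    """
    expected = None
    n_records = 0
    mismatches = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '##' meta-information lines.
        if not line or line.startswith("##"):
            continue
        # The '#CHROM' line defines how many columns each record should have.
        if line.startswith("#CHROM"):
            expected = len(line.split("\t"))
            continue
        n_records += 1
        found = len(line.split("\t"))
        if expected is not None and found != expected:
            mismatches.append((n_records, found))
    return expected, n_records, mismatches


# Example usage (input.vcf is the file name from the command above):
#
#     with open("input.vcf") as handle:
#         expected, n_records, mismatches = check_vcf_columns(handle)
#     print(expected, n_records, mismatches[:10])
```

If the record count is far below the expected number of variants, or if some records have a different column count than the header, the problem may lie in the merged file rather than in SelectVariants.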
Hello and welcome to Biostars, mernster,
Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Thank you!