random selection of multi-sample vcf does not work
5.3 years ago
mernster • 0

Hey there,

I am trying to randomly select variants from a multisample vcf file which I generated by merging several individual vcf files together using bcftools merge.

This is the command that I am using:

gatk SelectVariants -R /path/to/ref.fasta -V input.vcf -O output_1pct.vcf --select-random-fraction 0.01 --exclude-non-variants true --exclude-filtered true

Unfortunately, it does not select 1 percent of the variants. Instead, it outputs a file containing a single variant, which is always the same. I tried running the command without the optional arguments, but it still outputs this variant:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  indivA  indivB  indivC  indivD  indivE  indivF  indivG  indivH  indivI  indivJ  indivK  indivL  indivM  indivN
35  964 .   A   T   141.05  PASS    AC=1;AN=2   GT:AO:DP:PL:QA:QR:RO    ./. ./. ./. ././.   0/1:6:11:184,0,151:237:200:5    ./. ./. ./. ./. ./. ./. ./. ./.

Also, when I set --select-random-fraction to 1 (100%), it selects only 34 variants, although the file actually contains 34870672. It just picks the first 34 variants.
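For comparison, here is a minimal Python sketch of what selecting a random fraction would be expected to do. It assumes each record is kept independently with probability equal to the fraction, which is an assumption about the general technique and not a description of how GATK implements it. The function names `select_random_fraction` and `count_records` are hypothetical helpers made up for this illustration.

```python
import random
from typing import Iterable, Iterator, Optional


def select_random_fraction(lines: Iterable[str], fraction: float,
                           seed: Optional[int] = None) -> Iterator[str]:
    """Yield every header line and a random subset of the record lines.

    Assumption: each record is kept independently with probability
    `fraction`. This is an illustration, not GATK's actual algorithm.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            yield line  # header lines are always kept
        elif rng.random() < fraction:
            yield line  # rng.random() is in [0, 1), so fraction 1.0 keeps all


def count_records(lines: Iterable[str]) -> int:
    """Count non-header, non-blank lines, i.e. variant records."""
    return sum(1 for line in lines
               if line.strip() and not line.startswith("#"))
```

With a fraction of 1.0 this returns every record, so `count_records` on the input gives a baseline to compare against the number of records in the SelectVariants output.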

ValidateVariants doesn't give any error messages on the multisample vcf.

However, when I run this same command on an individual vcf it seems to work. Any idea what is going wrong?

GATK version : GATK/4.1.1.0
java version: java/sun_jdk1.8.0_151

Thanks in advance!

gatk selectvariants selectrandomfraction • 1.5k views

Hello mernster, and welcome to Biostars,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!
