Hi everyone! I'm new to genomic data analysis, so I'm asking for help with something that may turn out to be simple. I have a gzipped VCF file that contains human chromosome 20 variant data for 269 individuals. I want to filter out singletons and doubletons for subsequent analysis. I use vcftools v0.1.15 installed on a server. Here is what I do:
vcftools --gzvcf chr20_269ind.vcf.gz --mac 1 --max-mac 1 --recode --stdout | gzip -c > output_test.vcf.gz
However, I get an empty output file (only the header with sample names, no variant records) and the following message:
Outputting VCF file...
After filtering, kept 0 out of a possible 991704 Sites
No data left for analysis!
Run Time = 24.00 seconds
I then experimented with the --mac and --max-mac flags. First I ran the following line:
vcftools --gzvcf chr20_269ind.vcf.gz --max-mac n --recode --stdout | gzip -c > output_tesmaxnt.vcf.gz
where I tried n = 1, 10, and 100. All three attempts gave me the same output file (not empty this time), and the log said:
Outputting VCF file...
After filtering, kept 991704 out of a possible 991704 Sites
Run Time = 95.00 seconds
In fact, I get the same output if I run the command with no --mac or --max-mac flags at all:
vcftools --gzvcf chr20_269ind.vcf.gz --recode --stdout | gzip -c > output_tesmaxnt.vcf.gz
Then I tried running
vcftools --gzvcf chr20_269ind.vcf.gz --mac 1 --recode --stdout | gzip -c > output_test.vcf.gz
and got an empty output again. Then I ran
vcftools --gzvcf chr20_269ind.vcf.gz --mac 0 --recode --stdout | gzip -c > output_test.vcf.gz
and got the same file as with --max-mac n. It seems to me that these flags ‘see’ my file as if every site had an allele count of zero, which is not the case (I’ve inspected the contents of the file manually). If I filter on minor allele frequency instead of allele count (which is not what I want to do, I was just experimenting to better understand what’s going on), I get this:
Outputting VCF file...
Error: Require Genotypes in variant file to filter by frequency and/or call rate
I also tried vcftools version 0.1.13, with no difference.
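For cross-checking the counts outside vcftools, here is a minimal sketch of how the alternate allele count per site could be tallied straight from the genotype columns. The function name count_alt_alleles and the toy two-site VCF are made up for illustration. The sketch assumes tab-delimited records, GT as the first FORMAT subfield, and biallelic sites. It counts ALT alleles, which matches the minor allele count only when ALT is the rarer allele.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vcftools): tally ALT alleles per site.
# Assumptions: tab-delimited VCF on stdin, GT is the first FORMAT subfield,
# sites are biallelic (alleles coded 0 and 1).
count_alt_alleles() {
  awk -F'\t' '
    /^#/ { next }                        # skip meta and header lines
    {
      ac = 0
      for (i = 10; i <= NF; i++) {       # sample columns start at column 10
        split($i, sub_fields, ":")       # first subfield is assumed to be GT
        n = split(sub_fields[1], alleles, "[/|]")
        for (j = 1; j <= n; j++) {
          if (alleles[j] == "1") ac++
        }
      }
      print $1 "\t" $2 "\t" ac           # CHROM, POS, ALT allele count
    }'
}

# Toy VCF with three samples, invented for this example (not my real data).
toy_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$toy_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' >> "$toy_vcf"
printf '20\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\t0/0\n' >> "$toy_vcf"
printf '20\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0\n' >> "$toy_vcf"

result=$(count_alt_alleles < "$toy_vcf")
echo "$result"
rm -f "$toy_vcf"
```

On the real file, the same function could be fed with
gzip -dc chr20_269ind.vcf.gz | count_alt_alleles | head
If every site reports 0 here, or if lines have fewer than ten tab-separated columns, that would suggest the genotype columns are not being parsed as expected.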
Any hints would be greatly appreciated.
Best,
Vasili