Question

VCFtools MAF filter: efficiency inquiry

20 months ago

S • 0

I am running a minor allele frequency (MAF) filter on a cloud instance as part of my analysis.

vcftools --recode --recode-INFO-all --gzvcf /path/to/input.vcf --maf 0.01 --out output.maf.vcf > stdout.out

This takes a very long time (about 1 hour for a 30 GB chr1 file) on a c5.4xlarge instance. I considered multithreading, chunking, or other ways of subsetting the data, but ran into trouble implementing them. I read through the VCFtools documentation and could not find any threading or chunking option for this command.

Another approach I considered was decompressing the gzipped file, writing only the genotype information to a new VCF file, and then filtering that file by MAF. That does not seem like an efficient way to perform a MAF filter, though.

Is there anything I am not considering while trying to speed up this process?

Thank you for your consideration

vcf maf vcftools • 788 views

score 0 · Answer 1 · 2023-07-23

20 months ago

chrchang523 11k

The main thing you aren't considering is that vcftools was mostly superseded by bcftools several years ago.
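
As a rough illustration of that suggestion, here is a minimal sketch of what an equivalent filter might look like with bcftools. The input path, output name, and thread count are placeholders, and the `maf_filter` wrapper is a hypothetical helper, not part of bcftools. It assumes the input is bgzip-compressed and that `--min-af 0.01:minor` is an acceptable stand-in for `vcftools --maf 0.01`; the two tools may not treat multiallelic sites identically, so spot-check the output.

```shell
# Sketch only: maf_filter is a hypothetical wrapper, and the paths below are placeholders.
maf_filter() {
    # $1 = compressed input VCF, $2 = output path,
    # $3 = minor allele frequency threshold, $4 = extra compression threads
    bcftools view \
        --min-af "$3:minor" \
        --threads "$4" \
        --output-type z \
        --output "$2" \
        "$1"
}

in_vcf="/path/to/input.vcf.gz"

# Run only when bcftools is installed and the input file exists.
if command -v bcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
    maf_filter "$in_vcf" output.maf.vcf.gz 0.01 8
else
    echo "bcftools or input file not found; skipping" >&2
fi
```

As far as I know, `--threads` in bcftools mainly helps with compressing the output, so the speedup on a 16-vCPU instance may be modest. If the file is bgzipped and indexed, running per-region jobs with `--regions` in parallel and concatenating the results is another option worth testing.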
