22 months ago
eb13
Hi all - I have a very large multi-sample VCF file that I am trying to subset by a list of sample IDs. My current approach is very slow (>2 hr per chromosome). Are there any tricks for making it run faster on large files? Here is my current approach:
for file in /vcffiles/*.vcf.gz; do
    bcftools view -Oz -S sample_list.txt "$file" > /output/subset_"${file##*/}"
done
Thanks in advance for any suggestions!
Maybe this link is useful: "How to parallelize bcftools mpileup with GNU parallel?"
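For what it's worth, here is a rough sketch of how the same idea could be applied to the subsetting loop from the question. It runs one bcftools view per chromosome file in parallel using xargs -P rather than GNU parallel, so it is a swapped-in technique, not what the linked post does. The variable names (IN_DIR, OUT_DIR, SAMPLES, JOBS, THREADS, BCFTOOLS), the helper function subset_one, and the job and thread counts are all made up for illustration. It assumes bash and GNU xargs. As far as I know, --threads in bcftools view only speeds up compression of the output, so the gain depends on whether compression or disk I/O is the bottleneck.

```shell
#!/usr/bin/env bash
# Sketch only: subset every per-chromosome VCF in parallel instead of serially.
# All names and defaults below are illustrative assumptions, not from the thread.
IN_DIR="${IN_DIR:-/vcffiles}"           # input directory from the question
OUT_DIR="${OUT_DIR:-/output}"           # output directory from the question
SAMPLES="${SAMPLES:-sample_list.txt}"   # one sample ID per line
JOBS="${JOBS:-4}"                       # hypothetical: files processed at once
THREADS="${THREADS:-2}"                 # hypothetical: compression threads per job
BCFTOOLS="${BCFTOOLS:-bcftools}"        # path to the bcftools binary

# Subset one file; the output name is "subset_" plus the input's basename.
subset_one() {
    local in="$1"
    local out="$OUT_DIR/subset_${in##*/}"
    "$BCFTOOLS" view --threads "$THREADS" -S "$SAMPLES" -Oz -o "$out" "$in"
}

# Make the function and settings visible to the shells started by xargs.
export -f subset_one
export OUT_DIR SAMPLES THREADS BCFTOOLS

# Run up to $JOBS subsetting jobs at a time (-r: GNU xargs, skip if no input).
if [ -d "$IN_DIR" ]; then
    find "$IN_DIR" -maxdepth 1 -name '*.vcf.gz' -print0 |
        xargs -0 -r -n 1 -P "$JOBS" bash -c 'subset_one "$1"' _
fi
```

Keeping JOBS times THREADS at or below the number of available cores is probably a sensible starting point. If the disk is the limiting factor, more parallel jobs may not help much.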
Thank you for your helpful responses!