Iterating Vcftools Commands With Shell Script
2
2
Entering edit mode
10.7 years ago
vitti ▴ 20

Hi all,

I'm trying to filter some VCFs from 1000G by population. For each population I have lists of sample ids and am using the "--keep" command.

Given a single chromosome and a single population, I am able to run the command and get the output I want. However, I am running into issues when I try to iterate using a simple shell script. Ultimately I want to iterate over both populations and chromosomes using a nested for-loop, but at the moment I can't seem to get a simple loop over chromosomes (holding population constant) to work properly.

#!/bin/sh
pop=CDX
for (( i=1; i<=22; i++ ))
do
    echo ${i}
    ../../../../vcftools/bin/vcftools --vcf ALL.chr${i}.phase3_shapeit2_integrated.20130502.snps_indels_svs.genotypes.vcf --keep ../indivs_by_pop/${pop}_phased_indivs.txt --recode --out filter_chrom${i}_${pop}
    echo ${pop}
done

This script performs the command for chromosome 1 and then stops -- it does not get to the "echo ${pop}" line, and I have no idea why. If I comment out the vcftools command then it properly iterates 22 times, which suggests to me that the problem is that command rather than the rest of the script. I've been reading through the vcftools documentation to try and figure out where I'm going wrong but to no avail. Any insight would be greatly appreciated!

vcftools script • 5.5k views
ADD COMMENT
3
Entering edit mode
10.7 years ago

Have a look at the GNU/Parallel tool, it can do wonderful things with it.

For example:

echo {1..22}_AFR. {1..22}_ASN. {1..22}_EUR.  | sed -e 's/_/\'$'\n/g' -e 's/\./\'$'\n/g' | parallel -N 2 " vcftools --vcf ALL.chr${1}.phase3_shapeit2_integrated.20130502.snps_indels_svs.genotypes.vcf --keep ../indivs_by_pop/${2}_phased_indivs.txt --recode --out filter_chrom${1}_${2} "

See also the discussion dedicated to parallel: GNU Parallel - parallelize serial command line programs without changing them

ADD COMMENT
0
Entering edit mode

Thank you! I still wish I knew why my script doesn't work, but it looks like this tool will obviate the need for such scripts in the future.

ADD REPLY
2
Entering edit mode
10.7 years ago
ole.tange ★ 4.5k

A slightly more readable version of Giovanni's script:

parallel vcftools --vcf chr{1}.genotypes.vcf --keep ../indivs_by_pop/{2}_phased_indivs.txt --recode --out filter_chrom{1}_{2} ::: {1..22} X Y M ::: BOR CEU CHB FIJ JPT NGH POL YRI

(Use multiple ':::' to simulate nested loops)

Use parallel --dryrun to see what it would run.

ADD COMMENT
0
Entering edit mode

very helpful, thank you!

ADD REPLY
0
Entering edit mode

definitely much better, I used this syntax before but I couldn't remember it. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 2288 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6