Hello,
I am doing a measurement of the HWE per Population. I have done this already without trouble with 10 populations, but now I'm doing it with 89 populations so I'd like to create a script.
I use this command to create a list with all the populations and their subset of individuals:
grep Barcelona (one of the populations) FILENAME.vcf > population.list
The file created is like this (the population name is separated from the individual name with this symbol ' _ ' ) Barcelona_GRA1 Barcelona_GRA2 Barcelona_GRA3 and so on.
Then I use this command to create a list of the individuals for each population:
cat population.list | tr '\t' '\n' | grep PJL > PJL.list
cat population.list | tr '\t' '\n' | grep BEB > BEB.list
That create a separate file for each population.
To do this 89 times is doable but it's quite long, so I'd like to make a script.
The reason why I need these list files is because I'm doing a script to measure HWE per population. This is my script:
number_of_populations='number'
for num in `seq 1 $number_of_populations`
do
POP=$(sed -n "${num}p" pops.txt)
vcftools --vcf FILENAME.vcf --keep ${POP}.list --hardy --out ${POP}.hwe_pvalue_vcftools
done
2nd script
for num in `seq 1 $number_of_populations`
do
POP=$(sed -n "${num}p" pops.txt)
sites=$(awk '$6 <= 0.0001' ${POP}.hwe_pvalue_vcftools.hwe|wc -l|awk '{print $1}')
echo -e "${POP} \t ${sites}" >> HWE_perpop.txt
done
Any suggestion? Thank you very much