Entering edit mode
2.7 years ago
User000
▴
710
Dear all,
I am using this script, from this post to count the number of heterozygous homozygous samples.
paste <(bcftools view MyVariants.Norm.vcf |\
awk -F"\t" 'BEGIN {print "CHR\tPOS\tID\tREF\tALT"} \
!/^#/ {print $1"\t"$2"\t"$3"\t"$4"\t"$5}') \
\
<(bcftools query -f '[\t%SAMPLE=%GT]\n' MyVariants.Norm.vcf |\
awk 'BEGIN {print "nHet"} {print gsub(/0\|1|1\|0|0\/1|1\/0/, "")}') \
\
<(bcftools query -f '[\t%SAMPLE=%GT]\n' MyVariants.Norm.vcf |\
awk 'BEGIN {print "nHomAlt"} {print gsub(/1\|1|1\/1/, "")}') \
\
<(bcftools query -f '[\t%SAMPLE=%GT]\n' MyVariants.Norm.vcf |\
awk 'BEGIN {print "nHomRef"} {print gsub(/0\|0|0\/0/, "")}') \
\
<(bcftools view MyVariants.Norm.vcf | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HetSamples"} \
!/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/0\|1|1\|0|0\/1|1\/0/, "", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
\
<(bcftools view MyVariants.Norm.vcf | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HomSamplesAlt"} \
!/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/1\|1|1\/1/, "", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
\
<(bcftools view MyVariants.Norm.vcf | awk -F"\t" '/^#CHROM/ {split($0, header, "\t"); print "HomSamplesRef"} \
!/^#CHROM/ {for (i=10; i<=NF; i++) {if (gsub(/0\|0|0\/0/,"", $(i))==1) {printf header[i]","}; if (i==NF) {printf "\n"}}}') \
\
| sed 's/,\t/\t/g' | sed 's/,$//g' > counts.tab
it gives me the rigth number of het/hom samples, but the sample names are not, because I have this problem in my vcf file, for example I have 4 nhet samples, but only 2 sample names are printed and other two are not. How to solve this problem?
prints
0/1:21,6:27:7:.:.:7,0,252
0/1:22,6:28:55:.:.:55,0,795
does not print
0/1:6,1:7:18:0|1:45870_C_T:18,0,329
0|1:3,2:5:31:0|1:45887_T_G:31,0,67