Hi everyone, I have a question about how to efficiently get output from bcftools query when a site is multi-allelic. I have a variable test_variants containing tab-separated variant information: CHROM\tPOS\tREF\tALT. Two example lines are:
1 16894472 A G
1 16894476 G C
I would like to get the dbSNP ID and allele frequency for these entries from the gnomAD exome VCF file ($vcf_exome). When I run the following command:
bcftools view -O v -R <(echo "$test_variants") "$vcf_exome" | grep -Ef <(awk 'BEGIN{FS=OFS="\t";print "#"};{print "^"$1,$2,"[^\t]+",$3}' <(echo "$test_variants")) | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%ID\t%AC\t%AF\n'
The output I get is:
1 16894472 A G,C rs763081531 2,0 8.5764e-06,0
1 16894476 G C,T,A rs4661811 1,33,1 4.32436e-06,0.000142704,4.32436e-06
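A small illustration (not part of the original post) of why both multi-allelic records survive the grep step: the awk one-liner turns each variant into a pattern anchored on CHROM, POS, a wildcard ID and REF, so the ALT column is never compared. `make_patterns` is a hypothetical wrapper around the question's own awk program, and the record below is a made-up, abbreviated VCF line.

```shell
#!/bin/sh
# make_patterns (hypothetical name): the question's awk program as a function.
# Reads CHROM POS REF ALT lines on stdin; prints "#" (keeps header lines)
# followed by one ERE per variant: ^CHROM<TAB>POS<TAB>[^<TAB>]+<TAB>REF
make_patterns() {
  awk 'BEGIN{FS=OFS="\t";print "#"};{print "^"$1,$2,"[^\t]+",$3}'
}

patterns=$(mktemp)
printf '1\t16894472\tA\tG\n' | make_patterns > "$patterns"

# Show the generated patterns with tabs made visible (sed "l" prints a tab as \t)
sed -n l "$patterns"

# Made-up record with ALT "G,C": it matches because ALT is not in the pattern
record=$(printf '1\t16894472\trs763081531\tA\tG,C\t.\tPASS\tAC=2,0')
if printf '%s\n' "$record" | grep -Eqf "$patterns"; then
  echo "multi-allelic record kept"
fi
rm -f "$patterns"
```

Note also that nothing follows REF in the pattern, so a record at the same position whose REF merely starts with the requested base (for example REF AT) would pass the filter too.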
Is there a way in bcftools to output the information for only my ALT allele? I feel I need to post-process the output with awk and get the index of the matching ALT allele from the 4th column, but I am not sure how to do this in batch. Thank you! Debayan
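If you stay with the post-processing route, here is a minimal sketch of the batch awk step. It assumes the query output has the seven tab-separated columns shown above (CHROM, POS, REF, ALT, ID, AC, AF, with comma-separated per-allele values on multi-allelic sites) and that the wanted variants are tab-separated CHROM POS REF ALT. `pick_allele` and the temporary files are illustrative names, not part of bcftools.

```shell
#!/bin/sh
# pick_allele WANTED QUERY_OUT  (hypothetical helper)
#   WANTED:    tab-separated CHROM POS REF ALT, one requested variant per line
#   QUERY_OUT: tab-separated CHROM POS REF ALT ID AC AF from bcftools query,
#              where ALT, AC and AF are comma-separated on multi-allelic sites
# Prints one line per requested allele with its own AC and AF.
pick_allele() {
  awk 'BEGIN { FS = OFS = "\t" }
    # First file: collect requested ALT alleles per CHROM/POS/REF,
    # space-delimited so they can be looked up with index()
    NR == FNR { k = $1 FS $2 FS $3; want[k] = want[k] " " $4 " "; next }
    # Second file: locate each requested ALT in the comma-separated ALT list
    # and print the AC/AF values at the same index
    {
      k = $1 FS $2 FS $3
      if (!(k in want)) next
      n = split($4, alts, ",")
      split($6, ac, ",")
      split($7, af, ",")
      for (i = 1; i <= n; i++)
        if (index(want[k], " " alts[i] " "))
          print $1, $2, $3, alts[i], $5, ac[i], af[i]
    }' "$1" "$2"
}

# Demo with the two example variants and the query output from the question
wanted=$(mktemp); query_out=$(mktemp)
printf '1\t16894472\tA\tG\n1\t16894476\tG\tC\n' > "$wanted"
printf '1\t16894472\tA\tG,C\trs763081531\t2,0\t8.5764e-06,0\n' > "$query_out"
printf '1\t16894476\tG\tC,T,A\trs4661811\t1,33,1\t4.32436e-06,0.000142704,4.32436e-06\n' >> "$query_out"
pick_allele "$wanted" "$query_out"
# prints (tab-separated):
# 1 16894472 A G rs763081531 2 8.5764e-06
# 1 16894476 G C rs4661811 1 4.32436e-06
rm -f "$wanted" "$query_out"
```

Because the first file is loaded into an array and the second is streamed, this handles any number of variants in one pass; it relies on the usual `NR == FNR` idiom, so the wanted-variants file must not be empty.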
Amazing! Works perfectly! Thanks a lot :)
If it answers the question, please check the green mark on the left to close the question.
Is there a way to run this on a VCF file (myVariants.vcf) formatted the same way as tiplud's example lines, and save the output?
Edit: for anyone in the future, reading the file into the variable passed to echo and removing the custom field separator (reverting to the default whitespace) worked for my file.
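For the file-based version of the question, a minimal sketch under these assumptions: the input file holds whitespace-separated CHROM POS REF ALT lines (as in the example lines above), `$vcf_exome` is a bgzipped gnomAD VCF with a .tbi/.csi index (needed by `-R`), and `to_tsv`, `lookup`, `variants_file` and `out_file` are hypothetical names. Instead of the grep filter it swaps in `bcftools norm -m -any`, which splits multi-allelic records into biallelic ones (as far as I know, splitting per-allele INFO fields such as AC and AF along with them), so the final awk step can require an exact CHROM/POS/REF/ALT match.

```shell
#!/bin/sh
# Assumed inputs (hypothetical names): variants_file, vcf_exome, out_file.

# Convert whitespace-separated CHROM POS REF ALT lines to tab-separated,
# skipping comment lines and lines with fewer than four fields.
to_tsv() {
  awk 'BEGIN { OFS = "\t" } !/^#/ && NF >= 4 { print $1, $2, $3, $4 }' "$1"
}

# lookup VCF TSV: split multi-allelic records, then keep only records whose
# CHROM/POS/REF/ALT exactly match a line of the tab-separated file TSV.
lookup() {
  _vcf=$1; _tsv=$2; _regions=$(mktemp)
  # -R only needs CHROM and POS; sort -u avoids duplicate regions
  cut -f1,2 "$_tsv" | sort -u > "$_regions"
  bcftools view -Ou -R "$_regions" "$_vcf" |
    bcftools norm -m -any -Ou |
    bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%ID\t%AC\t%AF\n' |
    awk 'BEGIN { FS = OFS = "\t" }
      NR == FNR { want[$1 FS $2 FS $3 FS $4] = 1; next }
      ($1 FS $2 FS $3 FS $4) in want' "$_tsv" -
  rm -f "$_regions"
}

variants_file=${variants_file:-myVariants.vcf}
out_file=${out_file:-myVariants.af.tsv}

# Only run the bcftools part when everything it needs is available
if command -v bcftools >/dev/null 2>&1 && [ -n "${vcf_exome:-}" ] && [ -f "$variants_file" ]; then
  tsv=$(mktemp)
  to_tsv "$variants_file" > "$tsv"
  lookup "$vcf_exome" "$tsv" > "$out_file"
  rm -f "$tsv"
fi
```

If bcftools is not on PATH, `vcf_exome` is unset (or not exported to the script), or the input file is missing, the script only defines the two functions and writes nothing.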