bcftools query for multi-allelic cases
1
0
Entering edit mode
7.7 years ago
tiplud • 0

Hi Everyone, I have a question regarding how to efficiently obtain the output using bcftools query when there is a multi-allelic case. I have a variable test_variants which contains the following variant information : START\tPOS\tREF\tALT. Two example lines are :

1   16894472    A   G
1   16894476    G   C

I would like to get the dbSNP id and Allele Frequency for these entries from the gnomAD exome vcf file ( $vcf_exome ). When I run the following command :

bcftools view -O v -R <(echo "$test_variants") "$vcf_exome"  | grep -Ef <(awk 'BEGIN{FS=OFS="\t";print "#"};{print "^"$1,$2,"[^\t]+",$3}' <(echo "$test_variants")) | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%ID\t%AF\n'
the output I get is : 
1   16894472    A   G,C rs763081531 2,0 8.5764e-06,0

1   16894476    G   C,T,A   rs4661811   1,33,1  4.32436e-06,0.000142704,4.32436e-06

Is there a way in bcftools to output the information for my only ALT allele ? I feel I need to post-process the output with awk and get the index of the matching ALT allele from the 4th column of the output. In such a case, I am not sure how to do this in a batch way. Thank you! Debayan

SNP next-gen • 3.4k views
ADD COMMENT
3
Entering edit mode
7.7 years ago

using awk

echo -e  "1,16894472,A,G\n1,16894476,G,T\n1,16894476,G,C" | while IFS="," read -ra T; do  bcftools view "https://storage.googleapis.com/gnomad-public/release-170228/vcf/genomes/gnomad.genomes.r2.0.1.sites.${T[0]}.vcf.gz" "${T[0]}:${T[1]}-${T[1]}" | awk -F '\t' -v P=${T[1]} -v R=${T[2]} -v A=${T[3]} '/^#/ {next;} {if($2!=P || $4!=R) next;nalts=split($5,alts,/,/);for(i=1;i<=nalts;++i) {if(A!=alts[i]) continue;  natts=split($8,atts,/;/);for(j=1;j<=natts;++j) {if(substr(atts[j],1,3)!="AF=") continue;nafs=split(substr(atts[j],4),afs,/,/); print $1,$2,$3,$4,A,afs[i];}} }' ; done

1 16894472 rs763081531 A G 0.000424418
1 16894476 rs4661811 G T 0.000109649
1 16894476 rs4661811 G C 0.000584795
ADD COMMENT
0
Entering edit mode

Amazing ! Works perfect ! Thanks a lot :)

ADD REPLY
0
Entering edit mode

it if answers the question, please, check the green mark on the left to close the question.

ADD REPLY
0
Entering edit mode

Is there a way to run this on a run this on a vcf file formatted the same way as tiplud's example lines and save the output?

myVariants.vcf

[1]CHROM    [2]POS  [3]REF  [4]ALT
1   705898  T   G   
1   754744  G   C   
1   933185  C   A

edit: For anyone in the future, reading a file into the echo command and removing the internal field separator (reverting it to the default whitespace) worked for my file.

while read -r line; do echo -e $line; done < myVariants.vcf | \
while read -ra T; do \
bcftools view af-only-gnomad.raw.sites.b37.vcf.gz \
"${T[0]}:${T[1]}-${T[1]}" | \
awk -F '\t' \
-v P=${T[1]} \
-v R=${T[2]} \
-v A=${T[3]} \
'/^#/ {next;} \
{if($2!=P || $4!=R) \
next;nalts=split($5,alts,/,/);for(i=1;i<=nalts;++i) \
{if(A!=alts[i]) continue;  natts=split($8,atts,/;/);for(j=1;j<=natts;++j)\
 {if(substr(atts[j],1,3)!="AF=") \
 continue;nafs=split(substr(atts[j],4),afs,/,/); \
 print $1,$2,$3,$4,A,afs[i];}} }' ; done > gnomadAFs.appended.to.my.input.vcf
ADD REPLY

Login before adding your answer.

Traffic: 1923 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6