Filtering a VCF by "IMP" flag in INFO with bcftools
0
0
Entering edit mode
2.2 years ago
Vanish007 ▴ 50

Hi,

I have a vcf file that has imputed markers after being phased through BEAGLE. I wanted to filter those SNPs and samples that have the imputed flag.

Through the header, I see the following:

##INFO=<ID=IMP,Number=0,Type=Flag,Description="Imputed marker">
#CHROM  POS     ID             REF     ALT     QUAL    FILTER  INFO    FORMAT
1       10177   rs367896724     A       AC      .       PASS    DR2=0.10;AF=0.3848;IMP;CSQ=C

I've trying the following commands:

$ bcftools view -i "INFO/IMP' myphased.vcf > Imputed_VCF.vcf

and it runs for a VERY long time without really outputting anything.

Trying it out as a head:

$ bcftools view -i 'INFO/IMP' myphased.vcf | head -10

Just gives the following:

 ##fileformat=VCFv4.2
 ##FILTER=<ID=PASS,Description="All filters passed">
 ##filedate=20220124
 ##source="beagle.28Jun21.220.jar"
 ##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated ALT Allele Frequencies">
 ##INFO=<ID=DR2,Number=A,Type=Float,Description="Dosage R-Squared: estimated squared correlation between estimated REF dose [P(RA) + 2*P(RR)] and true REF dose">
 ##INFO=<ID=IMP,Number=0,Type=Flag,Description="Imputed marker">
 ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
 ##FORMAT=<ID=DS,Number=A,Type=Float,Description="estimated ALT dose [P(RA) + 2*P(AA)]"> 
 ##contig=<ID=1>

The same output occurs if I try

  $ bcftools view -i 'INFO/IMP=1' myphased.vcf | head -10

Running a "grep" expression for "IMP" does give me what I need, but not formatted correctly and I can't use bcftools on that file. Is there something I'm missing or doing wrong?

Thanks!

vcf bcftools • 2.4k views
ADD COMMENT
0
Entering edit mode

bcftools view -i 'INFO/IMP=1' myphased.vcf

it should work. what's wrong with the output ?

and it runs for a VERY long time without really outputting anything.

what's the size of the input

ADD REPLY
0
Entering edit mode

It's around 27GB compressed. I might try running it over a few hours and see if anything shows up. I waited about 2 hours to see something in the output file and nothing was there, but I can see if anything shows up. Thanks for confirming that the code is correct at least!

ADD REPLY
0
Entering edit mode

it should work. what's wrong with the output ?

Unfortunately it did not work, all that occurred was that the complete header was outputted after the command ran for 5.5 hours.

ADD REPLY
1
Entering edit mode

works on my machine:

gunzip -c rotavirus_rf.vcf.gz | grep Flag
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">

$ bcftools view -i 'INDEL==1' rotavirus_rf.vcf.gz  | bcftools query -f '%REF %ALT %INFO/INDEL\n'
TACA TA 1
ATTT ATT 1
CAGA CA 1
ATTT AT 1
CAAAAA CAA 1

$ bcftools view -e 'INDEL==1' rotavirus_rf.vcf.gz  | bcftools query -f '%REF %ALT %INFO/INDEL\n' |head
A C .
A T .
G A .
T A .
T G .
C G .
C A .
T G .
G T .
T A .
ADD REPLY
0
Entering edit mode

works on my machine:

Interesting, so I tried this

 $ bcftools view -e 'IMP==0' myphased.vcf.gz  | bcftools query -f '%REF %ALT %INFO/IMP\n' |head

and received this optimistic output

   A AC 1
   T TA 1
   T TA 1
   CCGCCGTTGCAAAGGCGCGCCG C 1
   G A 1
   C G 1
   C G 1
   T G 1
   G A 1
   G C 1

I will try and run

 $ bcftools view -e 'IMP==0' myphased.vcf.gz

and see what that outputs into a file. Thanks again for the help!

ADD REPLY
0
Entering edit mode

what's the version of bcftools ?

ADD REPLY
0
Entering edit mode

Hi thanks for the response, I'm running the latest version

$ bcftools version
  bcftools 1.16
  Using htslib 1.16
  Copyright (C) 2022 Genome Research Ltd.
  License Expat: The MIT/Expat license
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.
ADD REPLY

Login before adding your answer.

Traffic: 1505 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6