VCF filtering based on AB (Allele-Balance) tag
1
2
Entering edit mode
6.3 years ago
AISHA ▴ 140

Hi all,

I want to filter a multi sample vcf based on Allele Balance (AB) value of heterozygous calls only. The purpose is to keep those variants which have AB value between 0.25 and 0.75 and are heterozygous. In addition, homozygous calls will also be kept in vcf, and no filtering needs to be done for those variants based on any tag.

I don't know how can I filter VCF based on above-mentioned criteria.

For the sake of brevity, I just posted an example line from my input VCF.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1 Sample2
> chr1  87083 . G  T  69.7403 . AB=0.235294;ABP=13.3567;AC=1; GT:DP:RO:QR:AO:QA:GL 0/0:16:16:582:0:0:0,-4.81648,-52.7438 0/0:26:26:953:0:0:0,-7.82678,-86.1365

I'll be thankful for any help in this regard.

Aisha

VCF Allele balance • 9.7k views
ADD COMMENT
0
Entering edit mode

Try GATK variant filtration walker and example to filter by AB is provided in manual page (https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_filters_VariantFiltration.php) You can use snpsift or bcftools for filtering format field. What is the criteria for homozygous and heterozygous calls in your vcf?

ADD REPLY
0
Entering edit mode

Yeah. I've tried bcf filtering for Allele-Balance (AB). But I'm unable to get the desired output as I mentioned above. Calls having 0/0 or 1/1 are homozygous while those having GT 0/1 or 1/0 are heterozygous.

ADD REPLY
0
0
Entering edit mode

Hi @Pierre!

In that particular post, you mentioned, solution is provided based on AD not AB. That's why I had to open another question.

ADD REPLY
0
Entering edit mode

I see. Your question is not clear to me. You have a AB value in the INFO column but I don't understand why you say: " homozygous calls will also be kept in vcf"

ADD REPLY
1
Entering edit mode
6.3 years ago

try this @ AISHA and update if it works or not.

 bcftools view -g het -i  "AB>0.25 && AB < 0.75" input.vcf.gz

bgzip the vcf (with bgzip) and index vcf.gz (with tabix). Check also if AB is zero for homozygous calls. Try this if you want to hom all and het with AB cut off.

$ bcftools view -i  'GT="hom"| (GT="het" && (AB >0.25 && AB <0.75))' input.vcf.gz
ADD COMMENT
0
Entering edit mode

Hi @cpad0112

I tried commands you suggested but they didn't provide optimal solution.

In case of 1st command, output VCF contained all the heterozygous variants having AB between 0.25 and 0.75. But it didn't output homozygous variants. As you suggested, I also checked AB value for homozygous calls, and it was zero for GT (1/1).

2nd command produced output VCF same as INPUT VCF. No filtering was made.

Still, looking for possible solution.

Thanks!

ADD REPLY
1
Entering edit mode

Count hom and het with cutoff separately and then see if their sum is equal to 2nd command. A round about way is that filter hom only variants and filter het with cutoffs. You will have two vcfs at this points. Now merge the two vcfs with bcftools. It would help if you could post an example VCF.

ADD REPLY
0
Entering edit mode

Hi @cpad0112

Thank you so much for your help. I've found the command to do this task. :)

ADD REPLY
0
Entering edit mode

could you please post the command you found to do this task.

ADD REPLY
1
Entering edit mode

Hi @balathumma10

Following command helped me to accomplish the said task.

bcftools view -i 'AB>0.25 && AB < 0.75 | AB < 0.01' in.vcf > out.vcf
ADD REPLY
0
Entering edit mode

Hi @AISHA, just want to understand the rationale behind the above command in retaining the homozygous calls. Piping AB < 0.01 makes you retain the homozygous calls? What if I want to filter by AB in homozygous calls as well as retaining heterozygous ones?

ADD REPLY
1
Entering edit mode

Hi @aritra90

1) Rationale behind retaining homozygous calls? To catch loci that are fixed variants (all individuals are homozygous for one of the two variants)

2) Piping AB < 0.01 makes you retain the homozygous calls? YES

3) The way you filter depends upon your aim. AB is always equal to zero in case of homozygous calls. For further details you can visit this: SNP Filtering

ADD REPLY
0
Entering edit mode

Thanks a lot @AISHA! Your description and that link is really helpful!

ADD REPLY

Login before adding your answer.

Traffic: 1645 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6