bcftools get allele abundance
2
0
Entering edit mode
18 months ago
Fernando • 0

I'm using bcftools to extract variants from a bam file, but I have reference data that tells me whether the patient is homozygous or heterozygous. For a particular sample, I see a high proportion of the alternate allele (87%) and a lower proportion of the reference allele (13%), yet according to the reference data, this sample is heterozygous. When I pass this data to bcftools, it reports the presence of the allele but as if it were homozygous. Is there a way to simply get the abundance of the alternate and reference alleles? This is the code I am using

bcftools mpileup -Ou -f hg19.fasta IonXpress_006.bam --max-depth 10000 --threads 15| bcftools call -mv -Ov -o calls.vcf -v --threads 15 -P 0.1

And the results I'm getting

chr6    31239407    .   G   T   212.999 .   DP=297;VDB=0;SGB=-0.693147;RPB=0.910997;MQB=0.0455166;MQSB=6.28864e-17;BQB=0.986643;MQ0F=0;AF1=1;AC1=2;DP4=25,12,180,76;MQ=33;FQ=-229.988;PV4=0.705928,1,0.142598,0.270096  GT:PL   1/1:246,203,0
chr6    31239417    .   C   T   187.999 .   DP=296;VDB=0;SGB=-0.693147;RPB=0.933514;MQB=0.0575658;MQSB=9.14302e-18;BQB=0.0934661;MQ0F=0;AF1=1;AC1=2;DP4=24,13,182,76;MQ=33;FQ=-231.988;PV4=0.565766,0.202167,0.235437,1 GT:PL   1/1:221,205,0
chr6    31324512    .   C   A   3.14828 .
variant-calling bcftools • 1.1k views
ADD COMMENT
0
Entering edit mode
18 months ago

. Is there a way to simply get the abundance of the alternate and reference allele

add annotation -a "FORMAT/AD" to bcftools mpileup

Allelic depth (Number=R,Type=Integer)

ADD COMMENT
0
Entering edit mode
18 months ago
Sasha ▴ 850

It seems like you want to extract the allele frequencies for each variant site from the VCF file generated by bcftools. You can achieve this by using bcftools query to extract the relevant information from the VCF file. In your case, you can extract the DP4 field, which contains the number of reference and alternate allele observations in the format: REF_FWD, REF_REV, ALT_FWD, ALT_REV. Here's how you can modify your pipeline to include the extraction of allele frequencies:

  1. First, run your existing pipeline to generate the VCF file:

bcftools mpileup -Ou -f hg19.fasta IonXpress_006.bam --max-depth 10000 --threads 15 | bcftools call -mv -Ov -o calls.vcf -v --threads 15 -P 0.1

  1. Next, use bcftools query to extract the DP4 field and calculate the allele frequencies:

bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%DP4\n' calls.vcf | awk '{ref_count=$5+$6; alt_count=$7+$8; total_count=ref_count+alt_count; ref_freq=ref_count/total_count; alt_freq=alt_count/total_count; printf("%s\t%d\t%s\t%s\t%.2f\t%.2f\n", $1, $2, $3, $4, ref_freq, alt_freq)}' > allele_frequencies.tsv

This command will generate a tab-separated file allele_frequencies.tsv with the following columns: CHROM, POS, REF, ALT, REF_FREQ, ALT_FREQ. Now you can inspect the allele_frequencies.tsv file to see the abundance of the alternate and reference alleles for each variant site.

I'm using my chatbot here (https://tinybio.cloud) to help generate this answer. You can download it from the website.

ADD COMMENT

Login before adding your answer.

Traffic: 1727 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6