Filtering for homozygous reference alleles
0
0
Entering edit mode
5.2 years ago
AGE ▴ 30

I would like to obtain a gvcf file for non-variant sites with a minimum coverage of 5x (homozygous reference alleles that I can at least minimally trust). The non-variant sites i am getting seem to be of low coverage. For example a non-variant site with DP=51 but DP4=1,0,0,0. The bam file was filtered with samtools for a minimum mapping quality of 20 (-q 20), but the discrepancy seems to be too high.

E.g.

Here is my output for homozygous ref sites

 bcftools  mpileup -f genomic.fa file.bam -a "AD,ADF,ADR,DP,SP,INFO/AD,INFO/ADF,INFO/ADR" | bcftools call --gvcf 5 -m > gvcf.vcf && awk '$5 == "." {print $0}' gvcf.vcf | grep -v "##" | shuf -n 25

    NW_006532855.1  244614  .       T       .       .       .       END=244634;MinDP=31     GT:DP   0/0:31
    NW_006533971.1  93708   .       A       .       29.5864 .       DP=21;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.     GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006536084.1  51107   .       G       .       67      .       DP=1;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=60;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:0:1:1
    NW_006532401.1  390421  .       T       .       45.5868 .       DP=2;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=60;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:0:1:1
    NW_006535812.1  52470   .       A       .       .       .       END=52587;MinDP=55      GT:DP   0/0:55
    NW_006536288.1  53      .       T       .       29.5864 .       DP=11;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.     GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006532139.1  349467  .       T       .       .       .       END=349484;MinDP=39     GT:DP   0/0:39
    NW_006534925.1  53705   .       T       .       142     .       DP=3;ADF=1;ADR=2;AD=3;MQSB=1;MQ0F=0;AN=2;DP4=1,2,0,0;MQ=60;MinDP=3      GT:DP:SP:ADF:ADR:AD     0/0:3:0:1:2:3
    NW_006532792.1  285159  .       A       .       .       .       END=285176;MinDP=53     GT:DP   0/0:53
    NW_006535919.1  76893   .       T       .       .       .       END=76931;MinDP=55      GT:DP   0/0:55
    NW_006533107.1  102827  .       A       .       55      .       DP=51;ADF=1;ADR=0;AD=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60;MinDP=1    GT:DP:SP:ADF:ADR:AD     0/0:1:0:1:0:1
    NW_006535205.1  85235   .       A       .       29.5864 .       DP=0;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.      GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006532576.1  44906   .       T       .       .       .       END=44910;MinDP=8       GT:DP   0/0:8
    NW_006535228.1  57467   .       C       .       90      .       DP=2;ADF=1;ADR=0;AD=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:1:0:1
    NW_006535321.1  94195   .       G       .       .       .       END=94209;MinDP=64      GT:DP   0/0:64
    NW_006532278.1  304697  .       A       .       .       .       END=304714;MinDP=62     GT:DP   0/0:62
    NW_006532169.1  456198  .       C       .       51      .       DP=1;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=21;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:0:1:1
    NW_006533733.1  85892   .       G       .       29.5864 .       DP=0;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.      GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006534011.1  208144  .       A       .       .       .       END=208164;MinDP=68     GT:DP   0/0:68
    NW_006535039.1  100639  .       T       .       66      .       DP=48;ADF=1;ADR=0;AD=1;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60;MinDP=1    GT:DP:SP:ADF:ADR:AD     0/0:1:0:1:0:1
    NW_006533379.1  65322   .       A       .       .       .       END=65390;MinDP=30      GT:DP   0/0:30
    NW_006533431.1  132332  .       G       .       29.5864 .       DP=1;ADF=0;ADR=0;AD=0;MQ0F=0;AN=0;DP4=0,0,0,0;MQ=.      GT:DP:SP:ADF:ADR:AD     ./.:0:0:0:0:0
    NW_006536654.1  66636   .       C       .       .       .       END=66638;MinDP=15      GT:DP   0/0:15
    NW_006532666.1  234796  .       T       .       .       .       END=234798;MinDP=48     GT:DP   0/0:48
    NW_006533913.1  124175  .       A       .       51      .       DP=1;ADF=0;ADR=1;AD=1;MQ0F=0;AN=2;DP4=0,1,0,0;MQ=44;MinDP=1     GT:DP:SP:ADF:ADR:AD     0/0:1:0:0:1:1

The AD also does not go above 4. eg:

awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=1" | wc -l
5689457
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=2" | wc -l
4036127
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=3" | wc -l
3455244
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=4" | wc -l
3082479
awk '$5 == "." {print $0}' gvcf.vcf | grep ",0,0;" | grep "AD=5" | wc -l
0

Is it the --gvcf 5 the reason why the coverage is low?

SNP sequence genome • 1.1k views
ADD COMMENT
0
Entering edit mode

Hello,

a drop-off of the DP4 value is usually happens due to low mapping or base quality. If you are running bcftools call without any additional parameter the default for mapping quality is 0 and for base quality is 13. So I would guess in your case you have bad base qualities at the given positions. Check it by viewing some with the igv browser.

fin swimmer

ADD REPLY

Login before adding your answer.

Traffic: 1909 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6