Why Doesn'T Samtools/Bcftools Output Homozygous Mutations?
3
2
Entering edit mode
12.2 years ago
Leszek 4.2k

I have noticed samtools/bcftools don't report homozygous mutations. Is it possible to make it report such cases?
ie.

chrX    273982    G    125    aAAAAAaaaaaAAaAAaaaaaaaAAAaAAaaaaAAaAaaAAaaAAAAaaAAAAAAAAaaAAaaaaAAAaaAAAAAAAAAaAAAaAAAAAAAAAAAAAAAAAAAAaaAAAAAAaAAAAaAAAaAaa    FFFG=JFFFFFIJFJF<HHHHHHJDIHJJ2JJIJI:CJ?IGIHIJJIJJIJIJJJJJGJGIJJJJJJJJDJJJJJJICBJJJEJ2JJIJBJJJHJHFHHFHHGHIJHDDFFFEDDDADCCD?C8?

The command I use is:

samtools mpileup -C 50 -ugf $ref.fa $f.bam -r chrX:253982-293982 | bcftools view -bvcg - > $f.raw.bcf
bcftools view $f.raw.bcf | vcfutils.pl varFilter -D 600 > $f.flt.vcf

Samtools version: 0.1.18-dev (r982:313)

EDIT

Only setting -p 1 report homozygous mt I have been looking for (among 3251 false positives):

samtools mpileup -Bugf $ref.fa $f.bam -r chrX:253982-293982 | bcftools view -p 1.0 -vcgN - | grep "3982"
chrX    273982    .    G    A    0.00652    .    DP=125;VDB=0.0615;AF1=0.5;CI95=1.5,0;DP4=0,0,82,43;MQ=44;FQ=-30    GT:PL:GQ    0/1:255,255,255:3

Is there any other, more efficient way of getting such cases?

EDIT2
Interestingly GATK 2.0 UnifiedGenotyper predicts homozygous SNPs nicely:

java -jar ~/src/GenomeAnalysisTK-2.1-13-g1706365/GenomeAnalysisTK.jar -T UnifiedGenotyper -R $ref.fa -I $s.rg.bam -o $s.rg.vcf -ploidy 1
chrX    273982    .    G    A    5393    .    AC=1;AF=1.00;AN=1;DP=125;Dels=0.00;FS=0.000;HaplotypeScore=1.9663;MLEAC=1;MLEAF=1.00;MQ=44.00;MQ0=0;QD=43.14;SB=-1.820e+03    GT:AD:DP:GQ:MLPSAC:MLPSAF:PL    1:0,125:125:99:1:1.00:5393,0

In this context behaviour of samtools is difficult to explain. What is your opinion?

EDIT3
Still no idea, and get more samples with false negative homozygous SNPs ie.

CNBG_3826    1049    .    T    A    0.00652    .    DP=284;VDB=0.0343;AF1=0.5;CI95=1.5,0;DP4=0,0,101,158;MQ=42;FQ=-30    GT:PL:GQ    0/1:255,255,255:3
next-gen snp calling samtools • 4.7k views
ADD COMMENT
0
Entering edit mode

I am interested in those false positivies, what are they? How do they originate? Thanks.

ADD REPLY
2
Entering edit mode
12.2 years ago
matted 7.8k

I worry about the various implicit and explicit filtering steps in cases like this.

Does it show up before varFilter? If so, then you can figure out why it's failing a default check.

Try mpileup -B to disable BAQ, which might be filtering it out.

I'm actually not sure what effect mpileup -C 50 will have, so it might be worth trying this without it.

It looks like your quality scores are in Sanger format, but in the past I've had problems when I didn't convert them and forgot to use the -6 flag to mpileup. So that's something to keep in mind for other situations.

The previous suggestion of raising bcftools view -p is good to try too.

Finally, and this is more of a guess, does the behavior change based on the size of the region you're calling? That is, do you still miss the SNP in a genome- or chromosome-wide scan, versus this single base example you've shown? I'm not sure how the AFS estimation will work given only one position, so you may want to try this on a larger region (if you haven't already).

ADD COMMENT
0
Entering edit mode

it's not showing before varFilter.
not an issue of BAQ, -C.
fastq is sanger like (PHRED+33).
as a matter of fact I call snps in 40kb region (-r chrX:253982-293982), but showing only 1bp for simplicity.

ADD REPLY
1
Entering edit mode

So just to clarify, you actually tried running with -B? Can you try running a whole chromosome at a time and checking the results?

This is very strange, but samtools is so widely used (and your call above is so simple) and this is exactly what it's supposed to do. So without much data I still assume it's something strange in the way you're calling it, or some evil feature of your reads or pipeline that we don't understand yet.

Have you tried another version of samtools, maybe the latest on github, to compare?

Have you made sure that the reference matches the one used to create the BAMs (just checking, since in strange cases like this it's sometimes the obvious things that we'd all overlook at first)?

ADD REPLY
1
Entering edit mode
12.2 years ago

I am not sure if it helps but you can try using bcftools view -p 0.99 -vcgN and see if you get any homozygous mutations.

ADD COMMENT
0
Entering edit mode

tried, but setting -p to 0.99. changing -p 1 solves the issue, but report thousands false positives (look at edit)

ADD REPLY
0
Entering edit mode
12.2 years ago

Do try mpileup -B. I've seen the BAQ calculations eat sanger-verified SNPs.

mpileup -C 50 is suggested when using .bams generated with bwa, so that bit is fine.

ADD COMMENT
0
Entering edit mode

it's not an issue of BAQ. I have got high coverage (~120x)

ADD REPLY

Login before adding your answer.

Traffic: 2063 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6