Hi Guys,
I have been researching fair bit over the last few days about SNPs/InDels calling, My project pretty much exactly summarised here. My NGS data has also been generated by cripsr/cas9 method.
I used this method
HaplotypeCaller -R genome.fa -I sorted.bam -stand_call_conf 30 -stand_emit_conf 10 -o sorted-out.vcf -L chrX --downsampling_type NONE
and it sort of worked. I do get some InDels picked up. I want to call the same InDels with samtools mpileup. Two reasons:
- I feel I'm not understanding how InDels/SNP being called in general
- The actual project is a bit more involved. we are interested in both SNP and InDels some of which will only be found in fewer reads.
This is what I did to try replicate HyplotypeCaller witih samtools mpileup
samtools mpileup -g -f genome.fa sorted.bam > sorted-mpileup.bcf
bcftools call -c -V snps sorted-mpileup.bcf | less
I get no InDels found however I do know that there is an InDel from both IGV and GATK HaplotypeCaller. And I also generate pile file and looked into it and it shows me that I do have and InDel...so what is going on...?
Also can anyone reiterate here why mpileup has max per-file depth set to 8000 and how does this effects my SNP/InDel calling given that I have ~ 400,000 reads in my bam files?
Also also when I do
samtool mpileup -g -f genome.fa sorted.bam > sorted-mpileup..bcf
I get DP tag included as a default and given that I have more than 8000 reads in my bam file DP tag for all SNP is 7999. But the question is what is the different when I include -t DP,DPR option during my mpileup..? because I end up with two DP tags in my vcf file with slightly different values..?
(this is really a testRun data and it is an amplicon data however in future I'll be calling SNP/InDel from genome wide NGS data)
Note please assume that I'm complete noob in variant calling and provide as much explanation as you can. much appreciate your time in advance
I understand this is really a few different questions in one, but I believe this will be a good recourse for others to refer to given that SNP/InDels calling is common topic and I couldn't find any good resources that would give a good guide for SNP/InDel calling hence me trying to include as much description about variant calling as I can and hope that others would to through comments and answers.
Cheers
I just want to add a bit more info to this question. The main problem is that I can see an InDel in mpileup format. i.e
but when I run above line with an option `-b` meaning convert to bcf format I loose that InDel..
Can anyone at least explain what is happening there..?
Cheers
Kirill, Hi! just very quickly... Did you find any reasonable explanation for the disappearing indels issue? I could not find any other post which would describe the problem in such detail! This would be great if you could share any piece of information you might have found to solve it. Many thanks!
@gajkun I pretty much stopped (never really got to use) samtools mpile up. Mainly because it was slow and I found another (easier in my view) solution. Evgeniia (below) gave very good anwser on how to use samtools mpileup with bcftools. I don't do a lot of varinat calling, but when I do I mostly use freebayes https://github.com/ekg/freebayes