Entering edit mode
6.2 years ago
tarek.mohamed
▴
370
Hi All,
I am trying to use modify bcftools mpileup / call to detect snps from nanopore sequence data with huger depth ( >5000X).
I need to understand the format of the mpileup shown bellow. I used position 84303804 as an example
> bcftools mpileup --skip-indels -C 0 -d 250 -r chr9:84273123-84368634 --threads 24 --output-type v -f NCBI.GRCh38.fa --output BC01.bcftooslmpileup_noindels.vcf BC01.bam
> cat BC01.bcftooslmpileup_noindels.vcf | grep 84303804
chr9 84303804 . C T,A,<*> 0 . DP=476;I16=73,12,0,32,1524,28372,525,8835,5090,304900,1920,115200,2125,53125,800,20000;QS=0.743777,0.243533,0.0126891,0;VDB=0.00960702;SGB=-0.69312;RPB=0.345296;MQB=0.995226;MQSB=0.992367;BQB=0.134655;MQ0F=0 PL 0,31,127,226,196,166,255,217,172,166
Hello,
bcftools mpileup
calculates the genotype likelihoods for each position. It is not doing variant calling! What you see is the position, the REF and observed ALT alleles, somempileup
specific information which are used later in variant calling (for there description look into the header of your output) and the likelihoods for each possible genotype.To make variant calling, you have to pass the output of
mpileup
tobcftools call
.fin swimmer
Hi Thanks for the reply I know about bcftools call, but I want to understand all of mpileup output because as you mentioned this will be used by bcftools call. By modifying bcftools mpileup parameters, hopefully I can call variants form the long reads sequence I have. My bam file pile (with huge depth of coverage) clearly shows the variant positions. If I ran bcftools mpileup | call using default parameters, it Ignore calling some clear variants.
For example, there are 10 numbers after the PL ( probability likelihood) and similarly there are several numbers after I16. I want to understand what are these numbers
Thanks
Hello again,
for the I16 there is a hint in the header:
If we look at this file, one can find this:
The
PL
values describes a phred scaled likelihood for the different genotypes that are possible like0/0
,0/1
,1/1
,0/2
etc. The higher the number the less likely this genotype is. But I'm unsure which position represents which genotype.fin swimmer