bcftools mpileup output format
0
2
Entering edit mode
6.2 years ago
tarek.mohamed ▴ 370

Hi All,

I am trying to use modify bcftools mpileup / call to detect snps from nanopore sequence data with huger depth ( >5000X).

I need to understand the format of the mpileup shown bellow. I used position 84303804 as an example

> bcftools mpileup --skip-indels  -C 0 -d 250 -r chr9:84273123-84368634 --threads 24 --output-type v -f NCBI.GRCh38.fa --output BC01.bcftooslmpileup_noindels.vcf BC01.bam


> cat BC01.bcftooslmpileup_noindels.vcf | grep 84303804
chr9    84303804    .   C   T,A,<*> 0   .   DP=476;I16=73,12,0,32,1524,28372,525,8835,5090,304900,1920,115200,2125,53125,800,20000;QS=0.743777,0.243533,0.0126891,0;VDB=0.00960702;SGB=-0.69312;RPB=0.345296;MQB=0.995226;MQSB=0.992367;BQB=0.134655;MQ0F=0 PL  0,31,127,226,196,166,255,217,172,166
bcftools mpileup • 11k views
ADD COMMENT
0
Entering edit mode

Hello,

bcftools mpileup calculates the genotype likelihoods for each position. It is not doing variant calling! What you see is the position, the REF and observed ALT alleles, some mpileup specific information which are used later in variant calling (for there description look into the header of your output) and the likelihoods for each possible genotype.

To make variant calling, you have to pass the output of mpileup to bcftools call.

$ bcftools mpileup --skip-indels  -C 0 -d 250 -r chr9:84273123-84368634 --threads 24 -Ou -f NCBI.GRCh38.fa BC01.bam| bcftools call -mv -Ov -o BC01.bcftooslmpileup_noindels.vcf

fin swimmer

ADD REPLY
0
Entering edit mode

Hi Thanks for the reply I know about bcftools call, but I want to understand all of mpileup output because as you mentioned this will be used by bcftools call. By modifying bcftools mpileup parameters, hopefully I can call variants form the long reads sequence I have. My bam file pile (with huge depth of coverage) clearly shows the variant positions. If I ran bcftools mpileup | call using default parameters, it Ignore calling some clear variants.

For example, there are 10 numbers after the PL ( probability likelihood) and similarly there are several numbers after I16. I want to understand what are these numbers

Thanks

ADD REPLY
1
Entering edit mode

Hello again,

for the I16 there is a hint in the header:

##INFO=<ID=I16,Number=16,Type=Float,Description="Auxiliary tag used for calling, see description of bcf_callret1_t in bam2bcf.h">

If we look at this file, one can find this:

  // The fields are:
    //      depth fwd   .. ref (0) and non-ref (2)
    //      depth rev   .. ref (1) and non-ref (3)
    //      baseQ       .. ref (4) and non-ref (6)
    //      baseQ^2     .. ref (5) and non-ref (7)
    //      mapQ        .. ref (8) and non-ref (10)
    //      mapQ^2      .. ref (9) and non-ref (11)
    //      minDist     .. ref (12) and non-ref (14)
    //      minDist^2   .. ref (13) and non-ref (15)

The PL values describes a phred scaled likelihood for the different genotypes that are possible like 0/0, 0/1, 1/1, 0/2 etc. The higher the number the less likely this genotype is. But I'm unsure which position represents which genotype.

fin swimmer

ADD REPLY

Login before adding your answer.

Traffic: 1729 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6