Freebayes for creating VCF : meaning of QUAL
2
0
Entering edit mode
3.1 years ago
Michal Nevo ▴ 140

Hi,

I have 10 sorted Bam files 5 Male and 5 Female. I used Freebayes to create vcf files separately and then merged all 10 together. (ploidy 8)

The commands I used:

freebayes -f $REF -p 8 $SORTED_BAM > $OUTPUT (for each one)
bcftools merge $M1 $M3 $M5 $M7 $M9 $F1 $F2 $F4 $F6 $F8 -o $output_file

Here are some sample lines from my merged vcf file:

Looking at one snip, how could I know how many females and how many males have evidence for this snip? I wonder what parameter may help me with that.. Why is the QUAL 42.3447 at position 53, but the snip appears in only 2 of the 10 samples? And how is it that NS is 1 and not 10?

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  M1.sorted.bam   M3.sorted.bam   M5.sorted.bam   M7.sorted.bam M9.sorted.bam   F1.sorted.bam   F2.sorted.bam   F4.sorted.bam   F6.sorted.bam   F8.sorted.bam
NC_048323.1     30      .       TG      T       9.93635 .       DPB=1.33333;EPPR=0;GTI=0;MQMR=0;NS=1;NUMALT=1;ODDS=1.07523;PAIREDR=0;PQR=0;PRO=0;QR=0;RO=0;RPPR=0;SRF=0;SRP=0;SRR=0;DP=6;AB=0;ABP=0;AF=1;AO=2;CIGAR=1M1D1M;DPRA=0;EPP=7.35324;LEN=1;MEANALT=1;MQM=12;PAIRED=1;PAO=0;PQA=0;QA=72;RPL=0;RPP=7.35324;RPR=2;RUN=1;SAF=2;SAP=7.35324;SAR=0;TYPE=del;AN=24;AC=24  GT:DP:AD:RO:QR           :AO:QA  ./././././././.:.:.:.:.:.:.     ./././././././.:.:.:.:.:.:.     ./././././././.:.:.:.:.:.:.     ./././././././.:.:.:.:           .:.:.   ./././././././.:.:.:.:.:.:.     ./././././././.:.:.:.:.:.:.     ./././././././.:.:.:.:.:.:.     1/1/1/1/1/1/1/1:2:0,2:           0:0:2:70        1/1/1/1/1/1/1/1:2:0,2:0:0:2:49  1/1/1/1/1/1/1/1:2:0,2:0:0:2:72
NC_048323.1     36      .       T       C       9.89173 .       DPB=2;EPPR=0;GTI=0;MQMR=0;NS=1;NUMALT=1;ODDS=2.16491;PAIREDR=0;PQR=0;PRO=0;QR=0;RO=0;RPPR=0;SRF=0;SRP=0;SRR=0;DP=2;AB=0;ABP=0;AF=1;AO=2;CIGAR=1X;DPRA=0;EPP=3.0103;LEN=1;MEANALT=1;MQM=12;PAIRED=1;PAO=0;PQA=0;QA=53;RPL=0;RPP=7.35324;RPR=2;RUN=1;SAF=1;SAP=3.0103;SAR=1;TYPE=snp;AN=4;AC=4   GT:DP:AD:RO:QR:AO:QA .           /././.:.:.:.:.:.:.      ./././.:.:.:.:.:.:.     1/1/1/1:2:0,2:0:0:2:53  ./././.:.:.:.:.:.:.     ./././.:.:.:.:.:.:.     ./././           .:.:.:.:.:.:.   ./././.:.:.:.:.:.:.     ./././.:.:.:.:.:.:.     ./././.:.:.:.:.:.:.     ./././.:.:.:.:.:.:.
NC_048323.1     53      .       C       T       42.3447 .       DPB=3;EPPR=0;GTI=0;MQMR=0;NS=1;NUMALT=1;ODDS=3.03842;PAIREDR=0;PQR=0;PRO=0;QR=0;RO=0;RPPR=0;SRF=0;SRP=0;SRR=0;DP=7;AB=0;ABP=0;AF=1;AO=3;CIGAR=1X;DPRA=0;EPP=9.52472;LEN=1;MEANALT=1;MQM=23;PAIRED=1;PAO=0;PQA=0;QA=109;RPL=0;RPP=9.52472;RPR=3;RUN=1;SAF=3;SAP=9.52472;SAR=0;TYPE=snp;AN=12;AC=12   GT:DP:AD:RO:QR:AO:QA .           /././././././.:.:.:.:.:.:.      ./././././././.:.:.:.:.:.:.     1/1/1/1:3:0,3:0:0:3:104 ./././././././.:.:.:.:.:.:.     ./././           ././././.:.:.:.:.:.:.   1/1/1/1/1/1/1/1:4:1,3:1:26:3:109        ./././././././.:.:.:.:.:.:.     ./././././././.:.:.:.:.:.:.  .           /././././././.:.:.:.:.:.:.      ./././././././.:.:.:.:.:.:.
VCF results Freebayes QUAL • 2.0k views
ADD COMMENT
2
Entering edit mode
3.1 years ago
GokalpC ▴ 100

Instead of calling each sample seperately I would suggest you to try to call all samples at the same time if the numbers are not unreasonable (less than 30 I suppose). This way hom-ref, het and hom-alt samples will be presented properly in the VCF and AF can be predicted better.

ADD COMMENT
1
Entering edit mode
3.1 years ago

Read up on the VCF format.

Each of your samples has numbers in various formats: GT:DP:AD:RO:QR:AO:QA

For example, DP is the read depth. The number of 1/1 indicates a homozygous genotype. Thus for that column that has 1/1:2 you have two reads supporting that variant.

Now eyeballing these values is almost impossible. Investigate the features of bcftools and learn to transform your VCF into a tabular file that contains the information you need.

Now the exact details of how QUAL is computed will be probably hard to track down, usually, we don't bother that much with the exact value, 30 vs 35 rather it is used as a filter to get rid of evidently weak calls. Read the freebayes paper, perhaps they are more forthcoming. As for NS, lots of things can go bad, make you the samples are named (tagged) etc.

ADD COMMENT

Login before adding your answer.

Traffic: 2375 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6