Question

Calculate the Quality of Bases

0

Entering edit mode

7.2 years ago

manojkumarbioinfo ▴ 100

Hi,

I want to calculate the quality of base using a bed file.

for example i have aligned my raw reads and i want to calculate the quality of base found in chromosome 1 at particular position. is there any tools for calculating the quality of user desired bases. if yes kindly suggest me the best tool

next-gen alignment sequencing • 4.6k views

ADD COMMENT • link updated 7.2 years ago by prasundutta87 ▴ 670 • written 7.2 years ago by manojkumarbioinfo ▴ 100

0

Entering edit mode

Hello,

what is the reason for that? Commonly is to have a look on the average base quality for the base position within the reads.

fin swimmer

ADD REPLY • link 7.2 years ago by finswimmer 16k

0

Entering edit mode

i just want to filter the variants based on the quality of the base and their depth.

ADD REPLY • link 7.2 years ago by manojkumarbioinfo ▴ 100

0

Entering edit mode

Are you writing your own variant caller? Otherwise this is something that variant caller normally consider about.

Or have you read something about "Filtering variants by their quality per depth". Than it's more likely that you want to take the QUAL value in your vcf file and divide it by the depth. The QUAL value in the vcf file have nothing to do with average base quality (of course this is one thing that have influence on it).

fin swimmer

ADD REPLY • link 7.2 years ago by finswimmer 16k

0

Entering edit mode

i'm not writing my own variant caller. i'm bench marking the available tools. for that i'm using my own sequence where i know the exact variant. but available tools are missing certain variants so i wish to find out the reason for missing variants. i checked the depth of those region it looks good but i want to know the quality of those position too. that why i want to check the quality of particular region. if i have 10 variants i can check it manually by i have around 80 variants. the depth of those region i calculated using GATK with my bed file and aligned reads if it is possible to calculate the quality like that it will be easy .

ADD REPLY • link 7.2 years ago by manojkumarbioinfo ▴ 100

0

Entering edit mode

Most of the variant callers have their own algorithm for variant calling and they will never have the exact same number of variants...if you want to use multiple variant callers, the standard practice is to use common variants for your downstream analysis..you also get different results because the default parameters for each variant caller is different, for example the maximum depth considered for variant calling for bcftools/samtools and GATK is different leading to difference in no. of variants..furthermore, the internal default variant filtering also happens differently for different tools..there are plenty of variant caller comparison studies out there which may be looked into..

ADD REPLY • link 7.2 years ago by prasundutta87 ▴ 670

0

Entering edit mode

i have used 5 different variant callers and merged every out put them i compared with my known ones, from that i missed about 86 variants which is not detected by any of my variant callers. so i wish to find whats the reason for missing those region. i checked with the depth, of those region it seems they have average depth of 80x in those regions. now i want to check the quality of those bases so. so that i can find the reason why i the program cant find the variants.

ADD REPLY • link 7.2 years ago by manojkumarbioinfo ▴ 100

0

Entering edit mode

From where did you get those known variants? Are they verified?

I can suggest another thing..try uploading your data in IGV and look at the reads in those position. They will also have their the base quality mentioned if you take your mouse over to the bases..furthermore, are those regions near to an indel? bcftools/samtools in general won't call variants from those position as their original base qualities will become lower owing to the calculation of BAQ (base alignment quality score), which may become lower than your default base quality filtering parameter..in GATK, haplotypecaller will perform read realignment which may lead to not calling variants from those positions..

ADD REPLY • link 7.2 years ago by prasundutta87 ▴ 670

0

Entering edit mode

ya that what i'm doing. but i want to know if there is any easier way than this. say if i have a 1000 variants like that we cant able to check them manually.

ADD REPLY • link 7.2 years ago by manojkumarbioinfo ▴ 100

0

Entering edit mode

Check out the pysam module I wrote about in your post then, they utilize the pileup file to do the kind of job you want to do..

ADD REPLY • link 7.2 years ago by prasundutta87 ▴ 670

0

Entering edit mode

sure i will look into it.

ADD REPLY • link 7.2 years ago by manojkumarbioinfo ▴ 100

0

Entering edit mode

As prasundutta87 asks how do you now that your variants are true? Did you sanger sequence it?

I still have no answer to your original question. But it don't think that this will not help as there are so much reasons beside the base quality why a variant is not called, e.g. mapping quality, fraction of reads that have the variant, position within the read, ratio of forward/reverse reads that with variants, ...

So maybe, if you tell us whether your variants are verified and how nad can post an example were one variant caller found a variant and which variant caller don't, we can find a better approach.

fin swimmer

ADD REPLY • link 7.2 years ago by finswimmer 16k

score 0 · Answer 1 · 2017-08-28

You don't calculate base quality scores, it has been already assigned by the sequencing machine..and it will be present in your SAM/BAM file. Although, you do re-calculate them from the original base quality score, such as BAQ in mpileup from samtools..

For extracting the quality of just one base, check this out: http://pysam.readthedocs.io/en/latest/api.html?highlight=quality%20score