Question

Trust in BQSR for AVITI

0

22 months ago

Ido Tamir 5.2k

How much can I trust the BQSR quality values? I aligned PE150 data with BWA-MEM (with and without IndelRealigner) and removed duplicates with Picard. Then I followed this invocation regarding known-sites. I ran this on AVITI data for a technical comparison and to check whether the quality is really as good as advertised. Now the values seem to max out at ~Q31 instead of the nominal 32-45. Up to Q30 it's just a difference of 1, i.e. a reported Q28 is an empirical Q27. Does anybody else here have data in this direction (besides the AVITI home page)?
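For readers comparing reported and empirical qualities, here is a minimal sketch of the Phred arithmetic involved. The counts are made up for illustration, and the add-one smoothing is an assumption for this sketch, not necessarily what GATK does internally.

```python
import math

def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)

def empirical_quality(mismatches: int, bases: int) -> float:
    """Empirical Phred quality from observed mismatches.

    Uses (mismatches + 1) / (bases + 2) smoothing (an assumption of this
    sketch) so that zero observed mismatches does not give infinite quality.
    """
    return -10 * math.log10((mismatches + 1) / (bases + 2))

# Hypothetical counts for one reported-quality bin (not taken from the thread).
reported_q = 28
bases, mismatches = 1_000_000, 1_995
print(f"reported Q{reported_q}: expected error rate {phred_to_error(reported_q):.5f}")
print(f"empirical Q{empirical_quality(mismatches, bases):.1f}")
```

A one-point gap such as Q28 versus Q27 corresponds to roughly a 26% higher error rate than reported, while Q31 versus Q40 is close to an eightfold difference.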

report

thank you very much, ido

aviti bqsr • 1.5k views

1

Can you try this as an alternative? It is from BBTools.

calctruequality.sh

Written by Brian Bushnell. Last modified March 21, 2019.

Description: Calculates observed quality scores from mapped sam/bam files. Generates matrices for use in recalibrating quality scores. By default, the matrices are written to /ref/qual/ in the current directory.

It can then be used to recalibrate data.

22 months ago by GenoMax 151k

score 1 · Answer 1 · 2023-07-14

1

22 months ago

LChart 4.9k

The important part of BQSR isn't the average difference but the corrections for read position and preceding base sequence. What do those marginal distributions look like? (Change in quality by read depth, say)
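As a rough illustration of one such marginal, the sketch below groups base observations by read position (cycle) and computes an empirical Phred quality per position. The input format, the toy data, and the add-one smoothing are all assumptions for this example; a real analysis would read these counts from the BQSR recalibration table or from the alignments.

```python
import math
from collections import defaultdict

def quality_by_position(observations):
    """Group (read_position, is_mismatch) pairs into empirical Phred per position."""
    counts = defaultdict(lambda: [0, 0])  # position -> [mismatches, bases]
    for pos, is_mismatch in observations:
        counts[pos][0] += int(is_mismatch)
        counts[pos][1] += 1
    # Add-one smoothing (an assumption) avoids log10(0) for error-free positions.
    return {
        pos: -10 * math.log10((mm + 1) / (n + 2))
        for pos, (mm, n) in sorted(counts.items())
    }

# Toy observations (hypothetical): cycle 150 has ten times the errors of cycle 1.
obs = (
    [(1, False)] * 9990 + [(1, True)] * 10
    + [(150, False)] * 9900 + [(150, True)] * 100
)
by_cycle = quality_by_position(obs)
for pos, q in by_cycle.items():
    print(f"cycle {pos}: empirical Q{q:.1f}")
```

The same grouping could be keyed on the preceding base or dinucleotide instead of the cycle to look at the sequence-context marginal.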

0

We are evaluating AVITI as a new sequencing platform. If it really provides >Q40 compared to Q30, then we could sequence more shallowly to call variants, identify more subpopulations at the same read depth, etc. So it would be cost-saving for our customers even if the per-read cost is identical to Illumina's. This is why I think the absolute values I get out are important (not the difference or the marginal distributions). This is also what the marketing hype is about.

The BQSR curve that is shown looks the same for 10M, 50M, 100M and 200M reads. I don't understand what additional information I get from checking marginal distributions if the corrected maximum value is 31. It tells me that at the same read depth, I can expect the same accuracy for calling SNPs as with Illumina (of course the error profile might differ a bit). Which is fine, but it's not what is advertised.

22 months ago by Ido Tamir 5.2k

1

"I don't understand what additional information I get from checking marginal distributions if the corrected maximum value is 31"

I certainly think there is value in establishing that biases present in Illumina are not present in AVITI sequencing. But note that the limitation on sequencing depth, at least for ploidy>1 genomes, is based largely on the binomial probability of missing alleles, and not the error coming out of the sequencing machine.
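To make the binomial point concrete, here is a small sketch of the chance of undersampling the alternate allele at a heterozygous diploid site, ignoring sequencing error entirely. The threshold of three supporting reads and the depths shown are hypothetical choices for illustration.

```python
from math import comb

def prob_fewer_than_k_alt_reads(depth: int, k: int, alt_fraction: float = 0.5) -> float:
    """P(fewer than k reads carry the alt allele) at a site with the given alt fraction."""
    return sum(
        comb(depth, i) * alt_fraction**i * (1 - alt_fraction) ** (depth - i)
        for i in range(k)
    )

# k=3 is an assumed minimum number of supporting reads, not a recommendation.
for depth in (10, 20, 30):
    p = prob_fewer_than_k_alt_reads(depth, k=3)
    print(f"depth {depth}: P(<3 alt reads) = {p:.2e}")
```

Under these assumptions the miss probability depends only on depth and allele fraction, so a lower per-base error rate does not reduce it.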

You can go ahead and validate the BQSR by retaining only reads with no mismatches (as judged from the CIGAR string, or from an NM:i:0 tag) and running it on those; that will give you confidence that the method can produce Q40+ recalibrated values.
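A minimal sketch of that filter on SAM text, assuming the aligner wrote the NM (edit distance) tag. The records are invented for illustration; in practice the lines would be streamed from a SAM file or from a BAM converted to SAM.

```python
def has_zero_edit_distance(sam_line: str) -> bool:
    """True if the alignment record carries the optional tag NM:i:0."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM record.
    return "NM:i:0" in fields[11:]

def perfect_reads(sam_lines):
    """Yield alignment records (skipping header lines) that have NM:i:0."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        if has_zero_edit_distance(line):
            yield line

# Made-up records: one header line, one perfect match, one with a mismatch.
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0",
    "r2\t0\tchr1\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:1",
]
kept = list(perfect_reads(records))
print([line.split("\t")[0] for line in kept])
```

Because NM counts mismatches plus inserted and deleted bases, NM:i:0 also excludes reads with indels, which is stricter than a mismatch-only filter.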

22 months ago by LChart 4.9k