MAPQ filtering for clinical applications
7.2 years ago

A discussion recently arose about how one ought to filter on MAPQ in a clinical setting, i.e., where an NGS sample is being processed in order to produce a result for a patient who has an unknown or hypothesised diagnosis. The result could obviously be key.

A friend suggested that a MAPQ of 20 would be a sufficient cutoff, whereas I stated that it ought to be as high as 60. Another colleague implied that my high cutoff didn't make sense because each region of the genome is covered by reads of varying MAPQ and, I assume s/he meant, there would be many reads with high MAPQ over each region.

Keep in mind that BWA is being used, which produces MAPQ in the range 0-60. Also, I generally drop to as low as MAPQ 40 in clinical pipelines and then rely on a whole bunch of other metrics to ensure that only true variants are called, confirmed with Sanger sequencing.
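To put the cutoffs being debated on a common scale: MAPQ is defined in the SAM specification as a Phred-scaled value, -10 * log10(P(mapping position is wrong)). A minimal Python sketch of the conversion follows; the function name is just for illustration, and the probabilities are the aligner's own estimates rather than calibrated error rates.

```python
def mapq_to_error_prob(mapq: int) -> float:
    """Convert a Phred-scaled MAPQ to the aligner's estimated probability
    that the read is placed at the wrong position."""
    return 10 ** (-mapq / 10)


# The three cutoffs mentioned in this thread
for cutoff in (20, 40, 60):
    prob = mapq_to_error_prob(cutoff)
    print(f"MAPQ {cutoff}: estimated P(wrong position) = {prob:.0e}")
```

On this scale, MAPQ 20 corresponds to an estimated 1-in-100 chance of a misplaced read, MAPQ 40 to 1 in 10,000, and MAPQ 60 to 1 in 1,000,000. Because BWA caps MAPQ at 60, a value of 60 is best read as "the highest confidence the aligner reports".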

For the record: >50% of the genome exhibits a high level of homology, and there are certain regions that will simply never attain a MAPQ >30 because of it. Look at the CYP genes, for example. Some of their exons just cannot be reliably sequenced using the standard NGS protocols. Some reads do map to these highly homologous regions with high confidence, but only a few: at MAPQ 60, you may get coverage of around 10 or 20, whereas other less homologous regions may get >1000.
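To see how much a given cutoff costs in a particular region, one can count the reads that survive it. The sketch below is only an illustration in plain Python on made-up SAM records: the reference names `homologous_region` and `unique_region` and the function name are hypothetical, and a real pipeline would more likely use something like `samtools view -q` on the BAM file.

```python
from collections import Counter


def count_reads_by_region(sam_lines, min_mapq):
    """Count mapped reads per reference name with MAPQ >= min_mapq."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4:  # FLAG bit 0x4 marks an unmapped read
            continue
        if mapq >= min_mapq:
            counts[rname] += 1
    return counts


# Made-up records (QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL)
toy_sam = [
    "@SQ\tSN:homologous_region\tLN:1000",
    "@SQ\tSN:unique_region\tLN:1000",
    "r1\t0\thomologous_region\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\thomologous_region\t100\t12\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t0\thomologous_region\t101\t0\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t0\tunique_region\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r5\t16\tunique_region\t201\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r6\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

for cutoff in (0, 20, 60):
    print(cutoff, dict(count_reads_by_region(toy_sam, cutoff)))
```

In the toy data, raising the cutoff removes reads only from the homologous region, which is the pattern described above: the stricter the MAPQ filter, the thinner the coverage exactly where homology is highest.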

Remember that this is a clinical setting where a result can change a person's life. As the analyst, would you sign your name on a clinical report, a document type that has legal weight, knowing that you let these low MAPQ reads through?

A second issue also arose: putting too much focus on MAPQ. Of course, there are countless other QC metrics to use, but MAPQ is one of the first to be applied and therefore one of the most important. If you get it wrong, a lot of your results may end up being false positives.

Cheers for any comments!

MAPQ clinical NGS • 4.2k views

"As the analyst", I wouldn't trust the MAPQ alone (e.g., check mappability, clipping, GC%, poly-X, IGV visualisation, depth, etc.) and I would always ask for good old Sanger sequencing to confirm any suspicious mutation.

Thanks very much for the reply, Pierre. I can only agree with you.

Hi, may I ask which MAPQ score you settled on, please? Thank you.

In my most recent work, I use MAPQ 60 via the BWA route. Ultimately, however, I don't think that it matters too much if you also rely on other metrics for filtering.

Thank you for your response. I have read your recent work, and it is really impressive. Could you please provide information on any other relevant metrics we should consider besides MAPQ? Really sorry for asking too much, and thank you.

Thanks for the comments. For single nucleotide variants, the other key metrics are:

  • QUAL: make sure that it is >= 30
  • Genotype Quality (GQ): make sure that it is >= 30
  • Read Depth:
    • at total position read depth >=30, 97% of Sanger-confirmed variants can be detected via NGS
    • at total position read depth >=20, it's 100% [of Sanger-confirmed variants]
    • below 20, we enter dangerous false-positive territory
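As a rough illustration of how the thresholds listed above could be applied together, here is a minimal Python sketch. The `VariantCall` class, the `failed_filters` function, and the example calls are all hypothetical; in practice the values would be read from the QUAL column and the GQ and DP fields of a VCF, and this is not the code used in our workflow.

```python
from dataclasses import dataclass

# Thresholds taken from the list above
MIN_QUAL = 30
MIN_GQ = 30
MIN_DEPTH = 20


@dataclass
class VariantCall:
    """Hypothetical minimal record for one single nucleotide variant call."""
    chrom: str
    pos: int
    qual: float  # variant quality (QUAL)
    gq: int      # genotype quality (GQ)
    depth: int   # total read depth at the position


def failed_filters(call):
    """Return the names of the thresholds that this call fails."""
    reasons = []
    if call.qual < MIN_QUAL:
        reasons.append("QUAL")
    if call.gq < MIN_GQ:
        reasons.append("GQ")
    if call.depth < MIN_DEPTH:
        reasons.append("DEPTH")
    return reasons


# Made-up example calls
calls = [
    VariantCall("chr1", 1000, qual=250.0, gq=99, depth=180),
    VariantCall("chr1", 2000, qual=45.0, gq=22, depth=35),
    VariantCall("chr2", 3000, qual=18.0, gq=12, depth=9),
]

for call in calls:
    reasons = failed_filters(call)
    status = "PASS" if not reasons else "FAIL (" + ", ".join(reasons) + ")"
    print(f"{call.chrom}:{call.pos} {status}")
```

Reporting which threshold failed, rather than silently dropping the call, makes it easier to decide which borderline variants should go to Sanger confirmation.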

Variant calling shouldn't be complex, but many tools over the years have made it extremely complex, e.g., GATK, DeepVariant, etc. It remains that, with BWA and SAMtools, our clinical lab produced a clinical workflow with consistent 100% agreement with Sanger sequencing over our panel of genes of interest.

Thanks a lot for your comments! Your insights on the key metrics for single nucleotide variants are really helpful. Really appreciate your thorough response.

7.2 years ago
d-cameron ★ 2.9k

"Remember that this is a clinical setting where a result can change a person's life. As the analyst, would you sign your name on a clinical report, a document type that has legal weight, in knowing that you let these low MAPQ reads through?"

Would you sign your name on a clinical report that said someone did not have a particular clinically relevant mutation because you removed the reads with MAPQ of 59 supporting it?

What you accept depends entirely on the sensitivity and specificity claimed in the report, and on whether the entire pipeline (including the bioinformatics analysis) has been validated comprehensively enough to perform at least as well as claimed. False positives and false negatives are a fact of life for any clinical diagnostic test. Would I include MAPQ=20 reads? That depends entirely on the purpose of the clinical report and the relative "cost" of false positives and false negatives. If the patient will definitely die if I report a false negative, but the cost of a false positive is the prescription of a cheap drug with no side effects, then I would design my pipeline to be highly sensitive, with less emphasis on precision.

To rephrase your question back at you: As the analyst, would you sign your name to a clinical report in which you do not know what the false positive and false negative rates are?
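For concreteness, the rates in question can be estimated from a validation run in which every call is compared against a reference method. The Python sketch below uses made-up counts; the function name and the numbers are purely illustrative and do not come from any real validation.

```python
def _ratio(numerator, denominator):
    """Safe division: return NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(tp, fp, tn, fn):
    """Summarise a validation run from counts of true/false positives/negatives."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # fraction of real variants detected
        "specificity": _ratio(tn, tn + fp),  # fraction of non-variant sites called correctly
        "precision": _ratio(tp, tp + fp),    # fraction of reported variants that are real
    }


# Made-up counts for one hypothetical filter setting
metrics = diagnostic_metrics(tp=95, fp=2, tn=900, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same comparison at each candidate MAPQ cutoff shows directly how a stricter filter trades sensitivity for precision, which is the trade-off the report has to state.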

Great answer, d-cameron. I do typically drop to a MAPQ of 40 or 50. In my experience, if you leave it lower (20 or 30), variants around homopolymer regions begin to appear in the final report, and the culprit is the low MAPQ. These may be homopolymers of just a few nucleotides in length, e.g., 6.

Your implication is spot-on, though, i.e., that false positives and false negatives are equally important, and that the test/report is only as good as the quoted sensitivity/specificity (as compared to the accepted gold standard).
