Question

MAPQ filtering for clinical applications

7

7.2 years ago

Kevin Blighe 88k

A discussion recently arose about how one ought to filter on MAPQ in a clinical setting, i.e., where an NGS sample is being processed in order to produce a result for a patient who has an unknown or hypothesised diagnosis. The result could obviously be key.

A friend suggested that a MAPQ of 20 would be a sufficient cutoff, whereas I stated that it ought to be as high as 60. Another colleague implied that my high cutoff didn't make sense because each region of the genome is covered by reads of varying MAPQ and (I assume s/he meant) many of the reads over each region would have high MAPQ.

Keep in mind that BWA is being used, which produces MAPQ in the range 0-60. Also, I generally drop to as low as MAPQ 40 in clinical pipelines and then rely on a whole bunch of other metrics to ensure that only true variants are called, with confirmation by Sanger sequencing.
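To put the cutoffs under discussion on a common scale: the SAM specification defines MAPQ as -10 * log10 of the probability that the mapping position is wrong. The sketch below simply inverts that formula. Treat the numbers as nominal, since each aligner (BWA included) estimates MAPQ in its own way; the function name is my own, not part of any tool.

```python
def mapq_to_error_probability(mapq: int) -> float:
    """Convert a Phred-scaled MAPQ into the implied probability of a wrong mapping.

    Assumes the SAM definition MAPQ = -10 * log10(P(mapping position is wrong)).
    Real aligners only approximate this, so the result is a nominal value.
    """
    if mapq < 0:
        raise ValueError("MAPQ must be non-negative")
    return 10 ** (-mapq / 10)


if __name__ == "__main__":
    # Cutoffs mentioned in this thread.
    for cutoff in (20, 40, 60):
        p = mapq_to_error_probability(cutoff)
        print(f"MAPQ {cutoff}: implied mis-mapping probability {p:.0e}")
```

On that nominal scale, MAPQ 20 corresponds to a 1-in-100 chance that the read is misplaced, and MAPQ 60 to 1 in a million.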

For the record: >50% of the genome exhibits a high level of homology and there are certain regions that will simply never attain a MAPQ >30 due to their high level of homology. Look at the CYP genes, for example. Some of the exons of these just cannot be reliably sequenced using the standard NGS protocols. Some reads do map to these highly homologous regions. For example, at MAPQ 60, you may get coverage of around 10 or 20, whereas other less homologous regions may get >1000.
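The effect of the cutoff on usable coverage can be sketched in a few lines. The MAPQ values below are invented purely for illustration (they are not from a real BAM); the function only counts how many reads at a single position would survive each cutoff.

```python
def depth_by_mapq_cutoff(mapqs, cutoffs=(20, 40, 60)):
    """Return {cutoff: number of reads with MAPQ >= cutoff} for one position."""
    return {c: sum(1 for q in mapqs if q >= c) for c in cutoffs}


if __name__ == "__main__":
    # Hypothetical MAPQs of the reads overlapping one position in a
    # highly homologous exon: most reads map ambiguously.
    homologous_locus = [0] * 50 + [25] * 30 + [60] * 15
    # Hypothetical MAPQs at a unique locus: nearly all reads map confidently.
    unique_locus = [60] * 1000 + [30] * 20

    print("homologous:", depth_by_mapq_cutoff(homologous_locus))
    print("unique:    ", depth_by_mapq_cutoff(unique_locus))
```

On a real alignment the same filtering is usually done upstream, e.g., with `samtools view -q`, which skips alignments whose MAPQ is below the given value.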

Remember that this is a clinical setting where a result can change a person's life. As the analyst, would you sign your name on a clinical report, a document type that has legal weight, knowing that you let these low MAPQ reads through?

The second issue of putting too much focus on MAPQ also arose. Of course, there are countless other QC metrics to use, but MAPQ is one of the first and therefore one of the most important. If you get it wrong, a lot of your results may end up being false positives.

Cheers for any comments!

MAPQ clinical NGS • 4.2k views

ADD COMMENT • link updated 18 months ago by DKA ▴ 40 • written 7.2 years ago by Kevin Blighe 88k

3

"As the analyst", I wouldn't trust the MAPQ alone (e.g., check mappability, clipping, GC%, poly-X, IGV visualisation, depth, etc.) and I would always ask for good old Sanger sequencing to confirm any suspicious mutation.

ADD REPLY • link 7.2 years ago by Pierre Lindenbaum 164k

0

Thanks very much for the reply, Pierre. I can only agree with you.

ADD REPLY • link 7.2 years ago by Kevin Blighe 88k

0

Hi, may I ask which MAPQ score you settled on, please? Thank you.

ADD REPLY • link 18 months ago by DKA ▴ 40

1

In my most recent work, I use MAPQ 60 via the BWA route. However, ultimately, I don't think that it matters too much if you also rely on other metrics for filtering.

ADD REPLY • link 18 months ago by Kevin Blighe 88k

0

Thank you for your response. I have read your recent work and it is really impressive. Could you please tell me which other metrics, besides MAPQ, we should consider? Really sorry for asking so much, and thank you.

ADD REPLY • link 18 months ago by DKA ▴ 40

2

Thanks for the comments. For single nucleotide variants, the other key metrics are:

QUAL: make sure that it is >= 30
Genotype Quality (GQ): make sure that it is >= 30
Read Depth:
- at total position read depth >=30, 97% of Sanger-confirmed variants can be detected via NGS
- at total position read depth >=20, it's 100% [of Sanger-confirmed variants]
- below 20, we enter dangerous false-positive territory
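As a minimal sketch, the three thresholds above can be combined into a single pass/fail check. The `VariantCall` class and its field names are hypothetical stand-ins for whatever your VCF parser returns, and taking 20 as the depth floor is my reading of the list above, not a validated setting.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical container for the per-variant metrics discussed above."""
    qual: float   # site-level QUAL
    gq: int       # Genotype Quality
    depth: int    # total read depth at the position


def passes_filters(v: VariantCall, min_qual: float = 30,
                   min_gq: int = 30, min_depth: int = 20) -> bool:
    """True only if the call meets all three thresholds (assumed defaults)."""
    return v.qual >= min_qual and v.gq >= min_gq and v.depth >= min_depth


if __name__ == "__main__":
    calls = [
        VariantCall(qual=250.0, gq=99, depth=480),  # comfortably passes
        VariantCall(qual=45.0, gq=35, depth=12),    # fails on depth
        VariantCall(qual=18.0, gq=40, depth=60),    # fails on QUAL
    ]
    for c in calls:
        print(c, "PASS" if passes_filters(c) else "FAIL")
```

Any call that fails would still be a candidate for manual review or Sanger confirmation rather than silent removal.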

Variant calling shouldn't be complex, but many tools over the years have made it extremely complex, e.g., GATK, DeepVariant, etc. It remains that, with BWA and SAMtools, our clinical lab produced a clinical workflow with consistent 100% agreement with Sanger sequencing over our panel of genes of interest.

ADD REPLY • link 18 months ago by Kevin Blighe 88k

1

Thanks a lot for your comments! Your insights on the key metrics for single nucleotide variants are really helpful. Really appreciate your thorough response.

ADD REPLY • link 18 months ago by DKA ▴ 40

score 3 · Accepted Answer · 2017-09-18

"Remember that this is a clinical setting where a result can change a person's life. As the analyst, would you sign your name on a clinical report, a document type that has legal weight, knowing that you let these low MAPQ reads through?"

Would you sign your name on a clinical report that said someone did not have a particular clinically relevant mutation because you removed the reads with MAPQ of 59 supporting it?

What you accept depends entirely on the sensitivity and specificity claimed in the report, and on whether the entire pipeline (including the bioinformatics analysis) has been validated comprehensively enough to perform at least as well as claimed. False positives and false negatives are a fact of life for any clinical diagnostic test. Would I include MAPQ=20 reads? That depends entirely on the purpose of the clinical report, and the relative "cost" of false positives and false negatives. If the patient will definitely die when I report a false negative, but the cost of a false positive is the prescription of a cheap drug with no side effects, then I would design my pipeline to be highly sensitive, with less emphasis on precision.
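To make that trade-off concrete, here is a rough sketch of how two pipeline configurations could be compared against a validation truth set. Every count and cost below is invented for illustration; the function names are mine, and a real validation would use your own Sanger-confirmed calls.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and precision from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def total_error_cost(fp: int, fn: int, cost_fp: float, cost_fn: float) -> float:
    """Weight each error type by its (assumed) clinical cost."""
    return fp * cost_fp + fn * cost_fn


if __name__ == "__main__":
    # Hypothetical validation results for a strict and a lenient MAPQ cutoff.
    pipelines = {
        "strict":  dict(tp=90, fp=1, tn=899, fn=10),
        "lenient": dict(tp=99, fp=20, tn=880, fn=1),
    }
    # Assumed costs: a missed variant is 1000x worse than a false alarm.
    for name, c in pipelines.items():
        m = diagnostic_metrics(**c)
        cost = total_error_cost(c["fp"], c["fn"], cost_fp=1, cost_fn=1000)
        print(name, {k: round(v, 3) for k, v in m.items()}, "cost:", cost)
```

With costs weighted this way, the lenient configuration comes out ahead despite its lower precision; reverse the cost ratio and the strict one wins.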

To rephrase your question back at you: As the analyst, would you sign your name to a clinical report in which you do not know what the false positive and false negative rates are?