Hey again,
You are posting some very topical and interesting questions.
How are you gauging 'benign' and 'pathogenic' (and the other categories) here? The precise terminology makes me believe that you're referring to ClinVar, which is a curated resource and based on published work, and therefore more reliable than other sources.
For all of your variants, in my opinion, you need concrete evidence that the variant has a statistically significant association with a particular phenotype or disease. This comes from sifting through the published literature and judging whether each study is sound. If there is no concrete evidence, you should not use 'woolly' (i.e., vague) language in your clinical reports when reporting these variants. If there is an association, you should include it in the report.
For variants of unknown significance (VUS), many resort to in silico tools in an attempt to better understand their effects, but I find it difficult to believe their predictions. At last check there were up to 60 in silico tools out there, and they do produce different results. Always rely on published literature from 'properly' conducted studies.
You do not have to confirm all of your variants by Sanger for each run, but you do have to validate your analysis pipeline against the gold standard, which is Sanger. Once you have validated it, you will begin to build up confidence in many of the variant calls in your regions of interest, and in future you may wish to Sanger-confirm only new / previously unseen variants.
The stark reality is that we don't have a clue what the vast majority of variants are doing. For example, there are many cases in which intergenic variants prove more pathogenic than non-synonymous / missense variants, because they may lie in an enhancer or promoter region and thus affect transcription indirectly, or they may introduce novel transcription factor binding sites. Nor can synonymous variants be completely dismissed.
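Validating the pipeline against Sanger essentially means comparing two call sets. Below is a minimal sketch of that comparison, assuming variants are represented as hypothetical (chrom, pos, ref, alt) tuples and that Sanger covered the same regions as the panel; a real validation would also normalise / left-align indels before matching.

```python
# Minimal sketch (not a validated clinical procedure): compare NGS calls against
# a Sanger-confirmed truth set. Variants are hypothetical (chrom, pos, ref, alt)
# tuples; a real comparison would first normalise/left-align indels.
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(ngs_calls: set, sanger_truth: set) -> dict:
    """Return TP/FP/FN counts plus sensitivity and positive predictive value."""
    tp = len(ngs_calls & sanger_truth)  # called by NGS and confirmed by Sanger
    fp = len(ngs_calls - sanger_truth)  # called by NGS but not confirmed
    fn = len(sanger_truth - ngs_calls)  # seen by Sanger but missed by NGS
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "sensitivity": sensitivity, "ppv": ppv}


# Toy example with made-up coordinates
ngs = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"), Variant("chr1", 300, "G", "A")}
sanger = {Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"), Variant("chr1", 400, "T", "C")}
print(concordance(ngs, sanger))
```

As confidence builds over many runs, the variants that keep appearing with perfect concordance are the ones you might eventually stop re-confirming.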
So, to your question:
Should the carrier screening test be restricted to a list of clinically proven pathogenic variants?
It does not have to be restricted to just these variants, but you cannot use misleading language on your reports for those variants for which evidence pertaining to their pathogenicity is lacking. Ideally, you should present the 'proven' variants up front (i.e., on the first page), and then include others in the addendum and clearly state that their relationship to the endpoint is unknown/uncertain.
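The report layout described above (proven variants up front, everything else in a clearly labelled addendum) can be sketched roughly as follows. The classification labels, the FRONT_PAGE_CLASSES policy and the gene / HGVS strings are all hypothetical; which classes count as 'proven' is your lab's decision.

```python
# Minimal sketch of the report layout: 'proven' variants on the first page,
# everything else in a clearly labelled addendum. The classification labels
# and the FRONT_PAGE_CLASSES policy are hypothetical choices.
from dataclasses import dataclass

FRONT_PAGE_CLASSES = {"pathogenic"}  # assumption: which classes count as 'proven'


@dataclass
class ReportedVariant:
    gene: str
    hgvs: str
    classification: str  # e.g. "pathogenic", "VUS"


def split_report(variants):
    """Separate variants into first-page results and addendum entries."""
    main = [v for v in variants if v.classification in FRONT_PAGE_CLASSES]
    addendum = [v for v in variants if v.classification not in FRONT_PAGE_CLASSES]
    return main, addendum


def render_report(variants) -> str:
    main, addendum = split_report(variants)
    lines = ["RESULTS"]
    if main:
        lines += [f"{v.gene} {v.hgvs}: {v.classification}" for v in main]
    else:
        lines.append("No variants with a proven disease association were detected.")
    if addendum:
        lines.append("")
        lines.append("ADDENDUM: the relationship of the following variants to disease is unknown/uncertain.")
        lines += [f"{v.gene} {v.hgvs}: {v.classification}" for v in addendum]
    return "\n".join(lines)


# Hypothetical gene names and HGVS strings, for illustration only
print(render_report([
    ReportedVariant("GENE1", "c.100A>G", "pathogenic"),
    ReportedVariant("GENE2", "c.250C>T", "VUS"),
]))
```

The point of the explicit addendum header is that nothing uncertain can end up on the first page without someone deliberately changing the policy.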
One can either make the moral decision to be honest in one's reports, or one can take the immoral route and overstate the power of one's test without any concrete evidence for this.
Kevin
ACMG recommends confirmation testing for all NGS-reported variants, so, to do it properly, Sanger confirmation is inevitable:
"FP rates for most NGS platforms in current use are appreciable, and therefore it is recommended that all disease-focused and/or diagnostic testing include confirmation of the final result using a companion technology. ... Sanger sequencing is most often employed as the orthogonal technology for germline nuclear DNA testing".
Nevertheless, ways of minimizing the need for Sanger confirmation have been researched since then. To limit Sanger sequencing, "a quality threshold of a minimal read depth coverage of >100 and a variant allele frequency (or heterozygous ratio) of >40%" could be used. BTW, we use primer-based target enrichment, for which "false-positives can be more problematic...than in probe-base enrichment due to the inability to remove PCR duplicates in the resulting data", so I'm not sure whether such a threshold is applicable in our case...
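For concreteness, the quoted threshold (depth > 100 and VAF > 40%) amounts to a simple filter deciding which calls still go for Sanger. A minimal sketch, assuming hypothetical Call fields that you would populate from your VCF:

```python
# Minimal sketch of the quoted quality threshold (depth > 100 and VAF > 40%)
# used to decide which calls still go for Sanger confirmation.
# The Call fields are hypothetical; values would come from your VCF.
from dataclasses import dataclass


@dataclass
class Call:
    variant_id: str
    depth: int      # total reads covering the position
    alt_reads: int  # reads supporting the alternate allele

    @property
    def vaf(self) -> float:
        """Variant allele frequency (alt reads / total reads)."""
        return self.alt_reads / self.depth if self.depth else 0.0


def needs_sanger(call: Call, min_depth: int = 100, min_vaf: float = 0.40) -> bool:
    """True if the call fails the threshold and should be confirmed by Sanger."""
    passes = call.depth > min_depth and call.vaf > min_vaf
    return not passes


calls = [Call("v1", 150, 70), Call("v2", 80, 40), Call("v3", 200, 60)]
for c in calls:
    print(c.variant_id, round(c.vaf, 2), "Sanger" if needs_sanger(c) else "pass")
```

With primer-based (amplicon) enrichment, the depth counted here includes PCR duplicates that cannot be removed, so "depth > 100" may overstate the amount of independent evidence, which is exactly the concern raised above.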
Getting to the point, my question is: in case of carrier screening (that is performed for overall healthy individuals) is it sensible to report sequence variants that are only expected to cause the disease or should we stick to the already known variants that are recognized as being disease causing?
I recognize that the question is clinical rather than bioinformatic. If anyone can suggest a more appropriate place to post such questions, please let me know.
Thanks!
Hi again my friend,
I think that the question is fine here, but, as always, I would seek as many opinions from independent sources as you possibly can. I have worked as Lead Bioinformatician in a clinical genetics laboratory in the UK National Health Service, and also in Italy and Brazil.
Well, I'm not in total agreement with the ACMG when they say that a minimal depth of coverage >100 is required. A read depth that high is desirable, but you would then have to sub-sample your final aligned BAM in order to 're-shuffle' the reads, and then call variants independently on each subset. From our work in the UK genetics testing community, higher read depth does not equate to higher confidence in variant calls. You only need a read depth of ~30 at each position at which you're calling a variant. Higher read depth actually increases the probability that you'll make a false-negative call. The minimum recommended read depth in the UK is 18; this is for germline variants.
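To illustrate the 're-shuffle and call each subset' idea, here is a toy sketch at a single position. It assumes the observed bases are already available as a list; naive_genotype is a hypothetical VAF-band caller (the 0.2 / 0.8 bands are my assumption), not a real variant caller, and min_depth=18 simply mirrors the UK minimum mentioned above. In practice you would sub-sample the BAM itself (for example with samtools view -s) and use a proper caller.

```python
# Toy sketch of 're-shuffling' reads at one position: randomly split the reads
# into disjoint subsets and make a call on each subset independently.
# naive_genotype is a hypothetical VAF-band caller, NOT a real variant caller;
# the 0.2/0.8 bands are assumptions, and min_depth=18 mirrors the UK minimum.
import random


def naive_genotype(bases, ref, min_depth=18):
    if len(bases) < min_depth:
        return "no-call"
    vaf = sum(1 for b in bases if b != ref) / len(bases)
    if vaf < 0.2:
        return "0/0"
    if vaf < 0.8:
        return "0/1"
    return "1/1"


def subsample_concordance(bases, ref, n_subsets=4, seed=0):
    """Return the per-subset calls and whether they all agree."""
    rng = random.Random(seed)
    shuffled = list(bases)
    rng.shuffle(shuffled)
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    calls = [naive_genotype(s, ref) for s in subsets]
    return calls, len(set(calls)) == 1


# 120 reads at a heterozygous A/G site -> four subsets of 30 reads each
calls, concordant = subsample_concordance(["A"] * 60 + ["G"] * 60, ref="A")
print(calls, concordant)
```

If the subsets disagree, the call at that position is less robust than the total depth alone would suggest.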
Regarding PCR duplicates, you will just have to do some validation on the first samples that are sequenced in your lab. It is possible to recognise and 'expunge' (remove) reads that are assumed to be PCR or optical duplicates (using a program called Picard), but, yes, depending on the sequencing method, performing this step may result in 99% of your reads disappearing.
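Why can so many reads disappear? The deliberately simplified sketch below shows the core idea behind duplicate marking for single-end reads: reads sharing chromosome, unclipped 5' position and strand are treated as duplicates. It is not Picard's implementation (Picard MarkDuplicates also considers the mates of paired reads and the library), and the Read fields are hypothetical. With amplicon data, every read starts at a primer, so almost all of them collapse together.

```python
# Deliberately simplified sketch of duplicate marking for single-end reads:
# reads sharing chromosome, unclipped 5' position and strand are treated as
# duplicates and only the one with the highest summed base quality is kept.
# Picard MarkDuplicates is more involved (mates of paired reads, libraries).
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    five_prime: int  # unclipped 5' alignment position
    reverse: bool    # True if aligned to the reverse strand
    qual_sum: int    # sum of base qualities, used to pick the representative


def mark_duplicates(reads):
    """Return (kept, duplicates), keeping the best read per (chrom, 5' pos, strand)."""
    best = {}
    for r in reads:
        key = (r.chrom, r.five_prime, r.reverse)
        if key not in best or r.qual_sum > best[key].qual_sum:
            best[key] = r
    kept_names = {r.name for r in best.values()}  # assumes read names are unique
    kept = [r for r in reads if r.name in kept_names]
    dups = [r for r in reads if r.name not in kept_names]
    return kept, dups


# Amplicon-style data: every read starts at the primer, so nearly all look like duplicates
amplicon = [Read(f"r{i}", "chr1", 100, False, i) for i in range(1000)]
kept, dups = mark_duplicates(amplicon)
print(f"kept {len(kept)}, removed {len(dups)} ({len(dups) / len(amplicon):.1%})")
```

With randomly fragmented (probe-based) libraries, start positions vary, so the same procedure removes only true duplicates.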
The answer to your question depends on what you mean by 'expected'.
Unfortunately, you have to do the groundwork in order to judge whether these variants warrant inclusion in your tests and reports or not. Think of this question: on which type of report would you feel comfortable signing your name? This is what clinical geneticists face each day when they sign reports, and it preoccupies them to no end.
If I were in the process of creating a genetic test, I would manually curate each and every variant on the panel and classify each one either as having a proven association with disease from high-powered studies, or as having an association that is uncertain / weak / requires further validation. This is what my colleagues in the clinical genetics community do each day, i.e., rigorously curate each and every variant based on published literature.
At the end of the day: this is your product in the area of health sciences, and you should therefore consider the ethical implications of what you report.
Trust that this helps.
Kevin