Forum: Semantic verification of variants
9.6 years ago
Ram 44k

Hi,

I am looking to verify whether a list of variants makes semantic sense. This would include checks such as whether the coordinate and nucleotide parts agree with the reference sequence, and whether insertions are actual insertions and not duplications. Have you come across any tool that performs this kind of verification? I deal with lots of variants, and a tool like this would be useful to me.

Examples:

  • c.123delA is valid only if position 123 is occupied by an 'A'.
  • c.123_124insA is not the right notation if position 123 is an 'A'; c.123dupA is.
  • Assuming the sequence at c.10_15 is ATATAT, deletion of an 'AT' should be c.14_15delAT (the most 3' of the possible positions).

Note that all the above notations are syntactically perfect, but some of them might be semantically wrong. That is what I wish to check - the semantic/contextual validity.
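To make the three examples concrete, here is a minimal sketch of what such checks could look like. It assumes the reference sequence is already available as a plain string and that positions are 1-based and inclusive; the function names are made up for illustration, and nothing here handles transcript mapping, introns, or strand.

```python
def check_deletion(seq, start, end, deleted):
    """True if the bases at start..end (1-based, inclusive) match the stated deletion."""
    return seq[start - 1:end] == deleted


def insertion_is_duplication(seq, pos, inserted):
    """True if inserting `inserted` between pos and pos+1 merely repeats
    the bases immediately before or immediately after the insertion point."""
    n = len(inserted)
    upstream = pos >= n and seq[pos - n:pos] == inserted
    downstream = seq[pos:pos + n] == inserted
    return upstream or downstream


def shift_deletion_3prime(seq, start, end):
    """Move a deletion of start..end as far 3' as it can go without
    changing the resulting sequence."""
    # Shifting by one base is safe when the first deleted base equals
    # the base immediately after the deleted range.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return start, end


# Hypothetical reference: positions 10-15 are ATATAT.
seq = "G" * 9 + "ATATAT" + "CCC"
print(check_deletion(seq, 10, 11, "AT"))       # True
print(insertion_is_duplication(seq, 10, "A"))  # True, so 'dup' is the right call
print(shift_deletion_3prime(seq, 10, 11))      # (14, 15)
```

A variant would be flagged whenever its stated bases fail check_deletion, an 'ins' turns out to be a duplication, or its coordinates differ from the 3'-shifted ones.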

variants hgvs • 2.3k views
9.6 years ago
stensonpd ▴ 70

Have you tried Mutalyzer? It may contain at least some of the functionality you are looking for.


Mutalyzer has worked pretty well for me in the past. There's no binary to download and run on your local machine, but it can accept bulk input and convert the HGVS to genomic coordinates while verifying the notation.


Thank you for the input. I did indeed look at Mutalyzer, but syntactic verification is the easiest of the steps. This code will do it:

>>> import hgvs.parser
>>> p = hgvs.parser.Parser()
>>> # Parsing checks syntax only; it never looks at the reference sequence.
>>> p.parse_hgvs_variant('NM_000123.1:c.1234A>T')

I'm looking to work with something a lot more context sensitive and intelligent.

9.6 years ago
Ram 44k

I am planning to design a tool for this purpose. I am asking so that I do not end up adding a minor variation on a bunch of existing tools. I have already done this kind of work by hand, and it took me a month to clean up the variants collected by my lab over a period of 20 years.

If any of you have encountered a tool that addresses this problem, I can invest my time in better pursuits. Or, if you can give me an idea of who might find this tool useful, I can discuss requirements with them before building it. Either way, I'd really appreciate feedback from this community.


Unfortunately, I don't know of a tool that does precisely this sort of checking. We (at SolveBio) have come across this problem too, and we're actively developing a broad HGVS "synonymizer" that checks syntax, accuracy (against the underlying sequence), and semantic validity. We're still in the early stages, but we've made a VCF --> RefSeq NM_-based HGVS "translator" that we think is pretty accurate (it does the 3' shifting / right-shuffling and calls dups according to the HGVS spec). We're going to make it available through our API soon, and also work on the other parts (HGVS back to VCF, HGVS "synonymizing" across various reference sequences).
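For readers wondering what "3' shifting" and "calling dups" involve for an insertion, a rough sketch follows. It is not SolveBio's implementation; describe_insertion is a hypothetical helper that works on a bare reference string with 1-based genomic-style (g.) coordinates, assumes a non-empty inserted sequence, and ignores transcript mapping and strand entirely.

```python
def describe_insertion(seq, pos, inserted):
    """Describe an insertion of `inserted` between pos and pos+1 (1-based)
    in HGVS-like g. notation, after shifting it as far 3' as possible."""
    # If the base after the insertion point equals the first inserted base,
    # the same result is obtained by inserting the rotated sequence one
    # position further 3'.
    while pos < len(seq) and seq[pos] == inserted[0]:
        inserted = inserted[1:] + inserted[0]
        pos += 1

    n = len(inserted)
    # After shifting, the insertion is a duplication when it repeats the
    # n bases that end at pos.
    if pos >= n and seq[pos - n:pos] == inserted:
        if n == 1:
            return f"g.{pos}dup"
        return f"g.{pos - n + 1}_{pos}dup"
    return f"g.{pos}_{pos + 1}ins{inserted}"


# A VCF-style record POS=3, REF=C, ALT=CAT means "insert AT after position 3".
print(describe_insertion("GGCATATCC", 3, "AT"))  # g.6_7dup
print(describe_insertion("GGCCC", 2, "T"))       # g.2_3insT
```

Going from here to c. notation on an NM_ transcript would additionally need exon structure and strand handling, which this sketch leaves out.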

Are you in NYC btw? We're in NYC also, and would love to hear some of your experiences and problems you've been running into. dandan@solvebio.com or @dandanxu if you're interested in coming by sometime and seeing what we've got so far.


HGVS synonymizer is the exact term I was looking for, BTW!


Haha! That's good, because we just kind of made it up. We don't have a good term for "semantic validity" either - we're just calling it "the variant that's closest to the HGVS spec".


How is semantic validity different from the "accuracy" you describe?


Hi,

Thank you - this is exactly what I am looking for, though mine does not involve the VCF part of your pipeline. I have been working on one gene with mutations obtained from 4 sources, where 3 of the sources had accuracy errors (as described in your comment), as well as a multitude of other errors.

Yes, I work in NYC, and I'd love to discuss this topic. I will send you an email soon.
