Hi,
I am looking to verify whether a list of variants makes semantic sense. This would include checking that the coordinate and nucleotide parts are consistent with the reference sequence, and that insertions are actual insertions and not duplications. Have you come across any tool that performs this kind of verification? I deal with lots of variants, so such a tool would be useful to me.
Examples:
- c.123delA is valid only if pos #123 is occupied by an 'A'.
- c.123_124insA is not the right notation, c.123dupA is (assuming #123 is an 'A').
- Assuming the sequence in c.10_15 is ATATAT, deletion of an 'AT' should be c.14_15delAT (the most 3' of possible positions).
Note that all the above notations are syntactically perfect, but some of them might be semantically wrong. That is what I wish to check - the semantic/contextual validity.
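To make the three example rules concrete, here is a rough Python sketch of what such a check might look like. It is only an illustration under several assumptions: the reference coding sequence is already available as a plain string (position 1 is the first character), only simple `c.` del/dup/ins descriptions with explicit bases are handled, and the function name `check_variant` and its messages are made up for this example rather than taken from any existing tool.

```python
import re

# Simplified pattern (an assumption of this sketch): c.<pos>[_<pos>](del|dup|ins)<bases>
_PATTERN = re.compile(r"^c\.(\d+)(?:_(\d+))?(del|dup|ins)([ACGT]+)$")


def _fmt(start, end):
    """Format a 1-based position or range the way HGVS writes it."""
    return f"c.{start}" if start == end else f"c.{start}_{end}"


def check_variant(hgvs, seq):
    """Return a list of semantic problems; an empty list means none were found."""
    m = _PATTERN.match(hgvs)
    if m is None:
        return ["syntax not supported by this sketch"]
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    kind, bases = m.group(3), m.group(4)

    if start < 1 or end < start or end > len(seq):
        return ["positions fall outside the reference sequence"]

    if kind == "ins":
        if end != start + 1:
            return ["an insertion must sit between two adjacent positions"]
        n = len(bases)
        # Inserted bases that repeat either flank are really a duplication.
        repeats_5prime = start >= n and seq[start - n:start] == bases
        repeats_3prime = seq[start:start + n] == bases
        if repeats_5prime or repeats_3prime:
            return ["inserted bases copy the adjacent sequence; use dup notation"]
        return []

    # del / dup: the stated bases must match the reference at that range.
    if seq[start - 1:end] != bases:
        return [f"reference has {seq[start - 1:end]} at {_fmt(start, end)}, not {bases}"]

    # 3' rule: slide the range right while the result stays the same.
    shift = 0
    while end + shift < len(seq) and seq[end + shift] == seq[start - 1 + shift]:
        shift += 1
    if shift:
        s, e = start + shift, end + shift
        return [f"not the most 3' position; expected {_fmt(s, e)}{kind}{seq[s - 1:e]}"]
    return []
```

With a reference where positions 10 to 15 read ATATAT, `check_variant("c.10_11delAT", seq)` reports that `c.14_15delAT` is expected, while `c.14_15delAT` itself passes. Insertions are not 3'-shifted here, and intronic, UTR, or genomic coordinates are out of scope.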
Mutalyzer has worked pretty well for me in the past. There's no binary to download and run on your local machine, but it can accept bulk input and convert the HGVS descriptions to genomic coordinates while verifying the notation.
Thank you for the input. I did indeed look at Mutalyzer, but syntactic verification is the easiest of the steps. This code will do it:
I'm looking to work with something a lot more context sensitive and intelligent.