How To Uniquely Annotate A Deletion In A Homopolymer Run?
14.2 years ago
Casbon ★ 3.3k

Let's say I have discovered a variant where the sequence ATTTA is replaced with ATTA. I could annotate this in three ways, since any of the three Ts could have been the one deleted. Is there a convention for annotating this kind of deletion?

snp • 4.2k views

How would you know which specific T has been deleted?

You don't, but hopefully there is a way to annotate this.

14.2 years ago
Heikki ▴ 360

The convention is to keep the alignment as similar as possible when reading from the left.

When dealing with two dissimilar sequences, there is no way of knowing the exact sequence of changes between them. When in doubt, apply the parsimony principle.
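
A minimal Python sketch of that left-to-right convention, assuming a single contiguous deletion (the function names are hypothetical, not from any library). It enumerates every position at which deleting one block from the reference yields the variant, then picks the rightmost one, which is the placement that keeps the alignment identical for longest when read from the left.

```python
from __future__ import annotations


def deletion_placements(ref: str, alt: str) -> list[int]:
    """Return every 1-based start at which deleting one contiguous block
    of len(ref) - len(alt) bases from ref gives alt."""
    size = len(ref) - len(alt)
    if size <= 0:
        raise ValueError("alt must be shorter than ref for a deletion")
    return [
        start + 1
        for start in range(len(ref) - size + 1)
        if ref[:start] + ref[start + size:] == alt
    ]


def rightmost_placement(ref: str, alt: str) -> int:
    """Reading from the left, the alignment stays identical longest when
    the gap is pushed as far right as possible."""
    placements = deletion_placements(ref, alt)
    if not placements:
        raise ValueError("alt is not ref with one contiguous block deleted")
    return max(placements)


def gapped_alt(ref: str, start: int, size: int) -> str:
    """Render alt aligned against ref, with '-' marking the deleted block."""
    return ref[: start - 1] + "-" * size + ref[start - 1 + size:]


ref, alt = "ATTTA", "ATTA"
pos = rightmost_placement(ref, alt)
print(deletion_placements(ref, alt))  # [2, 3, 4]
print(ref)
print(gapped_alt(ref, pos, len(ref) - len(alt)))  # ATT-A
```

All three placements are equally valid descriptions of the same variant; taking the rightmost is an arbitrary but reproducible tie-break.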

So you delete the rightmost T?

14.2 years ago

If it can't be uniquely annotated because there is insufficient information, I would use an annotation that indicates the uncertainty, if possible.

One option is a GFF3 alignment feature indicating a match between the reference and this variant, tagged with SO:0000347 (nucleotide_match). You could leave it to the aligner to decide where to put the gap, and to the downstream analyst to interpret the alignment.
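
As a rough illustration, here is a Python sketch that writes such a GFF3 line for the ATTTA/ATTA example. It encodes the alignment with the GFF3 Gap attribute (M = aligned bases, D = bases present in the reference but missing from the target) and the Target attribute. The seqid, source, feature ID and the chosen gap placement are hypothetical placeholders; in practice the aligner would decide where the gap goes.

```python
def gap_attribute(ref_len: int, del_start: int, size: int = 1) -> str:
    """Build a GFF3 Gap value for a single deletion of `size` reference
    bases starting at 1-based position `del_start` (relative to the feature)."""
    before = del_start - 1
    after = ref_len - before - size
    ops = [f"M{before}"] if before else []
    ops.append(f"D{size}")
    if after:
        ops.append(f"M{after}")
    return " ".join(ops)


def gff3_match_line(seqid: str, start: int, ref: str, alt: str,
                    target_id: str, del_start: int) -> str:
    """Return one tab-separated GFF3 row of type nucleotide_match (SO:0000347)."""
    size = len(ref) - len(alt)
    end = start + len(ref) - 1
    attributes = ";".join([
        f"ID={target_id}_match",
        f"Target={target_id} 1 {len(alt)} +",
        f"Gap={gap_attribute(len(ref), del_start, size)}",
    ])
    columns = [seqid, "example", "nucleotide_match", str(start), str(end),
               ".", "+", ".", attributes]
    return "\t".join(columns)


# Hypothetical coordinates: the ATTTA stretch occupies positions 1-5 of "chrX".
print(gff3_match_line("chrX", 1, "ATTTA", "ATTA", "variant1", del_start=4))
```

Passing del_start=2 instead gives Gap=M1 D1 M3; both rows describe the same variant, which is exactly the ambiguity the alignment leaves to the reader.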

Or, if you want to be cutting-edge, use the new Genome Variation Format (GVF) extension of GFF3, which allows further information, such as read frequency support for variants, to be included. See the preprint of the paper.

14.2 years ago
Neilfws 49k

Some bioinformatics libraries provide the means to describe "fuzzy" locations. For example, BioPerl has the Bio::LocationI interface. The guide to feature annotation explains how it is used (see the section "Location Objects"). It might give you some ideas about how to describe your example.
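
To see what a fuzzy location has to capture, here is a small plain-Python sketch. It is not the BioPerl Bio::LocationI API, and the function names are hypothetical. Starting from any one valid placement of the deletion, it slides the gap left and right while the resulting sequence stays the same, giving the range of reference positions within which the deleted base(s) could lie.

```python
from __future__ import annotations


def shift_bounds(ref: str, start: int, size: int = 1) -> tuple[int, int]:
    """Given one valid 1-based start for a deletion of `size` bases,
    return the leftmost and rightmost equivalent 1-based starts."""
    s = start - 1  # 0-based
    left = s
    # Sliding the gap one base left gives the same sequence when the base
    # just before the gap equals the last base inside it.
    while left > 0 and ref[left - 1] == ref[left + size - 1]:
        left -= 1
    right = s
    # Sliding one base right is equivalent when the first base inside the gap
    # equals the base just after it.
    while right + size < len(ref) and ref[right] == ref[right + size]:
        right += 1
    return left + 1, right + 1


def ambiguous_span(ref: str, start: int, size: int = 1) -> tuple[int, int]:
    """First and last reference positions that the deletion could touch."""
    lo, hi = shift_bounds(ref, start, size)
    return lo, hi + size - 1


print(ambiguous_span("ATTTA", 3))  # (2, 4): one T, somewhere in positions 2-4
```

The sketch assumes `start` is already a valid placement of the deletion; it does not check that.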

I'd steer clear of fuzzy locations. Nothing good has come out of them. They should be heavily deprecated.

And Heikki is a BioPerl developer, so he would know!

14.2 years ago

Many of the above replies are fine. I would suggest that you consult the Human Genome Variation Society (HGVS), which has published an extensive set of nomenclature rules. These naming conventions are now included at the top of most, if not all, entries in dbSNP.

There you will find examples for naming DNA variants (as opposed to, e.g., those affecting the protein sequence), including variants in single-nucleotide stretches.
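
The HGVS recommendations resolve this kind of ambiguity with a fixed rule: when a change could be placed at several equivalent positions, the most 3' one is described. Below is a rough Python sketch of that rule for a simple deletion. The function name and the `offset` parameter (the genomic coordinate of the first base of `ref`) are hypothetical; the longest common prefix of reference and variant gives the 3'-most start directly. Check the HGVS pages themselves for the exact current syntax; older descriptions often also spelled out the deleted bases (e.g. delT).

```python
def hgvs_g_deletion(ref: str, alt: str, offset: int = 1) -> str:
    """Describe alt as a deletion from ref, placed as far 3' as possible.

    `offset` is the (hypothetical) genomic coordinate of ref[0]."""
    size = len(ref) - len(alt)
    if size <= 0:
        raise ValueError("alt must be shorter than ref for a deletion")
    # Longest common prefix: the gap cannot start any further 3' than this.
    p = 0
    while p < len(alt) and ref[p] == alt[p]:
        p += 1
    # Deleting ref[p:p + size] must reproduce alt, otherwise this is not a
    # single contiguous deletion.
    if ref[:p] + ref[p + size:] != alt:
        raise ValueError("alt is not ref with one contiguous block deleted")
    start = offset + p
    end = start + size - 1
    return f"g.{start}del" if size == 1 else f"g.{start}_{end}del"


print(hgvs_g_deletion("ATTTA", "ATTA"))  # g.4del
```

For the example in the question, the candidate Ts sit at positions 2-4 and the 3' rule picks position 4.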

14.2 years ago
Heikki ▴ 360

Depending on what you are going to do with the annotation, here is an alternative:

If you annotate it as a deletion of one T, you lose the context that it is part of a homopolymer run. The mechanism and evolutionary implications are different from a run-of-the-mill single-nucleotide deletion. To highlight the difference, you could annotate it as a complex change: (location of the first T)Tx3->Tx2, using whatever notation you decide to follow.
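
A minimal Python sketch of that run-length style annotation for the ATTTA → ATTA example. The output format `<first position><base>x<old length>-><base>x<new length>` just mirrors the informal notation above and is not a standard; the function name is hypothetical, and only single-base deletions are handled.

```python
def homopolymer_change(ref: str, alt: str) -> str:
    """Annotate a 1-base deletion as a change in homopolymer run length,
    reporting the 1-based position of the first base of the run."""
    if len(ref) - len(alt) != 1:
        raise ValueError("only single-base deletions are handled")
    # 3'-most placement: the first index where ref and alt disagree.
    p = 0
    while p < len(alt) and ref[p] == alt[p]:
        p += 1
    if ref[:p] + ref[p + 1:] != alt:
        raise ValueError("alt is not ref with one base deleted")
    base = ref[p]
    # Expand outwards to the full run of `base` containing the deleted position.
    first = p
    while first > 0 and ref[first - 1] == base:
        first -= 1
    last = p
    while last + 1 < len(ref) and ref[last + 1] == base:
        last += 1
    run = last - first + 1
    return f"{first + 1}{base}x{run}->{base}x{run - 1}"


print(homopolymer_change("ATTTA", "ATTA"))  # 2Tx3->Tx2
```

Because the whole run is reported, it does not matter which of the equivalent placements is used to find it.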

Heikki: you might merge this part of your answer with your first answer so that it gets more attention from users.
