If an individual has a heterozygous indel, how should that be represented in a sequence?
I am looking for some examples along the lines of:
A[C]GT
or A[C/-]GT
You can look through the nomenclature standards set up by the Human Genome Variation Society (HGVS). They define a nomenclature for describing sequence variants at the DNA, RNA and protein level.
A whole series of examples is given for deletions like the one in your example. I admit these can be confusing to follow in places, but part of the reason is the number of scenarios presented.
From the deletion section:
A nucleotide deletion is a sequence change where one or more nucleotides are removed (see Standards - Definition). Deletions are described using "del" after an indication of the first and last nucleotide(s) deleted, separated by a "_" (underscore).
* c.76_78del (alternatively c.76_78delACT) denotes an ACT deletion from nucleotides 76 to 78
* deletions with uncharacterised breakpoints (see Uncertainties)
o c.88-?_923+?del denotes an exonic deletion starting at an unknown position in the intron 5' of coding DNA nucleotide 88 and ending at an unknown position in the intron 3' of coding DNA nucleotide 923
o c.(?_-30)_(*220_?)del denotes the deletion of the entire gene (coding DNA reference sequence running from -30 (cap site) to *220 (polyA-addition site))
o c.88+101_oGJB2:c.355-1045del denotes a deletion which ends in the flanking GJB2 gene at position 355-1045 (in the intron between nucleotides 354 and 355) on the reverse strand (the genes are thus located and fused in opposite transcriptional directions, see Discussion)
* for all descriptions the most 3' position possible is arbitrarily assigned to have been changed (see FAQ);
o ACTTTGTGCC to ACTTGCC is described as c.5_7del (alternatively c.5_7delTGT, not as c.4_6delTTG)
o ctttagGCATG to cttagGCATG in an intron is described as c.301-3delT (not as c.301-5delT)
o TCACTGTCTGCGGTAATC to TCACTG CGGTAATC is described as c.7_10del (alternatively c.7_10delTCTG) and not as c.4_7delCTGT
o AAAGAAGAGGAG to AAAG GAG is described as c.5_9del (alternatively c.5_9delAAGAG) and not as c.3_7delAGAAG
o Exceptions
+ c.1210-12T(5_9) and not c.1210-6T(5_9) describes the variable stretch of 5 to 9 T-residues in intron 9 of the CFTR gene. The most commonly used CFTR coding DNA reference sequence contains a stretch of 7 T's (see Variability of short sequence repeats).
NOTE: to discriminate known variable sequences from other changes it is recommended to describe individual alleles differing from the reference sequence like c.1210-12T[5] (preferred over c.1210-7_1210-6delTT) or c.1210-12T[9] (preferred over c.1210-7_1210-6dupTT).
+ using a coding DNA reference sequence there is an exception to the rule when identical nucleotides flank an intron (e.g. exon 3 ends with ..CAAgt, exon 4 starts with agACG.., C being nucleotide c.123). When the genomic sequence shows that the last A-nucleotide of exon 3 is deleted (and not the first A-nucleotide in exon 4), the deletion changing ..CAAACG.. at coding DNA level to ..CAACG.. is described as c.125delA and not c.126delA.
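The "most 3' position" rule above can be sketched in a few lines of Python. This is a minimal illustration, not an HGVS tool: it assumes the fragment is numbered from 1 on its first base, as in the quoted examples, and it does not handle real c. numbering, intronic offsets such as c.301-3, or the exon-boundary exception. The function names are hypothetical. A deletion can slide one position 3' whenever its first deleted base equals the base just after it, because the resulting sequence is unchanged; repeating that gives the most 3' description. The output appends the deleted bases, matching the "alternatively" form.

```python
def shift_deletion_3prime(ref: str, start: int, end: int) -> tuple[int, int]:
    """Slide a deletion of ref[start..end] (1-based, inclusive) as far 3' as possible."""
    if not (1 <= start <= end <= len(ref)):
        raise ValueError("deletion coordinates out of range")
    # ref[start - 1] is the first deleted base; ref[end] is the base just 3' of the deletion.
    # If they match, deleting one position further right leaves the same sequence.
    while end < len(ref) and ref[start - 1] == ref[end]:
        start += 1
        end += 1
    return start, end


def describe_deletion(ref: str, start: int, end: int) -> str:
    """Hypothetical helper: HGVS-style 'c.X_YdelSEQ' after 3' shifting."""
    s, e = shift_deletion_3prime(ref, start, end)
    deleted = ref[s - 1:e]
    # A single-nucleotide deletion uses one position, without an underscore
    pos = str(s) if s == e else f"{s}_{e}"
    return f"c.{pos}del{deleted}"


print(describe_deletion("ACTTTGTGCC", 4, 6))          # c.5_7delTGT
print(describe_deletion("TCACTGTCTGCGGTAATC", 4, 7))  # c.7_10delTCTG
print(describe_deletion("AAAGAAGAGGAG", 3, 7))        # c.5_9delAAGAG
```

Starting from the "not as" descriptions in the examples, the function arrives at the recommended ones.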
Your example is consistent with what we would do. When handling the sequence in the lab, we use a notation system that references the genomic coordinates, so a heterozygous deletion would be something like this:
chr1:144546421 AC [CTCCC/-] GG
And a heterozygous insertion would be like this:
chr1:144546421 A [C/CTCCC] GG
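The two bracket forms above can be generated from a reference string with a small Python sketch. The helper names, the 1-based coordinates, the default flank of two bases and the opaque locus label are all assumptions chosen to reproduce the examples; this is a lab convention, not a standard. Note that the deletion form shows the empty allele as "-", while the insertion form keeps the shared anchor base on both alleles.

```python
from typing import Optional


def _join(*parts: Optional[str]) -> str:
    # Join the non-empty pieces with single spaces (skips a missing locus or flank)
    return " ".join(p for p in parts if p)


def het_deletion(ref: str, start: int, end: int, flank: int = 2,
                 locus: Optional[str] = None) -> str:
    """Heterozygous deletion of ref[start..end] (1-based, inclusive): [deleted/-]."""
    if not (1 <= start <= end <= len(ref)):
        raise ValueError("deletion coordinates out of range")
    left = ref[max(0, start - 1 - flank):start - 1]
    deleted = ref[start - 1:end]
    right = ref[end:end + flank]
    # One allele keeps the reference bases, the other carries nothing ("-")
    return _join(locus, left, f"[{deleted}/-]", right)


def het_insertion(ref: str, anchor: int, inserted: str, flank: int = 2,
                  locus: Optional[str] = None) -> str:
    """Heterozygous insertion of `inserted` right after ref[anchor] (1-based)."""
    if not (1 <= anchor <= len(ref)) or not inserted:
        raise ValueError("bad anchor position or empty insertion")
    left = ref[max(0, anchor - 1 - flank):anchor - 1]
    base = ref[anchor - 1]
    right = ref[anchor:anchor + flank]
    # Anchored style: both alleles start with the shared reference base
    return _join(locus, left, f"[{base}/{base}{inserted}]", right)


print(het_deletion("ACCTCCCGG", 3, 7, locus="chr1:144546421"))
# chr1:144546421 AC [CTCCC/-] GG
print(het_insertion("ACGG", 2, "TCCC"))
# A [C/CTCCC] GG
```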
If you are dealing with coding sequence, we would also at some point use the notation referencing the coding sequence of the gene, i.e. c.323_327del (alternatively c.323_327delCTCCC).
It would be understood from context that this is a heterozygous deletion.
I find it interesting that most people do not consider using these symbols in the sequences themselves, or persuading aligners to accept them and variant callers to produce them; they use them only as annotation.