I don't know where else to post this, so I figured I'd give it a shot. As the title says, I'm annotating with SnpEff (an old version, 3.3, using GRCh37.73) and have run into an oddity I can't explain. For the deletion 12:133049417GGCAGCTCGTGAACCCGGTGTCCCCTCCTGGAGAGGACGTGGGCAGCTCGTGGACCCGGGTCCCTTCCTGGAGAGGA>G I get, among other annotations, two CODON_DELETION (i.e. in-frame deletion) entries for the same transcript, ENST00000595994: one with the amino acid change SSPGRDPHELPTSSPGGDTGFTSC258- and one with SSPGRDPHELPTSSPGGDTGFTSC265-. The two entries also differ in exon rank (7 and 8, respectively).
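One cheap sanity check when an effect call looks odd is to compare the genomic length of the deletion with the coding length implied by the reported amino acid change. Below is a minimal Python sketch; `deleted_length` and `removed_residues` are hypothetical helpers, and the layout of the amino acid field (removed residues, then a position, then `-`) is inferred from the strings in this post rather than taken from SnpEff documentation.

```python
def deleted_length(ref: str, alt: str) -> int:
    """Bases removed by a simple VCF deletion whose ALT is a prefix of REF."""
    if len(alt) >= len(ref) or not ref.startswith(alt):
        raise ValueError("not a simple left-padded deletion")
    return len(ref) - len(alt)


def removed_residues(aa_change: str) -> str:
    """Residues from an AA change such as 'SSPG258-' (assumed layout, see above)."""
    # Drop the trailing '-' marker, then the numeric protein position.
    return aa_change.rstrip("-").rstrip("0123456789")


# REF/ALT and the amino acid change are copied from the question.
REF = "GGCAGCTCGTGAACCCGGTGTCCCCTCCTGGAGAGGACGTGGGCAGCTCGTGGACCCGGGTCCCTTCCTGGAGAGGA"
ALT = "G"
AA_CHANGE = "SSPGRDPHELPTSSPGGDTGFTSC258-"

n_del = deleted_length(REF, ALT)
residues = removed_residues(AA_CHANGE)
print(n_del, n_del % 3)                  # 76 1
print(len(residues), 3 * len(residues))  # 24 72
```

For this allele the genomic deletion is 76 bp, which is not a multiple of three, while the 24 removed residues account for 72 coding bases. That gap is worth keeping in mind when checking how the deletion was projected onto the transcript.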
To be explicit, here are the two entries:
CODON_DELETION(MODERATE||tcctctccaggaagggacccgcacgagctgcccacgtcctctccaggaggggacaccgggttcacgagctgc/-|SSPGRDPHELPTSSPGGDTGFTSC258-|308|MUC8|protein_coding|CODING|ENST00000595994|7|1|WARNING_TRANSCRIPT_NO_START_CODON)
CODON_DELETION(MODERATE||tcctctccaggaagggacccgcacgagctgcccacgtcctctccaggaggggacaccgggttcacgagctgc/-|SSPGRDPHELPTSSPGGDTGFTSC265-|308|MUC8|protein_coding|CODING|ENST00000595994|8|1|WARNING_TRANSCRIPT_NO_START_CODON)
For reference, the format here is: Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_change | Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon | GenotypeNum [ | ERRORS | WARNINGS ] )
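Mapping each value onto the field list above makes it easy to see exactly where the two entries disagree. A minimal Python sketch, assuming the field order quoted above and treating anything after GenotypeNum as ERRORS/WARNINGS; `parse_eff` and `diff_entries` are hypothetical helpers, not part of SnpEff.

```python
import re

# Field order as listed in the post (SnpEff 3.x EFF format).
FIELDS = ["Effect_Impact", "Functional_Class", "Codon_Change",
          "Amino_Acid_change", "Amino_Acid_length", "Gene_Name",
          "Transcript_BioType", "Gene_Coding", "Transcript_ID",
          "Exon", "GenotypeNum"]

EFF_RE = re.compile(r"^(\w+)\((.*)\)$")


def parse_eff(entry: str) -> dict:
    """Split one 'EFFECT(a|b|...)' entry into a dict keyed by field name."""
    m = EFF_RE.match(entry.strip())
    if m is None:
        raise ValueError(f"not an EFF entry: {entry!r}")
    effect, body = m.groups()
    values = body.split("|")
    record = {"Effect": effect}
    record.update(zip(FIELDS, values))
    # Anything beyond GenotypeNum is the optional ERRORS/WARNINGS part.
    record["Messages"] = values[len(FIELDS):]
    return record


def diff_entries(a: dict, b: dict) -> dict:
    """Return {field: (value_in_a, value_in_b)} for the fields that differ."""
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# The two entries from the post, split across lines for readability.
E7 = ("CODON_DELETION(MODERATE||"
      "tcctctccaggaagggacccgcacgagctgcccacgtcctctccaggaggggacaccgggttcacgagctgc/-|"
      "SSPGRDPHELPTSSPGGDTGFTSC258-|308|MUC8|protein_coding|CODING|"
      "ENST00000595994|7|1|WARNING_TRANSCRIPT_NO_START_CODON)")
E8 = ("CODON_DELETION(MODERATE||"
      "tcctctccaggaagggacccgcacgagctgcccacgtcctctccaggaggggacaccgggttcacgagctgc/-|"
      "SSPGRDPHELPTSSPGGDTGFTSC265-|308|MUC8|protein_coding|CODING|"
      "ENST00000595994|8|1|WARNING_TRANSCRIPT_NO_START_CODON)")

for field, (v7, v8) in diff_entries(parse_eff(E7), parse_eff(E8)).items():
    print(field, v7, v8)
# Amino_Acid_change SSPGRDPHELPTSSPGGDTGFTSC258- SSPGRDPHELPTSSPGGDTGFTSC265-
# Exon 7 8
```

Everything else (effect, codon change, transcript, gene, warning) is identical, which is what makes the pair look like one change annotated twice.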
What's really strange to me is that I've annotated literally thousands of VCFs, and this only seems to have started with a new data set my lab received. Does anyone have an idea why this would occur?
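Since this only showed up with the new data set, it may help to measure how often it happens there. A rough Python sketch: it reads the EFF value from a VCF INFO string and flags any (effect, transcript, codon change) combination reported more than once. The helper names are hypothetical, the field positions come from the format line quoted above, and it assumes EFF entries are comma-separated, as in SnpEff 3.x output.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Positions in an EFF entry, following the quoted field list:
# Effect_Impact(0) | Functional_Class(1) | Codon_Change(2) | ... |
# Transcript_ID(8) | Exon(9) | GenotypeNum(10) [| ERRORS | WARNINGS]
CODON_CHANGE, TRANSCRIPT_ID, EXON = 2, 8, 9


def eff_value(info: str) -> str:
    """Return the value of the EFF key from a VCF INFO string ('' if absent)."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "EFF":
            return value
    return ""


def repeated_changes(info: str) -> Dict[Tuple[str, str, str], List[str]]:
    """Group EFF entries by (effect, transcript, codon change) and keep only
    groups reported more than once, mapped to their exon ranks."""
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for entry in eff_value(info).split(","):
        effect, sep, body = entry.partition("(")
        if not sep:
            continue  # empty or malformed entry
        fields = body.rstrip(")").split("|")
        if len(fields) <= EXON:
            continue  # too few fields to carry an exon rank
        key = (effect, fields[TRANSCRIPT_ID], fields[CODON_CHANGE])
        groups[key].append(fields[EXON])
    return {k: v for k, v in groups.items() if len(v) > 1}


# Usage with the two entries from the post.
E7 = ("CODON_DELETION(MODERATE||"
      "tcctctccaggaagggacccgcacgagctgcccacgtcctctccaggaggggacaccgggttcacgagctgc/-|"
      "SSPGRDPHELPTSSPGGDTGFTSC258-|308|MUC8|protein_coding|CODING|"
      "ENST00000595994|7|1|WARNING_TRANSCRIPT_NO_START_CODON)")
E8 = ("CODON_DELETION(MODERATE||"
      "tcctctccaggaagggacccgcacgagctgcccacgtcctctccaggaggggacaccgggttcacgagctgc/-|"
      "SSPGRDPHELPTSSPGGDTGFTSC265-|308|MUC8|protein_coding|CODING|"
      "ENST00000595994|8|1|WARNING_TRANSCRIPT_NO_START_CODON)")
info = "EFF=" + E7 + "," + E8

for (effect, transcript, _), exons in repeated_changes(info).items():
    print(effect, transcript, exons)
# CODON_DELETION ENST00000595994 ['7', '8']
```

Running this over every record in old and new VCFs would show whether the duplication is confined to particular variants (for example, repeat regions) or is widespread.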
The deleted sequence is a repeat element that is present in multiple perfect or near-perfect copies in the MUC8 gene, so it's not surprising that it maps to multiple positions.
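The repeat point can be seen even within the REF allele from the question, which contains two near-identical copies. The Python sketch below is a naive Hamming-distance scan (`find_approx` is a hypothetical helper; real aligners also handle indels, which this does not). Using the first 17 deleted bases as a probe, an exact search finds one hit, while allowing a single mismatch also finds a second copy 41 bp downstream.

```python
from typing import List


def find_approx(pattern: str, text: str, max_mismatches: int = 0) -> List[int]:
    """Return 0-based start positions where `pattern` matches `text`
    with at most `max_mismatches` substitutions (Hamming distance)."""
    k = len(pattern)
    hits = []
    for i in range(len(text) - k + 1):
        mismatches = 0
        for a, b in zip(pattern, text[i:i + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, stop checking this window
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# REF allele copied from the question; index 0 is the padding base.
REF = "GGCAGCTCGTGAACCCGGTGTCCCCTCCTGGAGAGGACGTGGGCAGCTCGTGGACCCGGGTCCCTTCCTGGAGAGGA"
probe = REF[1:18]                  # first 17 deleted bases
print(probe)                       # GCAGCTCGTGAACCCGG
print(find_approx(probe, REF, 0))  # [1]
print(find_approx(probe, REF, 1))  # [1, 42]
```

Running the same scan over the MUC8 genomic sequence (not included here) would be one way to count the copies mentioned above.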
You say that this behavior is new and that you're using the same (old) production pipeline, which suggests that something has changed upstream. What type of data are you feeding into the pipeline: FASTQs, or data that's already been processed in some way (e.g., aligned BAMs)?
This looks like a bug; I suggest you report it to the developers. My guess is that it's another race condition. Try running it restricted to a single thread and see what happens.
Hi Brian, thank you for your response. I have forwarded this to Pablo (the SnpEff developer), but it is already being run with a single thread.
When you see something odd, first try updating your software; it may be a bug that has already been fixed. I would also expect the developer(s) to ask you to try to reproduce this result with the latest version.
Understood. I figured it was a long shot, but in my case I can't update the version, as it's part of a long-standing production pipeline.