3.6 years ago
boxate1618
For example, I run `snpEff my_vcf.vcf`, where `my_vcf.vcf` is

#CHROM POS REF ALT
chr1 97828139 G A

This gives the annotation `stop_gained`. How do I know this refers to the presence of the alternate allele A? Does `snpEff` really look at the ref and alt alleles, or does it just assign the annotation based on position?
If it says `stop_gained`, then I'd say the most direct interpretation of that statement is that the presence of A leads to a stop gain.

I am a bit uncertain what you mean here. Why does it matter to you "how" it knows that a stop codon is gained? Are you concerned that in some cases the annotation might be incorrect?
As always, visualize your VCF relative to the annotation in IGV. Then, when zoomed in and with translation tables enabled in the view, you can see the wild-type codons in the different frames and immediately validate any statement.
Istvan,
I think OP's question is if the annotation is based on CHR-POS-REF-ALT or just CHR-POS set of fields. VCF annotation frequently happens by the annotator matching a subset of the four fields that form the quasi composite primary key with the library file (dbSNP VCF or whatever other ROD file), and while designing VCF annotation pipelines, we need to make sure which fields are being matched to the library annotation. I think this is what OP is concerned with. Some digging into snpEff documentation should reveal how they match.
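To make the distinction concrete, here is a minimal sketch of the two matching strategies. The library records and notes are made up for illustration, and the function names are hypothetical; this is not how snpEff or any particular annotator is implemented.

```python
# Hypothetical library records; two different ALT alleles share one position.
library = [
    {"chrom": "chr1", "pos": 97828139, "ref": "G", "alt": "A", "note": "annotation for G>A"},
    {"chrom": "chr1", "pos": 97828139, "ref": "G", "alt": "T", "note": "annotation for G>T"},
]

def match_by_position(variant, records):
    """Match on CHROM and POS only; alleles are ignored."""
    return [r["note"] for r in records
            if (r["chrom"], r["pos"]) == (variant["chrom"], variant["pos"])]

def match_by_allele(variant, records):
    """Match on the full CHROM-POS-REF-ALT key."""
    key = ("chrom", "pos", "ref", "alt")
    return [r["note"] for r in records if all(r[k] == variant[k] for k in key)]

variant = {"chrom": "chr1", "pos": 97828139, "ref": "G", "alt": "A"}
print(match_by_position(variant, library))  # both records come back
print(match_by_allele(variant, library))    # only the G>A record comes back
```

With position-only matching, the variant would pick up the annotation of an allele it does not carry, which is the failure mode to rule out.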
yes, i am digging now
To follow up, I am guessing it is the base before the first pipe following `ANN`: https://pcingola.github.io/SnpEff/se_inputoutput/, but I will check with the authors.

To clarify with another example: if a variant is annotated as `missense_variant`, is the assumption that the alternate allele produces the missense? Or that the reference allele produces the missense? I realize I could check manually by visualizing, but when scanning dozens or hundreds of variants, as is often the case, it is more important to know the behavior of the software outright.

If it is a missense variant, that means the two alleles produce different amino acids. Determining which one is the missense mutation would require knowing which one is the ancestral allele and which one is the mutant. Otherwise, either could be considered a missense variant of the other, depending on which you consider to be the reference.
We provide SnpEff with an annotation, and we provide it with a VCF file. When we do so, we implicitly state that the REF column of the VCF matches the reference that SnpEff operates on.
In that scenario, the variant annotation ought to refer to the effect of the ALT allele relative to REF. I believe that all information is reported relative to what SnpEff is aware of as being the "reference", and I would be eager to know if that is not the case.
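As a rough illustration of why the label depends on which allele is treated as the reference, here is a simplified sketch that classifies a single-base substitution within one codon. It assumes the standard genetic code and a forward-strand coding sequence, the codons are hypothetical (not the actual codon at chr1:97828139), and `classify` is an illustrative helper, not SnpEff's implementation.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
TABLE = dict(zip(CODONS, AMINO))

def classify(ref_codon, offset, alt_base):
    """Label the effect of replacing one base of ref_codon with alt_base."""
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa, alt_aa = TABLE[ref_codon], TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"

# Hypothetical codon TGG (Trp) with G>A at the third base becomes TGA (stop).
print(classify("TGG", 2, "A"))  # stop_gained
# Swap which allele is called the reference and the label flips.
print(classify("TGA", 2, "G"))  # stop_lost
```

The same pair of alleles yields `stop_gained` or `stop_lost` depending on which one sits in REF, so the annotation is only meaningful if REF agrees with the genome the annotation was built on.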
OP here. See my link to the docs above; to me it really looks like they spell out exactly which alt allele the annotation refers to, so my guess is that they match the alt allele at that position. Let me know if you interpret it otherwise.
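For anyone scanning many records, a small sketch like the following pulls out the allele each annotation is attached to. It assumes the `ANN` layout described in the linked docs, where sub-fields are pipe-separated with the allele first and the annotation second, and multiple annotations are comma-separated. The INFO string, gene and transcript names are made up, and `parse_ann` is a hypothetical helper.

```python
def parse_ann(info):
    """Return (allele, annotation) pairs from the ANN entry of a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            pairs = []
            for ann in entry[len("ANN="):].split(","):
                fields = ann.split("|")
                # First sub-field is the allele, second is the annotation term.
                pairs.append((fields[0], fields[1]))
            return pairs
    return []  # no ANN entry present

# Made-up INFO value for illustration only.
info = "DP=30;ANN=A|stop_gained|HIGH|GENE1|GENE1|transcript|TX1|protein_coding|||||||"
print(parse_ann(info))  # [('A', 'stop_gained')]
```

If the allele in the first sub-field equals the ALT column of the record, that supports reading the annotation as the effect of the ALT allele.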