7.4 years ago
misbahabas
Asslam o Alikum
I used SnpEff to annotate variants in a single gene across multiple species, and I cannot understand how to interpret its results:
##fileformat=VCFv4.1
##INFO=<ID=AB,Number=1,Type=String,Description="Alt Base">
##SnpEffVersion="4.3p (build 2017-06-06 09:55), by Pablo Cingolani"
##SnpEffCmd="SnpEff GRCh37.75 AMY2B.vcf "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT MP-Hsap_AM MP-Lafr_AM MP-Cang_AM MP-Ptro_AM MA-Phod_AM MP-Pcoq_AM MC-Clup_AM MC-Fcat_AM MA-Chir_AM MR-Mmus_AM MR-Jjac_AM MR-Hgla_AM MC-Mnat_AM MH-Nleu_AM MC-Oros_AM MA-Sscr_AM MP-Ppan_AM MP-Mmul_AM MA-Oari_AM MA-Etel_AM MC-Ptig_AM AA-Xtro_AM MA-Bbub_AM MC-Lwed_AM MP-Ggor_AM MA-Bmut_AM MA-Bind_AM
1 2 . - T,A . . AB;ANN=T|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.2->T||||||,A|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.2->A|||||| . . . T . . . . A . . . A . . . . . . . . . . . . . . .
1 4 . - T,G . . AB;ANN=T|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.4->T||||||,G|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.4->G|||||| . . . T . . . . . . . . G . . . . . . . . . . . . . . .
1 5 . - T,A . . AB;ANN=T|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.5->T||||||,A|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.5->A|||||| . . . T . . . . . . . . A . . . . . . . . . . . . . . .
1 6 . - G,T . . AB;ANN=G|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.6->G||||||,T|intergenic_region|MODIFIER|CHR_START-DDX11L1|CHR_START-ENSG00000223972|intergenic_region|CHR_START-ENSG00000223972|||n.6->T|||||| . . . G . . . . . . . . T . . . . . . . . . . . . . . .
Please help me with this. This is my first time using SnpEff and I do not understand how to interpret the output.
Thanks
I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.
Have you tried the manual?
In particular, the section "Input and Output files" clearly explains the output you get.
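As a rough illustration of how the ANN value can be read, here is a minimal Python sketch. The subfield names are copied from the `##INFO=<ID=ANN...>` header line in the question, and the INFO string is the one from the first record (position 2). The function name `parse_ann` is made up for this example and is not part of SnpEff.

```python
# Subfield names, copied from the ##INFO=<ID=ANN ...> header line in the question.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p", "cDNA.pos / cDNA.length", "CDS.pos / CDS.length",
    "AA.pos / AA.length", "Distance", "ERRORS / WARNINGS / INFO",
]


def parse_ann(info):
    """Return one dict per annotation found under the ANN key of an INFO string.

    Hypothetical helper for illustration only; it is not a SnpEff function.
    """
    records = []
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        # Annotations are separated by commas, subfields by "|".
        for annotation in entry[len("ANN="):].split(","):
            values = annotation.split("|")
            # Pad in case trailing empty subfields are missing.
            values += [""] * (len(ANN_FIELDS) - len(values))
            records.append(dict(zip(ANN_FIELDS, values)))
    return records


# INFO column of the first record in the question (position 2).
info = (
    "AB;ANN=T|intergenic_region|MODIFIER|CHR_START-DDX11L1|"
    "CHR_START-ENSG00000223972|intergenic_region|"
    "CHR_START-ENSG00000223972|||n.2->T||||||,"
    "A|intergenic_region|MODIFIER|CHR_START-DDX11L1|"
    "CHR_START-ENSG00000223972|intergenic_region|"
    "CHR_START-ENSG00000223972|||n.2->A||||||"
)

records = parse_ann(info)
for rec in records:
    print(rec["Allele"], rec["Annotation"], rec["Annotation_Impact"], rec["HGVS.c"])
```

Read this way, each ALT allele in the posted records is annotated as `intergenic_region` with impact `MODIFIER`, and the last subfield (`ERRORS / WARNINGS / INFO`) is empty in these four lines.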
Thanks for the reply.
I read the manual, but I cannot understand warnings in VCF files such as
WARNING_REF_DOES_NOT_MATCH_GENOME
I used the human AMY2B gene as the reference to align the AMY2B genes of different species, and then annotated the variants using the human database as the reference, but I get warnings in the VCF file.
The manual says: "This happens when your data was aligned to a different reference genome than the one used to create SnpEff's database. If there are many of these warnings, it's a strong indicator that the data doesn't match and all the annotations will be garbage (because you are using the wrong database)."
But I used the human gene as the reference in the multiple sequence alignment and then used the human database for annotation. Please give me any idea about this.
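To make the idea behind the warning concrete, here is a small Python sketch of a reference check: it compares the REF allele of a record with the bases found at the same position in a reference sequence. The function name `ref_matches` and the toy sequence are made up for illustration. Whether SnpEff performs exactly this comparison internally is an assumption, not something taken from its documentation.

```python
def ref_matches(reference, pos, ref):
    """Return True if `ref` equals the reference bases starting at 1-based `pos`.

    Hypothetical helper; `reference` is a plain string of bases.
    """
    start = pos - 1  # VCF positions are 1-based, Python strings are 0-based
    return reference[start:start + len(ref)].upper() == ref.upper()


# Toy sequence for illustration only, NOT real genome sequence.
toy_reference = "ACGTACGT"

print(ref_matches(toy_reference, 2, "C"))  # True: the base at position 2 is C
print(ref_matches(toy_reference, 2, "-"))  # False: a gap character is never a base
```

Under this kind of check, a record only passes when both the coordinate and the REF allele agree with the genome behind the database, so coordinates counted from the start of an alignment would not be expected to pass.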
So, now your question is becoming far more specific, and you have also shared more information on how you obtained the data. That is very important, because 12 hours ago this question was not about WARNING_REF_DOES_NOT_MATCH_GENOME. The warning wasn't even shown in your example! You need to understand that you have to give us information so that we can answer your question accurately.
So you aligned against a single human gene in a multiple sequence alignment and then used the genome-wide SnpEff database. That is indeed a mismatch between your VCF and the human database. I assume this question follows on from this one: "How find variants between 26 genes sequence". Am I right in guessing that you have FASTA sequences from multiple species and want to check the differences between them?
As you can see, the VCF states that these variants are on chromosome 1 at positions 2, 4, 5 and 6, which is obviously not the case: that is not the location of the gene. I'm not sure how to fix this. One approach I can think of is to align your sequences to the human reference genome using LAST, attempt variant calling on that, and use the resulting VCF for annotation with SnpEff.
But please be more specific when asking questions.
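To illustrate the coordinate problem, here is a hedged Python sketch of a different idea from the LAST approach above: translating alignment columns into reference coordinates. It walks along the human row of an alignment, skips gap columns, and adds an offset. Everything here is an assumption made for illustration: the function name, the toy alignment row, the offset of 1000, and the premise that the human sequence is a contiguous forward-strand stretch of the genome (not a spliced transcript). None of it comes from SnpEff or snp-sites.

```python
def column_to_reference_position(aligned_ref, offset=0):
    """Map each 1-based alignment column to a 1-based reference coordinate.

    `aligned_ref` is the reference row of the alignment, gaps written as "-".
    `offset` is the (hypothetical) number of genomic bases that precede the
    first base of the reference sequence. Gap columns map to None because
    they have no coordinate in the reference.
    """
    mapping = {}
    position = offset
    for column, base in enumerate(aligned_ref, start=1):
        if base == "-":
            mapping[column] = None
        else:
            position += 1
            mapping[column] = position
    return mapping


# Toy aligned row and offset, for illustration only.
toy_human_row = "A--CG-T"
mapping = column_to_reference_position(toy_human_row, offset=1000)

for column, position in mapping.items():
    print(column, position)
```

In this toy case columns 2, 3 and 6 have no reference coordinate at all, which is one reason why alignment column numbers cannot be used directly as genome positions.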
Thanks, that's helpful.
Yes, I have FASTA sequences from multiple species and want to check the differences. I used MAFFT for the alignment and snp-sites to call the variants (VCF). This VCF was then annotated with SnpEff, but after annotation it gives errors like the one above.
LAST is a local aligner; I want a multiple sequence alignment.