7.4 years ago
reza
Hi everyone,
First question: I have a VCF file produced by samtools for a mammal whose genome is assembled only to scaffold level. I annotated it and now want to extract the non-synonymous SNPs and find the genes that contain them. How can I extract the sequences of the genes that have non-synonymous SNPs, so that I can BLAST them and find the gene names?
Second question: how can I extract the indels located in genic regions (in the annotated VCF file) and get their lengths to plot them?
Thanks in advance.
Try VEP; the tutorial is here: http://www.ensembl.org/info/docs/tools/vep/script/vep_tutorial.html. Once the variants are annotated, you can filter for the non-synonymous ones while keeping the full annotation. The filtering tutorial is here: http://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html. Note that VEP can be configured to annotate only variants in coding regions; see the VEP options for this.
Thanks for your answer, but the animal I am studying is not in Ensembl. I used snpEff for the annotation.
Try SnpSift on the snpEff output: http://snpeff.sourceforge.net/SnpSift.html
Example code (modified from the manual) to filter missense variants:
For indel filtering:
Thanks, that is helpful. I tried it with "( EFF[*].EFFECT = 'NON_SYNONYMOUS_CODING' )" and it worked.
Your suggestion worked for extracting one effect, but when I tried it for several effects (command below), it did not work.
java -jar SnpSift.jar filter "( EFF[*].EFFECT = 'NON_SYNONYMOUS_CODING' )" & "( EFF[*].EFFECT = 'STOP_GAINED' )" & "( EFF[*].EFFECT = 'STOP_LOST )" snpeff_annotated.vcf > fitlered_output.vcf
How can I extract several effects simultaneously?
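A likely cause, sketched here rather than tested on this data: an `&` outside the quotes is read by the shell as "run in the background", so SnpSift only ever sees the first expression. Inside a SnpSift expression, `&` means AND and `|` means OR, so a variant that should match any one of the effects needs `|`, with the whole expression passed as a single quoted argument. The closing quote after STOP_LOST also needs to be present. The variable name `filter_expr` and the existence check are illustrative additions; the file names are the ones from the command above, with the output name spelled `filtered_output.vcf`.

```shell
# One SnpSift expression, kept as a single shell argument.
# '|' is SnpSift's OR operator and must stay inside the double quotes.
filter_expr="(EFF[*].EFFECT = 'NON_SYNONYMOUS_CODING') | (EFF[*].EFFECT = 'STOP_GAINED') | (EFF[*].EFFECT = 'STOP_LOST')"

# Run the filter only when the jar and the annotated VCF are in the current directory.
if [ -f SnpSift.jar ] && [ -f snpeff_annotated.vcf ]; then
    java -jar SnpSift.jar filter "$filter_expr" snpeff_annotated.vcf > filtered_output.vcf
else
    echo "SnpSift.jar or snpeff_annotated.vcf not found; expression would be: $filter_expr"
fi
```

Effect names differ between snpEff versions and annotation formats (EFF versus ANN), so check the names that actually appear in your VCF before filtering.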
For your second question, bedtools intersect can extract the SNPs / indels that overlap your annotation, if you have it in BED or GFF format. You can easily get gene lengths from the BED / GFF file as well.
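As a rough sketch of the indel-length part: once the genic variants are in a VCF, the length of each indel can be taken as the difference between the ALT and REF allele lengths (negative for deletions, positive for insertions). The toy VCF, the file names `toy.vcf`, `genes.gff`, `genic.vcf` and `indel_lengths.tsv`, and the function name `indel_lengths` are all made up for illustration; the commented bedtools line shows one way the genic subset might be produced first.

```shell
# Possible first step (not run here): keep variants overlapping the annotation.
#   bedtools intersect -a snpeff_annotated.vcf -b genes.gff -u -header > genic.vcf

# Toy tab-separated VCF standing in for genic.vcf.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > toy.vcf
printf 'scaffold1\t100\t.\tA\tG\t50\t.\t.\n' >> toy.vcf
printf 'scaffold1\t200\t.\tAT\tA\t60\t.\tINDEL\n' >> toy.vcf
printf 'scaffold2\t300\t.\tC\tCGGA,CG\t70\t.\tINDEL\n' >> toy.vcf

# Print CHROM, POS and (ALT length - REF length) for every indel allele.
indel_lengths() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
    !/^#/ {
        n = split($5, alt, ",")                 # handle multi-allelic sites
        for (i = 1; i <= n; i++) {
            if (alt[i] ~ /^[<*]/) continue      # skip symbolic or spanning alleles
            d = length(alt[i]) - length($4)
            if (d != 0) print $1, $2, d         # SNPs have d == 0 and are dropped
        }
    }' "$1"
}

indel_lengths toy.vcf > indel_lengths.tsv
cat indel_lengths.tsv
```

The third column of the resulting table can then be read into R or any plotting tool to draw a histogram of indel lengths.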