SNPs genomic classification based on VCF annotation
2
0
Entering edit mode
6.3 years ago
aarvaBioinfo ▴ 10

Dear All, I want to make a list of SNPs associated to with genes in Genomic region as below :

  1. downstream_gene_variant
  2. upstream_gene_variant
  3. synonymous_variant
  4. Intron_variant
  5. missense_variant
  6. 5'-prime_UTR_variant
  7. 3'-prime_UTR_variant

using my "Annotated vcf file" by snpEff program. and plot them using R program. so anyone has idea how can we do this?

SNP R • 1.8k views
ADD COMMENT
2
Entering edit mode
6.3 years ago

If you already have vep annotated vcf, you can filter by SO terms. You can go for lollipop plots per gene and histogram plot for each SO category

ADD COMMENT
0
Entering edit mode

hi cpad0112,

Thank you very much for your help. my VCF file was annotated with snpEff and I found that some SNPs shows both "upstream" and "downstream_gene_region" in the same data line so do you have some idea how can I filter this and get exact count and percentage of SNPs with the Gene ID as "downstream" , "upstream"......?

ADD REPLY
0
Entering edit mode

Sure. If you could post vep annotation data format, example data and output required, biostars would be able to help you. In the mean time, you can refer to ensembl documentation on filtering vep output here: https://asia.ensembl.org/info/docs/tools/vep/script/vep_filter.html. If output is vcf, you can also use bcftools to filter variants.

ADD REPLY
0
Entering edit mode

hi, cpad0112,

Thank you for your reply and valuable link. I will try this. beside as you asked for example vcf set so here is example of annotated vcf data as: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Chr01 85514 . G C 222.003 PASS VDB=0.0681475;SGB=-0.693021;MQSB=1;MQ0F=0;AF1=1;AC1=2;MQ=37;FQ=-107.987;ANN=C|upstream_gene_variant|MODIFIER|Glyma.01G000500|Glyma.01G000500.Wm82.a2.v1|transcript|Glyma.01G000500.1.Wm82.a2.v1|protein_coding||c.-4775G>C|||||4775|,C|downstream_gene_variant|MODIFIER|Glyma.01G000400|Glyma.01G000400.Wm82.a2.v1|transcript|Glyma.01G000400.1.Wm82.a2.v1|protein_coding||c.5040C>G|||||4638|,C|intergenic_region|MODIFIER|Glyma.01G000300-Glyma.01G000400|Glyma.01G000300.Wm82.a2.v1-Glyma.01G000400.Wm82.a2.v1|intergenic_region|Glyma.01G000300.Wm82.a2.v1-Glyma.01G000400.Wm82.a2.v1|||n.85514G>C||||||DP=476;DP4=0,0,223,220 GT:PL Chr01 85582 . G A 222.003 PASS VDB=0.0681475;SGB=-0.693021;MQSB=1;MQ0F=0;AF1=1;AC1=2;MQ=37;FQ=-107.987;ANN=A|upstream_gene_variant|MODIFIER|Glyma.01G000500|Glyma.01G000500.Wm82.a2.v1|transcript|Glyma.01G000500.1.Wm82.a2.v1|protein_coding||c.-4707G>A|||||4707|,A|downstream_gene_variant|MODIFIER|Glyma.01G000400|Glyma.01G000400.Wm82.a2.v1|transcript|Glyma.01G000400.1.Wm82.a2.v1|protein_coding||c.4972C>T|||||4570|,A|intergenic_region|MODIFIER|Glyma.01G000300-Glyma.01G000400|Glyma.01G000300.Wm82.a2.v1-Glyma.01G000400.Wm82.a2.v1|intergenic_region|Glyma.01G000300.Wm82.a2.v1-Glyma.01G000400.Wm82.a2.v1|||n.85582G>A||||||DP=273;DP4=0,0,126,132 GT:PL Chr02 36650710 . G A,C 222.003 PASS VDB=0.0681475;SGB=-0.693021;MQSB=1;MQ0F=0;AF1=1;AC1=2;MQ=37;FQ=-107.987;ANN=A|5_prime_UTR_variant|MODIFIER|Glyma.16G205900|Glyma.16G205900.Wm82.a2.v1|transcript|Glyma.16G205900.1.Wm82.a2.v1|protein_coding|2/3|c.-211G>A|||||1806|,A|upstream_gene_variant|MODIFIER|Glyma.16G205800|Glyma.16G205800.Wm82.a2.v1|transcript|Glyma.16G205800.1.Wm82.a2.v1|protein_coding||c.-4909C>T|||||4909|;RPB=0.931841;MQB=1;BQB=0.295808;PV4=0.026087,0.00614403,1,1;DP=343;DP4=6,13,147,142
Chr02 36653298 . A T,G 222.008 PASS VDB=0.462961;SGB=-0.693021;MQSB=0.981338;MQ0F=0;AF1=1;AC1=2;MQ=32;FQ=-107.987;ANN=T|synonymous_variant|LOW|Glyma.16G205900|Glyma.16G205900.Wm82.a2.v1|transcript|Glyma.16G205900.1.Wm82.a2.v1|protein_coding|3/3|c.783A>T|p.Ala261Ala|1235/3686|783/2340|261/779||,T|upstream_gene_variant|MODIFIER|Glyma.16G206000|Glyma.16G206000.Wm82.a2.v1|transcript|Glyma.16G206000.1.Wm82.a2.v1|protein_coding||c.-3094A>T|||||2950|;DP=254;DP4=0,0,116,119 GT:PL
Chr03 36653505 . A G 222 PASS VDB=0.729579;SGB=-0.686358;MQSB=0.885987;MQ0F=0;AF1=1;AC1=2;MQ=36;FQ=-68.9862;ANN=G|synonymous_variant|LOW|Glyma.16G205900|Glyma.16G205900.Wm82.a2.v1|transcript|Glyma.16G205900.1.Wm82.a2.v1|protein_coding|3/3|c.990A>G|p.Gln330Gln|1442/3686|990/2340|330/779||,G|upstream_gene_variant|MODIFIER|Glyma.16G206000|Glyma.16G206000.Wm82.a2.v1|transcript|Glyma.16G206000.1.Wm82.a2.v1|protein_coding||c.-2887A>G|||||2743|;DP=277;DP4=0,0,127,138 GT:PL
Chr03 36653531 . A G 222.001 PASS VDB=0.341291;SGB=-0.683931;MQSB=0.899815;MQ0F=0;AF1=1;AC1=2;MQ=36;FQ=-65.9862;ANN=G|missense_variant|MODERATE|Glyma.16G205900|Glyma.16G205900.Wm82.a2.v1|transcript|Glyma.16G205900.1.Wm82.a2.v1|protein_coding|3/3|c.1016A>G|p.Asn339Ser|1468/3686|1016/2340|339/779||,G|upstream_gene_variant|MODIFIER|Glyma.16G206000|Glyma.16G206000.Wm82.a2.v1|transcript|Glyma.16G206000.1.Wm82.a2.v1|protein_coding||c.-2861A>G|||||2717|;DP=273;DP4=0,0,126,132 GT:PL

and format for required output : table

CHROM POS SNP-ID Ref Alt Gene_name Gene_ID SNP_types or SNP-classification SNP-number Percent(%)
Chr02 36653298 Chr02_36653298_A_T A T,G Glyma.16G205900 Glyma.16G205900.Wm82.a2.v1 synonymous_variant 10 0.5%

Is there a solution I can get this output using my annotated vcf file ??

ADD REPLY
0
Entering edit mode

Where are you getting 10 and 0.5% from, in expected output? aarvaBioinfo. Rest things are easy. Btw, it would help if you could post vcf headers some where and share with forum, in addition to vcf records (if it is not an issue).

ADD REPLY
0
Entering edit mode

hi, cpad0112, I highly appreciate you for help. I gave just for example 10 and 0.5%. not calculated value. sorry due to confidential, I could not share vcf data. if i just keep table header as: CHROM POS SNP-ID Ref Alt Gene_name Gene_ID SNP_types or SNP-classification then it can be made using my given example data?

ADD REPLY

Login before adding your answer.

Traffic: 1682 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6