Summary from a VCF file
5.6 years ago
felipead66 ▴ 120

I have a single VCF file with SNPs from 10 cultivars. After running snpEff, how can I get summary information for each cultivar, such as how many SNPs fall in exons, introns, intergenic regions, etc.?


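As far as I know, you first need to split the VCF file so that there is one VCF file per sample (i.e., per cultivar in this case), keeping only the sites where that cultivar carries the ALT allele, and then merge them if needed after running snpEff. A snpEff-annotated VCF file can give information such as SNPEFF_EFFECT=INTRON and SNPEFF_EFFECT=INTERGENIC (depending on the snpEff version and how the file was annotated, the same information may instead be in an ANN field, with terms such as intron_variant and intergenic_region); exons are a little more complicated, if I remember correctly. You could then use grep -c SNPEFF_EFFECT=INTRON and grep -c SNPEFF_EFFECT=INTERGENIC, which count matching lines, i.e. variants, but I am not sure about exons. I would recommend reading more about snpEff on biostars.org as well as on the developer's website.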
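The split-then-count idea can also be sketched as a single pass over the annotated file. This is a minimal Python sketch, not snpEff's or SnpSift's own method, and it rests on assumptions: the file uses the newer snpEff `ANN` INFO tag (not `SNPEFF_EFFECT`), only the effect of the first annotation is counted, and a variant counts for a cultivar when that cultivar's GT contains a non-reference allele (which is what splitting per sample and keeping that sample's variant sites achieves). `effect_counts_per_sample`, `carries_alt` and the toy records in `EXAMPLE_VCF` are hypothetical.

```python
from collections import Counter, defaultdict

# Toy two-cultivar VCF (made-up records) with snpEff-style ANN annotations.
EXAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tcv1\tcv2
chr1\t100\t.\tA\tG\t50\tPASS\tANN=G|intron_variant|MODIFIER|geneA|geneA|transcript|t1|protein_coding|1/2|c.10+5A>G||||||\tGT\t0/1\t0/0
chr1\t200\t.\tC\tT\t50\tPASS\tANN=T|missense_variant|MODERATE|geneA|geneA|transcript|t1|protein_coding|2/3|c.50C>T|p.Ala17Val|||||\tGT\t1/1\t0/1
chr1\t900\t.\tG\tA\t50\tPASS\tANN=A|intergenic_region|MODIFIER|geneA-geneB|geneA-geneB|intergenic_region|geneA-geneB|||n.900G>A||||||\tGT\t./.\t1/1
"""


def carries_alt(gt):
    """True if a GT value such as '0/1', '1|1' or '1' contains a non-reference allele."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def effect_counts_per_sample(vcf_lines):
    """Return {sample: Counter(effect -> number of variants carried by that sample)}."""
    samples = []
    counts = defaultdict(Counter)
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample (cultivar) names follow the FORMAT column
            continue
        info = dict(
            item.split("=", 1) if "=" in item else (item, "")
            for item in fields[7].split(";")
        )
        if "ANN" not in info:
            continue  # variant was not annotated
        # First annotation only; its 2nd '|' sub-field is the effect. Combined effects
        # such as 'splice_region_variant&intron_variant' keep only the first term.
        effect = info["ANN"].split(",")[0].split("|")[1].split("&")[0]
        gt_index = fields[8].split(":").index("GT")
        for sample, sample_data in zip(samples, fields[9:]):
            if carries_alt(sample_data.split(":")[gt_index]):
                counts[sample][effect] += 1
    return counts


if __name__ == "__main__":
    result = effect_counts_per_sample(EXAMPLE_VCF.splitlines())
    for sample in sorted(result):
        print(sample, dict(result[sample]))
    # cv1 {'intron_variant': 1, 'missense_variant': 1}
    # cv2 {'missense_variant': 1, 'intergenic_region': 1}
```

On a real file, pass an open handle, e.g. `effect_counts_per_sample(open("annotated.vcf"))`, or `gzip.open(path, "rt")` for a `.vcf.gz`. Exonic SNPs in protein-coding transcripts do not get a single "exon" term in snpEff; they typically appear as coding terms such as `missense_variant` or `synonymous_variant`, so an exon total would be the sum of those categories.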

5.6 years ago

Extract the genotype and annotation fields with SnpSift, then summarise them in R or Python to get the total count of each SNP type per sample.

Extracting annotation fields with SnpSift:

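java -jar SnpSift.jar extractFields annotated.vcf CHROM POS REF ALT "ANN[0].GENE" "ANN[0].EFFECT" "GEN[*].GT" > results.tsv

Use the ANN fields throughout: a snpEff-annotated file normally carries either the newer ANN field or the older EFF field, not both. ANN[0] takes the first annotation of each variant (snpEff sorts annotations so the most severe comes first), giving one effect per row, and GEN[*].GT returns the genotype of every sample. The quotes stop the shell from expanding the brackets.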

Ref: http://snpeff.sourceforge.net/SnpSift.html#Extract
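For orientation, the `ANN[*].<sub-field>` names in the command refer to snpEff's `ANN` INFO tag, where each annotation is a pipe-delimited record (allele, effect, impact, gene name, ...) and one variant can carry several comma-separated annotations; `[*]` selects all of them and an index such as `[0]` only the first. The Python sketch below mimics that split so you can check what an extracted column should contain. It follows the ANN layout described in the snpEff documentation, reads only the first four sub-fields, and uses hypothetical names (`parse_ann`, `Annotation`) and a made-up value.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    allele: str  # sub-field 1: Allele
    effect: str  # sub-field 2: Annotation, e.g. 'intron_variant'
    impact: str  # sub-field 3: Annotation_Impact (HIGH, MODERATE, LOW, MODIFIER)
    gene: str    # sub-field 4: Gene_Name


def parse_ann(ann_value: str) -> List[Annotation]:
    """Split the value of an ANN INFO tag into one Annotation per comma-separated entry."""
    annotations = []
    for entry in ann_value.split(","):
        parts = entry.split("|")
        annotations.append(Annotation(*parts[:4]))  # ignore the remaining sub-fields
    return annotations


if __name__ == "__main__":
    # Made-up ANN value: one SNP annotated against two transcripts/genes.
    ann = ("T|missense_variant|MODERATE|geneA|geneA|transcript|t1|protein_coding|2/3|c.50C>T|p.Ala17Val|||||,"
           "T|upstream_gene_variant|MODIFIER|geneB|geneB|transcript|t2|protein_coding||c.-300C>T|||||300|")
    for a in parse_ann(ann):
        print(a.effect, a.gene)
    # missense_variant geneA
    # upstream_gene_variant geneB
```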

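Annotation summary example (a sketch assuming results.tsv has been read into a data frame called results, with the effect column named EFFECT and one genotype column per cultivar named GT_<cultivar>; these column names are illustrative, and across() needs dplyr >= 1.0):

library(dplyr)
# For each effect, count the variants in which each cultivar carries an ALT allele (GT not hom-ref or missing)
summary <- results %>% group_by(EFFECT) %>%
  summarise(across(starts_with("GT_"), ~ sum(!.x %in% c("0/0", "0|0", "./.", ".|."))))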