Question

Why do the VEP stats_file.html and .vcf outputs have different numbers of SNPs with rsIDs?

5.1 years ago

juafonso_bio ▴ 40

Hi

I used VEP (command line) to annotate 147,204 SNPs identified in Bos taurus, with the --check_existing option and representing the SNPs in the input by chromosome:position. Code below.

vep --variant_class --format vcf --sift b --vcf_info_field ANN --offline --cache \
    --dir_cache /home/program2/bin/VEP/95.2/cache/ --species "bos_taurus" \
    --check_existing --stats_file vepstats_allrecodes.html --gene_phenotype \
    -i /home/recode.vcf -o snps_annotated_allrecode.vcf

The table at the beginning of the stats file (vepstats_allrecodes.html) reports 147,204 variants processed, 0 variants filtered out, 16,931 novel variants, and 130,273 known variants.

But when I open the snps_annotated_allrecode.vcf file and remove the duplicated chromosome:position entries (because I just want to see which SNPs are new and am not interested in the different transcripts for now), I get 147,204 SNPs as expected, 130,130 SNPs with rsIDs (so known), and 17,075 SNPs without rsIDs (so new). The .vcf file has 144 more SNPs without an rsID and 144 fewer with an rsID than it should, based on the .html file.
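For reference, the counting step described above can be sketched in Python. This is only a minimal sketch under assumptions: it assumes the output file is in VEP's default tab-delimited layout (one row per variant/transcript pair, with a header row naming the `Location` and `Existing_variation` columns), and it treats any identifier starting with `rs` as an rsID. The function name `count_known_positions` and the toy rows are made up for illustration. If the file is a true VCF, the annotations would sit in the ANN INFO field and would need different parsing.

```python
import io
import re


def count_known_positions(handle):
    """Return (total, with_rsid, without_rsid) over unique Location values.

    Assumes VEP's default tab-delimited output: '##' comment lines, then a
    header row starting with '#', then one row per variant/transcript pair.
    """
    columns = None
    has_rsid = {}  # Location -> True if any row at that location lists an rsID
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#"):
            columns = line.lstrip("#").split("\t")  # header row
            continue
        if columns is None:
            raise ValueError("data row found before the header row")
        row = dict(zip(columns, line.split("\t")))
        # Several co-located IDs may be listed; split on ',' or '&' to be safe.
        ids = re.split(r"[,&]", row.get("Existing_variation", "-"))
        found = any(i.startswith("rs") for i in ids)
        location = row["Location"]
        has_rsid[location] = has_rsid.get(location, False) or found
    total = len(has_rsid)
    known = sum(has_rsid.values())
    return total, known, total - known


# Toy input (invented rows, not real data): three positions, two with an rsID.
TOY = (
    "## ENSEMBL VARIANT EFFECT PREDICTOR\n"
    "#Uploaded_variation\tLocation\tAllele\tExisting_variation\n"
    "1_100_A/G\t1:100\tG\trs123\n"
    "1_100_A/G\t1:100\tG\trs123\n"
    "1_200_C/T\t1:200\tT\t-\n"
    "2_50_G/A\t2:50\tA\t-\n"
    "2_50_G/A\t2:50\tA\trs9\n"
)

print(count_known_positions(io.StringIO(TOY)))
```

A position counts as known here if any of its rows lists an rsID, so the result does not depend on which duplicate row is kept. Keeping only the first row per position, by contrast, can give a different count when rows for the same position disagree.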

Another researcher is having the same problem with another data set.

Which file is the right one?

Thank you

