Remove genes that have multiple SNVs
8.3 years ago
zengtony743 ▴ 80

I generated a VCF file annotated with snpEff. Many genes in this file carry multiple SNVs; I consider these to be false positives and need to remove them. Is there a tool that can do this? I usually do it manually, converting the VCF to a text file with the ANNOVAR package and then filtering in Excel. Is there a tool that can do this with a script rather than in an Excel table?

vcf • 2.0k views

Please respond quickly and remove your other (duplicate) questions.


Thanks! I just don't know what's going on with my cell phone today.

8.3 years ago

Say your threshold is one SNP per gene, and you have a file of SNPs called snps.vcf and a BED file containing gene annotations called genes.bed. You could use vcf2bed and bedmap:

$ vcf2bed < snps.vcf > snps.bed
$ bedmap --count --echo --delim '\t' genes.bed snps.bed \
    | awk '$1==1' \
    | cut -f2- \
    > genes_with_one_overlapping_snp.bed

Or as a one-liner:

$ vcf2bed < snps.vcf | bedmap --count --echo --delim '\t' genes.bed - | awk '$1==1' | cut -f2- > genes_with_one_overlapping_snp.bed
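
If the aim is to filter the VCF itself rather than produce a gene list, one alternative is a two-pass awk script that reads the gene name from the snpEff annotation. This is only a sketch under assumptions: the function name filter_multi_snv_genes is made up for illustration, it assumes snpEff wrote an ANN= entry in the INFO column (older snpEff versions wrote EFF= instead), and it uses only the first annotation of each record, whose fourth '|'-separated subfield is the gene name. It counts every variant record, not only SNVs, and keeps records that have no ANN entry.

```shell
# Print the header plus only those variants whose gene has exactly one variant.
# Assumption: INFO (column 8) contains ANN=Allele|Annotation|Impact|Gene_Name|...
filter_multi_snv_genes() {
    vcf="$1"
    # The file is given twice: pass 1 counts variants per gene, pass 2 prints.
    awk -F'\t' '
        # Return the gene name from the first ANN annotation, or "" if absent.
        function gene_of(info,    n, i, parts, ann) {
            n = split(info, parts, ";")
            for (i = 1; i <= n; i++) {
                if (parts[i] ~ /^ANN=/) {
                    split(substr(parts[i], 5), ann, "|")
                    return ann[4]
                }
            }
            return ""
        }
        # Header lines: skip in pass 1, print in pass 2.
        /^#/ { if (NR != FNR) print; next }
        # Pass 1: count variant records per gene.
        NR == FNR { g = gene_of($8); if (g != "") count[g]++; next }
        # Pass 2: keep unannotated records and genes seen exactly once.
        { g = gene_of($8); if (g == "" || count[g] == 1) print }
    ' "$vcf" "$vcf"
}
```

It could be called as `filter_multi_snv_genes snps.vcf > snps.filtered.vcf`, where the output file name is arbitrary. To allow more than one variant per gene, change `count[g] == 1` to a different threshold.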

Thanks Alex, this is my first time using vcf2bed.

When I run

$ perl PATH/vcf2bed.pl --keep-header < my_file.vcf

It shows

Cannot open --keep-header at PATH/vcf2bed.pl line 12

my_file.vcf works fine. Did I miss something?


I'm not sure what vcf2bed.pl is. You might take a look here and check whether you have this installed: https://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/vcf2bed.html
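
One quick way to check is to ask the shell whether a vcf2bed command is on your PATH. This is a small sketch; the helper name have_tool is invented here for illustration and is not part of BEDOPS.

```shell
# Succeed (exit status 0) if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool vcf2bed; then
    echo "vcf2bed found at: $(command -v vcf2bed)"
else
    echo "vcf2bed not found on PATH; BEDOPS may not be installed"
fi
```

If it is found, it can be called directly as `vcf2bed` rather than through `perl PATH/vcf2bed.pl`.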
