Extract SNPs located in genes from a VCF file based on GFF file information
6.4 years ago
Denis ▴ 310

I have a VCF file with SNPs and a subset of a GFF file that contains only gene features. How can I extract the SNPs located within genes, keeping the output in VCF format?

6.4 years ago

Use bedtools:

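$ bedtools intersect -a input.vcf -b genes.gff -header -u > output.vcf

(-u writes each variant once even if it overlaps several gene features; -wa would write it once per overlap.)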
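If bedtools is not installed, the same overlap logic can be sketched in plain awk. This is only an illustration, not a replacement for bedtools: the function name `snps_in_genes` is made up here, and it assumes tab-separated files, GFF start/end in columns 4 and 5 (1-based, inclusive) and the VCF position in column 2 (1-based).

```shell
# Hypothetical helper: keep VCF records whose POS falls inside any GFF feature.
# Usage: snps_in_genes input.vcf genes.gff > output.vcf
snps_in_genes() {
    vcf="$1"; gff="$2"
    awk -F'\t' '
        NR == FNR {                           # first file: the GFF
            if ($0 ~ /^#/ || NF < 5) next     # skip comments and short lines
            n[$1]++
            start[$1, n[$1]] = $4 + 0         # 1-based inclusive start
            end[$1, n[$1]]   = $5 + 0         # 1-based inclusive end
            next
        }
        /^#/ { print; next }                  # second file: keep VCF header lines
        {
            pos = $2 + 0
            for (i = 1; i <= n[$1]; i++)
                if (pos >= start[$1, i] && pos <= end[$1, i]) {
                    print                     # print once, then stop checking
                    break
                }
        }
    ' "$gff" "$vcf"
}
```

The scan is linear in the number of features per chromosome for every variant, so this is only sensible for small inputs; for anything large, bedtools or the tabix route is the better choice.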

EDIT:

For (very) large VCF files it might be more efficient to bgzip-compress and tabix-index the VCF file, convert the GFF to BED, and use tabix to query the regions:

1. bgzip and index

$ bgzip -c input.vcf > input.vcf.gz
$ tabix -p vcf input.vcf.gz

2. gff to bed

E.g. with BEDOPS:

$ gff2bed < genes.gff > genes.bed

3. Query the regions

$ tabix -R genes.bed -h input.vcf.gz > output.vcf
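If BEDOPS is not available for step 2, a minimal three-column BED can be produced with awk. This is a sketch under assumptions: the function name `gff_to_bed3` is made up, the GFF is tab-separated, and only chromosome, start and end are kept. GFF coordinates are 1-based and inclusive while BED is 0-based and half-open, so the start is shifted by one and the end is left unchanged.

```shell
# Hypothetical helper: minimal GFF -> BED3 conversion.
# Usage: gff_to_bed3 genes.gff > genes.bed
gff_to_bed3() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
        /^#/ || NF < 5 { next }       # skip comment/pragma lines and short lines
        { print $1, $4 - 1, $5 }      # 1-based inclusive -> 0-based half-open
    ' "$1"
}
```

The output keeps the order of the GFF; pipe it through `sort -k1,1 -k2,2n` if a sorted BED is needed.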

fin swimmer
