Determine Variant Location From VCF
12.3 years ago
win ▴ 990

Hi all, I have a rather large VCF file with about 2.8 million variants. I want to know whether each variant falls within an exon, an intron, or a promoter region.

I have downloaded the corresponding annotation files from RefSeq. Could I use BEDTools intersect to accomplish this?

Can anyone help with the syntax? In particular, I need the output in a form that I can parse easily and that tells me whether each variant is exonic, intronic, or in a promoter region.

Thanks in advance.

vcf • 5.6k views

Unfortunately I won't be able to flesh out a more complete answer, but have you looked at using a tool like ANNOVAR to annotate your VCF file?

http://www.openbioinformatics.org/annovar/

12.3 years ago
Laura ★ 1.8k

You can use the Ensembl Variant Effect Predictor (VEP):

http://www.ensembl.org/Homo_sapiens/UserData/UploadVariations

http://www.ensembl.org/tools.html

Both the web interface and the Perl script accept VCF input (for a large number of variants you will need the Perl script), and either one compares your variants against the Ensembl database for any given species.

12.3 years ago
Mchimich ▴ 320

You can use snpEff, which is flexible software: it works with any genome you want, as long as you have a GFF file, a VCF file, and a reference genome. I hope this helps. http://snpeff.sourceforge.net/
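
Since the question asks for output that is easy to parse, here is a minimal Python sketch of pulling the effect terms out of an annotated VCF line. It assumes a snpEff release that writes an `ANN` INFO field with `|`-separated subfields where the second subfield is the effect term (older releases wrote an `EFF` field instead, which this does not handle). The function name `ann_effects` and the example line are made up for illustration.

```python
def ann_effects(vcf_line):
    """Return (chrom, pos, sorted effect terms) for one VCF data line.

    Assumes the INFO column (8th field) may contain an ANN entry of the form
    ANN=Allele|Annotation|Impact|Gene|...,Allele|Annotation|...
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, info = fields[0], int(fields[1]), fields[7]
    effects = set()
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        # One comma-separated annotation per transcript/effect.
        for ann in entry[len("ANN="):].split(","):
            parts = ann.split("|")
            if len(parts) > 1:
                # Combined effects are joined with '&'.
                effects.update(parts[1].split("&"))
    return chrom, pos, sorted(effects)


# Made-up example line, not real data.
line = (
    "chr1\t12345\t.\tA\tG\t50\tPASS\t"
    "DP=10;ANN=G|intron_variant|MODIFIER|GENE1|GENE1|transcript|TX1|"
    "protein_coding|2/5|c.100+5A>G||||||,"
    "G|upstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|TX2|"
    "protein_coding||c.-200A>G|||||150|"
)
print(ann_effects(line))
```

Terms such as `intron_variant` or `upstream_gene_variant` can then be mapped onto whatever exon/intron/promoter categories you need.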

12.3 years ago

To annotate this kind of information in your VCF, upload it to SeattleSeq, or download and install a tool like ANNOVAR to accomplish the same thing. These tools will annotate the coding and noncoding regions, and from there you can go after details like promoter regions, etc.

You can also read more about these here.
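
If you would rather do the overlap yourself, the exon/intron/promoter lookup can be sketched in plain Python. This is only an illustration under stated assumptions: intervals are BED-style (0-based, half-open), VCF positions are 1-based, a promoter is taken to be the 2 kb upstream of the transcription start site, and a variant matching several categories is reported as exon first, then intron, then promoter. The names `merge`, `contains`, `promoter_window` and `classify` are hypothetical helpers, and the coordinates are made up.

```python
from bisect import bisect_right


def merge(intervals):
    """Sort and merge overlapping or adjacent (start, end) pairs."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def contains(merged, pos0):
    """True if 0-based position pos0 lies inside one of the merged intervals."""
    i = bisect_right(merged, (pos0, float("inf"))) - 1
    return i >= 0 and merged[i][0] <= pos0 < merged[i][1]


def promoter_window(tss0, strand, size=2000):
    """Assumed promoter: `size` bases upstream of a 0-based TSS."""
    if strand == "+":
        return (max(0, tss0 - size), tss0)
    return (tss0 + 1, tss0 + 1 + size)


def classify(chrom, pos1, exons, introns, promoters):
    """Classify a 1-based VCF position; priority order is an assumption."""
    pos0 = pos1 - 1  # convert to 0-based to match BED coordinates
    for label, regions in (("exon", exons), ("intron", introns),
                           ("promoter", promoters)):
        if contains(regions.get(chrom, []), pos0):
            return label
    return "intergenic"


# Made-up gene on the + strand: two exons with one intron between them.
exons = {"chr1": merge([(1000, 1200), (1500, 1700)])}
introns = {"chr1": merge([(1200, 1500)])}
promoters = {"chr1": merge([promoter_window(1000, "+")])}

for pos in (500, 1001, 1300, 5000):
    print("chr1", pos, classify("chr1", pos, exons, introns, promoters), sep="\t")
```

The tab-separated output (chromosome, position, category) is trivial to parse downstream. Merging the intervals first keeps each lookup to a single binary search, which matters with a few million variants.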


Just for completeness, I'll point out snpEff (http://snpeff.sourceforge.net/, and referenced in the Biostars link above) which achieves similar goals. I found snpEff to be a bit easier to get working for non-human data, for what it's worth.


Yeah, kind of assumed the OP was referring to human data of course. Maybe the OP can clarify what organism the data is from?
