faster variant annotation for large VCFs
6.3 years ago
Amitm ★ 2.3k

Hi there,

I am trying to annotate WGS VCF files with VEP, and even with multi-threading (n=8) the process is painfully slow: around 2 days per VCF. The VCFs range from 1 to 1.5 GB in size.

What would you recommend to speed up the process? I can break the VCF into chunks. Are there any other solutions or software you would recommend? Apart from gene and functional-impact annotation, I am using VEP plugins for CADD and gnomAD genome-based frequencies.
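
To make the chunking idea concrete, here is a minimal sketch that splits a VCF into one file per chromosome, copying the header into each chunk. It assumes an uncompressed, plain-text VCF; the function name `split_vcf_by_chrom` and the output layout are hypothetical, not something from VEP.

```shell
# Hypothetical helper: split an uncompressed VCF into per-chromosome chunks.
# Each chunk gets a copy of the full header so it is a valid VCF on its own.
split_vcf_by_chrom() {
    in_vcf=$1
    out_dir=$2
    mkdir -p "$out_dir"
    awk -v dir="$out_dir" '
        /^#/ { header = header $0 "\n"; next }   # collect all header lines
        {
            out = dir "/" $1 ".vcf"              # column 1 is CHROM
            if (out != prev) {
                if (prev != "") close(prev)      # keep only one chunk open
                if (!(out in seen)) {            # first time: write header
                    printf "%s", header > out
                    close(out)
                    seen[out] = 1
                }
                prev = out
            }
            print >> out                         # append the variant line
        }
    ' "$in_vcf"
}

# Example (placeholder paths):
# split_vcf_by_chrom sample.vcf chunks/
```

Each chunk could then be annotated as a separate job and the outputs concatenated afterwards (for example with `bcftools concat`); whether that beats a single multi-threaded run would need to be measured.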

Thanks

vcf variant annotation WGS • 4.0k views

There is a parameter, --buffer_size, that controls how many variants are read into memory at once. Double-check with the manual, but raising it might substantially increase speed if you have the RAM available.


Many thanks! Going to try this.


Thanks again. With twice the default buffer size and 16 threads, the run completed in ~7 hours!
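
For reference, a run along those lines might look like the sketch below. The wrapper name `annotate_vcf`, the file names, and the use of `--cache --offline --vcf` are assumptions for illustration; the value 10000 assumes VEP's documented default buffer size of 5000, so check the manual for your version.

```shell
# Hypothetical wrapper around a VEP run; paths and defaults are placeholders.
annotate_vcf() {
    in_vcf=$1
    out_vcf=$2
    threads=${3:-16}      # number of forks, as reported in the thread
    buffer=${4:-10000}    # assumed to be 2x the documented default (5000)
    vep \
        --input_file "$in_vcf" \
        --output_file "$out_vcf" \
        --vcf \
        --cache --offline \
        --fork "$threads" \
        --buffer_size "$buffer"
}

# Example (placeholder paths):
# annotate_vcf sample.vcf sample.annotated.vcf
```

A larger buffer trades memory for speed, so the right value depends on the RAM available per fork.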


Hi,

I see that you have used the CADD plugin for VEP. Could you please let me know how you did it? It does not work for me and gives me a blank column of CADD scores.

Thank You


'does not work' is not a helpful error message. How did you install and run it?


I am running it with "--plugin CADD, path/to/ InDels_inclAnno.tsv.gz,path/to/ whole_genome_SNVs_inclAnno.tsv.gz", but the CADD column is blank in the output.

Thanks


Hi, have you been able to make it work? After installing VEP, you need to run:

perl \
INSTALL.pl \
--AUTO p \
--PLUGINS CADD,ExAC \
--SPECIES homo_sapiens_merged \
--ASSEMBLY GRCh37

That configures the plugins you specify under the Plugins/ path of the VEP data cache. You should get a message like this:

 - installing "CADD"
 - This plugin requires data
 - See /Users/akmandal/.vep/Plugins/CADD.pm for details
 - OK

Once that was done, I created a directory, .vep/Plugins/dat_CADD_1.3, and downloaded whole_genome_SNVs.tsv.gz and InDels.tsv.gz (along with their index files) into it. Then adding this argument to the VEP run:

--plugin CADD,/path/to/dat_CADD_1.3/whole_genome_SNVs.tsv.gz,/path/to/dat_CADD_1.3/InDels.tsv.gz \

does the job.
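
Before running VEP, it may be worth confirming that both data files and their index files are actually in place. The sketch below is a hypothetical helper, not part of VEP or the CADD plugin; it assumes the indexes are tabix `.tbi` files sitting next to the data files.

```shell
# Hypothetical pre-flight check for the CADD data directory.
# Assumes tabix indexes named <file>.tbi alongside each data file.
check_cadd_files() {
    cadd_dir=$1
    status=0
    for f in whole_genome_SNVs.tsv.gz InDels.tsv.gz; do
        for path in "$cadd_dir/$f" "$cadd_dir/$f.tbi"; do
            if [ ! -s "$path" ]; then            # missing or zero-length file
                echo "missing or empty: $path" >&2
                status=1
            fi
        done
    done
    return "$status"
}

# Example (placeholder path):
# check_cadd_files /path/to/dat_CADD_1.3 && echo "CADD files look complete"
```

This only checks that the files exist and are non-empty; it does not verify that they match the assembly or CADD version you intend to use.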

6.3 years ago
Emily 24k

There's some information about VEP speed in this blog post. Your plugins are probably slowing you down a bit as they have to communicate with databases rather than using your offline cache, which can be slower – I'm afraid there's not a lot you can do about that.
