When I annotate an SV VCF file, large DUP, DEL, or INV events are either not annotated at all or only the first matching gene in the region is reported. I ran the command below:
perl ${path}/annovar/table_annovar.pl file.vcf ${path}/annovar/humandb/ -buildver hg19 --regionanno -out final_annotation -remove -protocol refGene,clinvar_20170905,exac03,ALL.sites.2015_0_mod8,esp6500siv2_all,avsnp150 -operation g,f,f,f,f,f -nastring . -vcfinput
When a region is large, the VCF does not carry the altered sequence in the ALT column; instead, the column holds only DEL, DUP, or INV. Example of the VCF file:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr1 1044050 - TCACCACAGCCACCATGTC TC 65 PASS END=; GT:GQ:PR:SR 0/1:65:7,0:13,4
chr1 1431164 - G DEL 56 PASS END=1469606 GT:GQ:PR:SR 0/1:56:73,4:62,17
Thinking this might be the problem, I ran the command below with an input file that contains the start and end position of each region, so that ANNOVAR would know the length of the region. The problem persists: large DUP, DEL, or INV events are still not annotated.
perl ${path}/annovar/table_annovar.pl file.avinput ${path}/annovar/humandb/ -buildver hg19 -out test -remove -protocol refGene,clinvar_20170905,exac03,ALL.sites.2015_0_mod8,esp6500siv2_all,avsnp150 -operation g,f,f,f,f,f -nastring .
Example of the avinput file:
1 537588 537647 TTCTCTCCATCCCCCCTCCATCCCCCTCTCCTTTCTCCTCTCCATCCCCCTCTCCATCCC T
1 1431164 1469606 G <DEL>
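For illustration, rows like the two above could be derived from the VCF without writing out any sequence for the large events. This is only a minimal sketch under stated assumptions, not ANNOVAR's own conversion: it assumes a tab-delimited VCF, that the end coordinate of a symbolic event is stored as END=<number> in INFO, and that the "chr" prefix is dropped as in the example rows. The function name vcf_line_to_avinput and the SYMBOLIC set are hypothetical names introduced here.

```python
import re

# Symbolic ALT values seen in the VCF above (assumption: no other SV types occur).
SYMBOLIC = {"DEL", "DUP", "INV"}


def vcf_line_to_avinput(line):
    """Turn one tab-delimited VCF data line into 'chrom start end ref alt'."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt, info = (
        fields[0], int(fields[1]), fields[3], fields[4], fields[7]
    )

    # Drop the 'chr' prefix to match the avinput rows shown above.
    if chrom.startswith("chr"):
        chrom = chrom[3:]

    # Look for a numeric END=... entry in INFO; 'END=;' (empty) will not match.
    match = re.search(r"(?:^|;)END=(\d+)", info)
    tag = alt.strip("<>")

    if tag in SYMBOLIC and match:
        # Large event: take the end coordinate from INFO, keep ALT symbolic.
        end = int(match.group(1))
        alt = f"<{tag}>"
    else:
        # Explicit alleles: the region spans the REF allele.
        end = pos + len(ref) - 1

    return "\t".join([chrom, str(pos), str(end), ref, alt])


if __name__ == "__main__":
    example = "chr1\t1431164\t-\tG\tDEL\t56\tPASS\tEND=1469606\tGT:GQ:PR:SR\t0/1:56:73,4:62,17"
    print(vcf_line_to_avinput(example))
```

Only the coordinates are written, so the output stays small regardless of how long the region is.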
I know that one solution might be to fill in the ALT column of the VCF file (since we know the length of each region), but this column is left empty for a reason: these regions are very large (some span 217,860,463 bp), so filling it in would create a huge file that is computationally expensive to process.
I would like to know whether there is a way to handle the annotation of large SVs.
Thank you.
I suggest using AnnotSV for annotation (with OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also with your own in-house information)
You can look at this post describing the AnnotSV tool: Annotation for SV and CNV