Convert INDEL format for use in Annovar
8.2 years ago
User 7754 ▴ 270

Hi,

I have a tab-delimited file with GWAS results and I am trying to annotate the variants using Annovar, but my format for INDELs is different, so my indels get split between two output files. Some end up in invalid_input, where I find these variants from my file:

1       63735   63735   CCTA    C
1       251627  251627  AC      A      
1       760811  760811  CTCTT   C

While they should be like this:

1   63735   CCTA    4C  0.339   rs201888535
1   251627  AC  2A  0.172   rs72502741
1   760811  CTCTT   5C  0.0417  rs200712425

and some in the "filtered" file like this:

1   36549207    36549207    A   ACT

which are in Annovar like this:

1   36549207    A   0CTC    0.9076  rs143406521
1   36549207    A   1ACTC   0.9076  rs143406521

I am wondering what the best way to convert these formats is, and whether there is a standard way or a script to do this, as I am afraid of getting it wrong and converting only insertions and not deletions, or the opposite. Thanks so much for your help.
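
One possible direct conversion can be sketched in Python. It assumes the input alleles are VCF-style (indels share a leading anchor base with the reference) and that the target is ANNOVAR's five-column input convention (chromosome, start, end, ref, alt, with "-" standing for the absent allele). The function name `to_annovar` is hypothetical, and the sketch trims only shared leading bases, not trailing ones, so treat it as a starting point to check against a few known variants rather than a replacement for ANNOVAR's own conversion script.

```python
def to_annovar(chrom, pos, ref, alt):
    """Convert one VCF-style variant to (chrom, start, end, ref, alt)
    using "-" for the absent allele. Hypothetical helper, not part of ANNOVAR."""
    pos = int(pos)
    if ref == alt:
        raise ValueError("ref and alt are identical; nothing to convert")

    # Drop bases shared at the start of both alleles (the VCF anchor base),
    # moving the position forward by one for each base removed.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    if not ref:
        # Insertion: start and end both point at the base before the inserted bases.
        return (chrom, pos - 1, pos - 1, "-", alt)
    if not alt:
        # Deletion: start and end span the deleted bases.
        return (chrom, pos, pos + len(ref) - 1, ref, "-")
    # SNV or block substitution.
    return (chrom, pos, pos + len(ref) - 1, ref, alt)


if __name__ == "__main__":
    # Two of the variants from the question, one deletion and one insertion.
    for record in [("1", 63735, "CCTA", "C"), ("1", 36549207, "A", "ACT")]:
        print("\t".join(str(x) for x in to_annovar(*record)))
```

Both branches are handled by the same trimming step, so insertions and deletions cannot be converted inconsistently with each other.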

annovar indel annotation insertion-deletion • 3.5k views
8.2 years ago
igor 13k

ANNOVAR website provides instructions on preparing VCFs:

So as a user, this is what you should do: (1) split VCF lines so that each line contains one and only one variant (2) left-normalize all VCF lines (3) annotate by ANNOVAR.

For example, suppose the input is ex1.vcf.gz (make sure that it is processed by bgzip and then by tabix), this is what you would do:

bcftools norm -m-both -o ex1.step1.vcf ex1.vcf.gz

bcftools norm -f human_g1k_v37.fasta -o ex1.step2.vcf ex1.step1.vcf

The first command splits multi-allelic variant calls into separate lines, and the second command performs the actual left-normalization. The FASTA file is needed in the second command.

Source: http://annovar.openbioinformatics.org/en/latest/articles/VCF/


Hi, thank you Igor for your reply, but I don't have a VCF file, only a simple text file, so I can't really use the VCF utilities, right? Or maybe I can anyway? Thanks again.


I didn't realize you don't have a VCF at all. Not sure how you can do this with a custom format. My first suggestion would be to create a VCF file, but I am not sure how easy it would be, especially with indels. Some options here: bed to vcf format conversion
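
As a rough illustration of the "create a VCF file" route, here is a minimal Python sketch. It assumes the input columns are chromosome, start, end, ref, alt, as in the rows shown in the question, and that the alleles already carry the shared anchor base, so the start column can be used directly as the VCF POS. The function name `table_to_vcf` is hypothetical. The output has only the minimal header; bcftools may additionally want `##contig` lines, position-sorted records, and a bgzip-compressed, tabix-indexed file.

```python
def table_to_vcf(rows):
    """Build minimal VCF text from lines of: chrom start end ref alt.
    Hypothetical helper; the column order is an assumption based on the question."""
    out = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for line in rows:
        fields = line.split()  # tolerates tabs, runs of spaces, trailing blanks
        if len(fields) < 5:
            continue  # skip blank or short lines
        chrom, start, _end, ref, alt = fields[:5]
        if not start.isdigit():
            continue  # skip a header row, if the table has one
        # ID, QUAL, FILTER and INFO are unknown here, so they are written as "."
        out.append("\t".join([chrom, start, ".", ref, alt, ".", ".", "."]))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = [
        "1       63735   63735   CCTA    C",
        "1       36549207        36549207        A       ACT",
    ]
    print(table_to_vcf(example), end="")
```

The resulting file can then go through the two `bcftools norm` steps quoted above.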


Great! This would work, thank you!!


Sorry @igor, I tried your tip for preparing INDELs for ANNOVAR, but I am getting this error:

[fi1d18@cyan01 annovar]$ bcftools norm -f hs37d5.fa -o ex1.step2.vcf ex1.step1.vcf
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq failed at chr1:1499775
[fi1d18@cyan01 annovar]$

But I don't know what that means.
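
A likely reading of this error is that the VCF names the contig "chr1" while the FASTA has no sequence with that name; GRCh37-style references such as hs37d5 typically name their chromosomes "1", "2", and so on, without the "chr" prefix. If that is the cause, the contig names in the VCF need to be made consistent with the FASTA. A minimal Python sketch of one way to do that follows. The function name `strip_chr_prefix` is hypothetical, and the sketch does not handle the mitochondrial contig, which is often named differently in the two conventions (chrM versus MT). bcftools also has a `--rename-chrs` option under `bcftools annotate` that may be the more standard route.

```python
def strip_chr_prefix(vcf_text):
    """Remove a leading "chr" from contig names in VCF text.
    Hypothetical helper; assumes the FASTA uses un-prefixed names like "1"."""
    out = []
    for line in vcf_text.splitlines():
        if line.startswith("##contig=<ID=chr"):
            # Header declaration of a contig: rename it to match the records.
            line = line.replace("##contig=<ID=chr", "##contig=<ID=", 1)
        elif line.startswith("chr"):
            # Data record: the first column is the contig name.
            line = line[3:]
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = (
        "##fileformat=VCFv4.2\n"
        "##contig=<ID=chr1>\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t1499775\t.\tA\tG\t.\t.\t.\n"
    )
    print(strip_chr_prefix(example), end="")
```

Checking the first column of the VCF against the sequence names in the FASTA index (the .fai file) before renaming anything would confirm whether this is really the mismatch.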
