Indel Left/Right Alignment
11.7 years ago
sa9 ▴ 870

When matching indels between different VCF files (generated by different callers), there is an issue with left/right indel alignment. For example:


Here is a real example for one indel (from the same sample) called by:

Samtools --> 1 161047125 CTATA C

GATK --> 1 161047130 TATAG T

I know GATK has a small tool called "LeftAlignIndels" to solve this issue in the BAM files but I can't use it.

Does anyone know in which direction samtools, GATK and Dindel align indels? Is there an easy way to correct this at the level of the VCF files?

Thanks!

indel • 15k views
11.7 years ago
lh3 33k

For NGS analysis, the convention is to left align indels. To use GATK and samtools, you should use an aligner that left aligns indels; otherwise at least samtools will have worse performance and accuracy. It is too late to fix the issue in VCF.

EDIT: wait.. In your example, the two callers deleted different bases. The two calls are intrinsically different. You cannot move the indel to make them the same.
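A quick way to check this is to apply each call to the reference and compare the resulting sequences. The sketch below is only illustrative: it assumes both callers used the same reference, and it rebuilds the reference window for positions 161047125-161047134 from the two REF alleles in the question, which happen to abut. The helper name `apply_variant` is made up for this example.

```python
def apply_variant(window, window_start, pos, ref, alt):
    """Return the window sequence after substituting REF with ALT at pos (1-based)."""
    offset = pos - window_start
    # Sanity check: the REF allele must match the reference window.
    if window[offset:offset + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference window")
    return window[:offset] + alt + window[offset + len(ref):]


# Reference window reconstructed from the two REF alleles in the question
# (an assumption: CTATA covers 161047125-129, TATAG covers 161047130-134).
window_start = 161047125
window = "CTATA" + "TATAG"

samtools_hap = apply_variant(window, window_start, 161047125, "CTATA", "C")
gatk_hap = apply_variant(window, window_start, 161047130, "TATAG", "T")

print("samtools:", samtools_hap)
print("GATK:    ", gatk_hap)
print("same haplotype:", samtools_hap == gatk_hap)
```

If two records describe the same event with different alignment, the resulting sequences are identical. Here they are not, so no amount of shifting will make the two records match.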

9.1 years ago
Erik Garrison ★ 2.4k

I suggest taking a look at vt. It includes a very nice left alignment routine. They've got a nice paper describing the method for normalizing the representation: http://bioinformatics.oxfordjournals.org/content/31/13/2202
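For readers who want to see the idea without installing anything, here is a minimal Python sketch of the trim-and-extend normalization described in that paper. It is not vt's code: the function name, the toy reference string and the single-ALT restriction are assumptions made for illustration.

```python
def normalize(pos, ref, alt, genome):
    """Left-align and trim one biallelic variant.

    pos is 1-based; genome is the chromosome sequence as a Python string.
    Returns the normalized (pos, ref, alt).
    """
    if not ref or not alt or ref == alt:
        raise ValueError("expected two different, non-empty alleles")
    changed = True
    while changed:
        changed = False
        # Drop the last base while both alleles end with the same base.
        if ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("variant at chromosome start; not handled in this sketch")
            pos -= 1
            base = genome[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference (hypothetical): a CA repeat flanked by G's.
genome = "GGGCACACACAGGG"

# A right-shifted deletion of one CA unit moves to the left end of the repeat.
print(normalize(9, "ACA", "A", genome))
```

Note that the routine needs the reference sequence, because extending to the left reads bases that are not stored in the VCF record.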


Thanks Erik. Vt is awesome. We have been using it for a couple of months now.

11.7 years ago
pd3 ▴ 350

There is the "vcf norm" tool in htscmd, which left-aligns and normalizes indels in VCFs. It can be downloaded from GitHub; search for 'htslib'.


Thanks Ixe, we used a similar tool from GATK that works specifically on VCF files (http://goo.gl/0f7bT), but as lh3 pointed out, it is too late to correct this in the VCF file.


Why too late in VCF?

6.2 years ago

a) Create a reversed reference genome.
b) Create a tool that reverses the variant alleles and changes the position (subtract: length_of_chrom + 1 - vcf_pos).
c) Use a left-normalizing/shifting tool.
d) Apply step b) again.

Voila.. Right shifting tool. Will post once I have it done.
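Steps a) to d) could look roughly like the Python sketch below. It is not the promised tool, only an illustration under stated assumptions: the reference is a toy string, records are biallelic, and `normalize` is a simple trim-and-extend left-aligner written for this example. For a multi-base REF the mirrored start works out to length_of_chrom + 2 - vcf_pos - len(REF), which equals the formula in b) when REF is a single base. A final step, not in the recipe above, moves the anchor base back to the left of the event as VCF expects.

```python
def normalize(pos, ref, alt, genome):
    """Left-align one biallelic variant (1-based pos); a simplified sketch."""
    if not ref or not alt or ref == alt:
        raise ValueError("expected two different, non-empty alleles")
    changed = True
    while changed:
        changed = False
        if ref[-1] == alt[-1]:          # trim a shared trailing base
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        if not ref or not alt:          # extend left if an allele is empty
            if pos == 1:
                raise ValueError("variant at chromosome start; not handled")
            pos -= 1
            base = genome[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]     # trim shared leading bases
        pos += 1
    return pos, ref, alt


def right_shift(pos, ref, alt, genome):
    """Right-align a variant by left-aligning it on the reversed genome."""
    n = len(genome)
    rgenome = genome[::-1]                      # step a
    rpos = n + 2 - pos - len(ref)               # step b: mirror the record
    rpos, rref, ralt = normalize(rpos, ref[::-1], alt[::-1], rgenome)  # step c
    pos = n + 2 - rpos - len(rref)              # step d: mirror back
    ref, alt = rref[::-1], ralt[::-1]
    # The shared anchor base now sits at the right end of an indel;
    # swap it for the base preceding the event.
    if len(ref) != len(alt) and ref[-1] == alt[-1] and pos > 1:
        ref, alt = ref[:-1], alt[:-1]
        pos -= 1
        base = genome[pos - 1]
        ref, alt = base + ref, base + alt
    return pos, ref, alt


# Toy reference (hypothetical): a CA repeat flanked by G's.
genome = "GGGCACACACAGGG"

# The left-aligned deletion of one CA unit moves to the right end of the repeat.
print(right_shift(3, "GCA", "G", genome))
```

The reversal is a plain string reversal, not a reverse complement, so the alleles only need to be reversed as well.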
