How To Detect Overlapping Indels In A VCF File?
12.1 years ago
mikyatope • 0

Hi all,

I have a VCF file with indels from a whole-genome analysis and I want to detect overlaps between indels. I tried to use BEDtools intersect, but it asks for two files and I only have one. Before trying the option of giving the same file twice to BEDtools intersect, maybe someone knows a better way or a better tool to achieve this.

Thanks!

indel vcf • 4.6k views

If you want to remove them, Galaxy has a tool called "Delete Overlapping Indels".


Interesting, but my file is 10 Gb. Does Galaxy support uploads of that size?


BEDOPS tools are designed to handle arbitrarily-sized inputs and may be a useful alternative to uploading a 10 Gb file. Please see my comment to Irsan's answer.


Is the data phased? If so you can use something like vcfgeno2haplo -w 1000 and it will describe when the indels are "impossible" (e.g. overlapping on the same haplotype) on stderr.

Alternatively, you could call with a method that doesn't generate overlapping indels (a haplotype detection method) and ensure that the input is left-aligned and homogenized.

12.1 years ago
Irsan ★ 7.8k

Try bedops. It has a merge option that collapses overlapping elements in one or more input files. Make sure you have sorted the VCF file with sort-bed first.
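
As a rough illustration of what "collapsing overlapping elements" means, here is a minimal Python sketch of merging sorted intervals. It is not BEDOPS code; the function name merge_sorted and the example coordinates are made up for illustration, and it assumes 0-based, half-open (BED-style) intervals already sorted by chromosome and start. It only merges intervals that strictly overlap, whereas a real tool may also join intervals that merely touch.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (chrom, start, end): 0-based, half-open coordinates, as in BED
Interval = Tuple[str, int, int]


def merge_sorted(intervals: Iterable[Interval]) -> Iterator[Interval]:
    """Collapse overlapping intervals into single regions.

    Hypothetical helper: assumes the input is sorted by (chrom, start).
    """
    current: Optional[Interval] = None
    for chrom, start, end in intervals:
        if current is not None and chrom == current[0] and start < current[2]:
            # Overlaps the region being built: extend its end if needed.
            current = (chrom, current[1], max(current[2], end))
        else:
            # No overlap: emit the finished region and start a new one.
            if current is not None:
                yield current
            current = (chrom, start, end)
    if current is not None:
        yield current


# Made-up example coordinates
example = [("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 40, 50), ("chr2", 5, 8)]
merged = list(merge_sorted(example))
print(merged)
```

Note that merging tells you where overlaps occur but not which individual records overlap, so it may be only a first step if you need the offending indels themselves.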


Sorry, but it seems that bedops only takes BED as input, and I have VCF files.


The BEDOPS suite includes a vcf2bed conversion script, if this helps. The bedops tool operates on file streams in linear time and has a low, constant memory footprint, so it will scale to your 10 Gb input file size very nicely (see the Bioinformatics paper and supplementary figures for performance analysis), but you would want to do sorting with the "Big Bed Merge Sort" (bbms) tool, instead of sort-bed, unless you have more than 10 Gb of system memory. (When BEDOPS v2 comes out in a month or so, the sort-bed tool will include the functionality in bbms and be able to do sorts on arbitrarily large BED inputs.)

Please see: http://code.google.com/p/bedops/wiki/vcf2bed for conversion, http://code.google.com/p/bedops/wiki/sortBed for sorting, and http://code.google.com/p/bedops/wiki/bedops for documentation for the bedops tool.

The --element-of operator is probably most useful for reporting overlapping BED elements, while the --merge operator will concatenate overlapping regions. You can combine operators, if this is needed for your analysis, by using standard UNIX piping; BEDOPS apps can usually take in standard input from upstream processing, e.g.

vcf2bed < foo.vcf | bbms - | bedops --element-of - bar.bed > answer.bed
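
If installing extra tools is not an option, the same single-file overlap check can be sketched in plain Python. This is only a sketch under assumptions, not what vcf2bed or bedops do internally: the function names indel_intervals and overlapping_pairs are hypothetical, the VCF lines are assumed to be sorted by chromosome and position, each indel is assumed to occupy the reference span of its REF allele (including the anchor base), and symbolic alleles such as <DEL> are not handled. It streams the records, so memory use depends on how many indels overlap at once rather than on file size.

```python
from typing import Iterable, Iterator, List, Tuple

# (chrom, start, end, label): 0-based, half-open span of the REF allele
Record = Tuple[str, int, int, str]


def indel_intervals(vcf_lines: Iterable[str]) -> Iterator[Record]:
    """Yield one interval per indel record (hypothetical helper)."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if all(len(a) == len(ref) for a in alt.split(",")):
            continue  # SNP or MNP, not an indel
        start = pos - 1  # VCF POS is 1-based
        yield chrom, start, start + len(ref), f"{chrom}:{pos}:{ref}>{alt}"


def overlapping_pairs(records: Iterable[Record]) -> Iterator[Tuple[str, str]]:
    """Yield label pairs of overlapping records; input must be sorted."""
    active: List[Tuple[int, int, str]] = []
    current_chrom = None
    for chrom, start, end, label in records:
        if chrom != current_chrom:
            active, current_chrom = [], chrom
        # Keep only earlier records that still extend past this start.
        active = [r for r in active if r[1] > start]
        for _, _, other in active:
            yield other, label
        active.append((start, end, label))


# Made-up example records (tab-separated VCF body lines)
vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tACGT\tA\t.\t.\t.\n",  # deletion spanning [99, 103)
    "chr1\t102\t.\tG\tGTT\t.\t.\t.\n",   # insertion inside the deletion
    "chr1\t150\t.\tC\tT\t.\t.\t.\n",     # SNP, ignored
    "chr1\t200\t.\tTA\tT\t.\t.\t.\n",    # deletion, no overlap
]
pairs = list(overlapping_pairs(indel_intervals(vcf)))
print(pairs)
```

For a real file you would pass an open file handle instead of the list. Because the anchor base is counted as part of each interval, this may report neighbouring indels that only share that base, so the interval definition is worth adjusting to your own notion of overlap.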

Thanks for the explanation! I'll surely give it a try.
