Question

Detecting very large (2+ megabase pair) inversions from mapped reads?

8.4 years ago

orlando.wong ▴ 60

Hello biostars,

I wanted to know if any of you have had experience with software for detecting very large inversions.

I have the .bam files from aligning Illumina mate-pair reads with BWA-MEM, and I am looking for ways to analyze them. A microarray was used to confirm four deletions, 2 Mbp to 8 Mbp in size, and I wanted to see whether any large inversions resulted from the fragments in between the deletions.

I haven't had any luck with structural variant detection software such as Pindel, lumpy-sv, SVDetect, GASV, and DELLY. They were only able to detect inversions of around 70 to 100 bp. Pindel has a maximum of 8 million bp and reported many detections.

I also wanted to ask whether the BWA-MEM or Bowtie mapping algorithms may have issues mapping reads from a sample with many large deletions (5 million base pairs) and possible inversions.

Thanks! *edited "insertion" when I meant to say inversion. Sorry for the confusion!

alignment sequence structural variation inversion • 3.8k views

ADD COMMENT • link updated 4.6 years ago by Biostar 20 • written 8.4 years ago by orlando.wong ▴ 60

score 6 · Answer 1 · 2016-07-18

LUMPY and Manta use discordant paired ends to determine SVs. Basically they are really good for small or large events that are not NAHR mediated.

NAHR (non-allelic homologous recombination) SVs are typically flanked by homologous sequence (see the H in NAHR). It's practically impossible to confidently map discordant paired ends to these regions with short-read data.
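To make the discordant-pair idea concrete for inversions: pairs that span an inversion breakpoint tend to map with both mates on the same strand, whether the library is paired-end or mate-pair. Here is a minimal sketch, not what LUMPY or Manta actually do, that scans SAM text lines for same-strand pairs on one chromosome. The function name, the 1 Mbp `min_span` and the `min_mapq` cutoff are my own assumptions for illustration; the flag bits are the standard ones from the SAM specification.

```python
# SAM FLAG bits (standard values from the SAM specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20
SECONDARY = 0x100
SUPPLEMENTARY = 0x800
SKIP = UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY


def same_strand_pairs(sam_lines, min_span=1_000_000, min_mapq=20):
    """Yield (chrom, pos, mate_pos, strand) for pairs whose mates map to the
    same strand of the same chromosome, at least min_span bp apart.

    min_span and min_mapq are illustrative thresholds, not recommendations.
    Each pair is reported once, from its leftmost mate. MAPQ is only checked
    for the leftmost mate, because the mate's MAPQ is not in the core fields.
    """
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, mapq = int(fields[1]), int(fields[4])
        if not flag & PAIRED or flag & SKIP or mapq < min_mapq:
            continue
        chrom, pos = fields[2], int(fields[3])
        mate_chrom, mate_pos = fields[6], int(fields[7])
        if mate_chrom not in ("=", chrom):  # mate on another chromosome
            continue
        if bool(flag & REVERSE) != bool(flag & MATE_REVERSE):
            continue  # opposite strands: the normal orientation class
        if mate_pos - pos < min_span:  # too close, or this is the right mate
            continue
        yield chrom, pos, mate_pos, "-" if flag & REVERSE else "+"
```

You could feed it the output of `samtools view sample.bam` through `sys.stdin`. Clusters of "+" pairs and clusters of "-" pairs with similar coordinates would be the thing to look for; isolated hits are most likely mapping noise, especially in the homologous regions described above.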

However you can easily find large CNVs with read coverage.

ForestSV is one method to do this; however, support for ForestSV is minimal and you need an R version older than 3.0 to run it.

CNVnator is a tried and true CNV caller that uses read depth.

That should get you going in the right direction: you can use a CNV caller that relies on read depth to call CNVs.

If you know the positions of the CNV in question you can determine the coverage yourself and compare that coverage to another locus of similar size without a loss or gain in copy number.

This is an example published in my paper, which includes an overview of our structural variation calling pipeline. Here is a 128 kb deletion. I plotted the normalized per-base-pair depth of coverage obtained with the samtools depth command, scaled for the chromosome.

The area in red has a normalized read depth of 0.5, which is half of the expected depth (a one-copy loss).
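If you want to try the same kind of comparison yourself, here is a rough sketch of the normalization, not the exact code from the paper. It takes the three-column text that `samtools depth` prints (chromosome, position, depth), averages depth in fixed windows, and divides by the median window depth on that chromosome. The function name, the window size and the choice of the median as the scaling factor are my assumptions.

```python
from collections import defaultdict
from statistics import median


def windowed_normalized_depth(depth_lines, window=10_000):
    """Return a list of (chrom, window_start, normalized_depth) tuples.

    depth_lines: lines of `samtools depth` output (chrom, 1-based pos, depth).
    Positions missing from the input are treated as depth 0, because the sum
    in each window is divided by the full window size.
    Normalization: window mean divided by the median window mean on the same
    chromosome (an assumption; other scalings are possible).
    """
    sums = defaultdict(lambda: defaultdict(int))
    for line in depth_lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        chrom, pos, depth = parts[0], int(parts[1]), int(parts[2])
        sums[chrom][(pos - 1) // window] += depth

    result = []
    for chrom, bins in sums.items():
        means = {b: total / window for b, total in bins.items()}
        scale = median(means.values())
        if scale == 0:  # no usable coverage on this chromosome
            continue
        for b in sorted(means):
            result.append((chrom, b * window + 1, means[b] / scale))
    return result
```

Windows with a value near 0.5 across a long stretch are candidates for a one-copy loss, and values near 1.5 for a one-copy gain. Note that windows with no reported positions at all are simply absent from the result, so running `samtools depth -a` (which also prints zero-depth positions) makes a complete deletion easier to see.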

score 1 · Answer 2 · 2016-07-18

8.4 years ago

Charles Warden 8.3k

As a starting point, you can probably see such large deletions / duplications by visualizing coverage in IGV, assuming you have even coverage throughout the genome. Shouldn't be a problem with Whole Genome Sequencing. You'll see some increase in noise for Exome sequencing, but my main concern would be that you have to make sure you have a decent number of target genes within your region of interest if you have DNA-Seq from a smaller gene panel (say, a couple hundred genes).

If your sequencing method doesn't cover breakpoint regions, then methods like LUMPY won't work as well.

Have you tried using a program like VarScan, CoNIFER, or XHMM to call copy number changes?

There is also a possibility that the microarray results were false positives, if you had good genome coverage and you really don't see any indication of a CNV.

Thanks for the tip, Charles. There was a lot of noise when I visualized the .bed files in IGV. I used your idea to compute coverage and was able to find low coverage in the four large deletion regions (2 Mbp each). The coverage in between the deletion regions seemed decent. Do you think this suggests there may not be a large inversion?

I looked at the programs you suggested. I don't think they were able to detect any inversions. I tried VarScan, but I am not sure if the algorithm is designed for inversion detection. I will continue to try the other two.

ADD REPLY • link 8.3 years ago by orlando.wong ▴ 60

I'm not sure how you were able to identify an inversion with microarray data, unless you had a custom probeset to cover the already known breakpoints (in which case, you should be able to PCR amplify the sequence spanning those junctions to confirm the accuracy in your sample).

You can't really visualize coverage to confirm inversions. You can check for clipped reads for large insertions, but that may also be hard to visualize.
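Since clipped reads are hard to eyeball, one option is to count them instead. This is only a sketch under my own assumptions (the function names, the 20 bp minimum clip length and the 100 bp bins are all arbitrary): it reads SAM text lines, finds soft-clipped read ends from the CIGAR string, and counts how many clips fall in each bin.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_OPS = set("MDN=X")  # CIGAR operations that consume reference bases


def clip_positions(pos, cigar, min_clip=20):
    """Return reference positions where a read is soft-clipped.

    pos is the 1-based leftmost aligned position from the SAM record.
    Hard clips (H) are ignored; only soft clips (S) of at least min_clip
    bases are reported.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    positions = []
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        positions.append(pos)  # clip on the left end of the alignment
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in REF_OPS)
        positions.append(pos + ref_len)  # first base after the alignment
    return positions


def clipped_read_pileup(sam_lines, bin_size=100, min_clip=20):
    """Count soft-clip positions per (chrom, bin_start) from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped or no alignment
            continue
        for clip in clip_positions(pos, cigar, min_clip):
            counts[(chrom, clip // bin_size * bin_size)] += 1
    return counts
```

Calling `clipped_read_pileup(...).most_common(20)` on the reads from a region of interest gives the bins with the most clipped ends. A sharp pile of clips at one coordinate is a candidate breakpoint, which you would still want to confirm, for example with PCR across the junction.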

ADD REPLY • link 8.3 years ago by Charles Warden 8.3k