Question

Identify insertions with IGV

1

6.8 years ago

marongiu.luigi ▴ 740

Dear all,

is it possible to visualize insertions in a sequence?

I have prepared a simulated sequence of the mitochondrial genome from release hg38 by placing non-human sequence right in the middle of it (position 8284). I then aligned the simulated genome to the mitochondrial index and visualized the alignment with the Integrative Genomics Viewer (IGV). However, I don't see any sign of insertions in the figure.

[image]

Is there a way to highlight the insertion point? Maybe by showing only clipped reads or reads with only one mate mapped?

Thank you.

IGV insertions visualization • 20k views

ADD COMMENT • link 6.8 years ago by marongiu.luigi ▴ 740

2

IGV is meant to visualize your alignment. It is not a variant caller. Appropriate tools for SV identification exist, e.g. lumpy

ADD REPLY • link 6.8 years ago by WouterDeCoster 48k

1

In IGV, insertions are represented with an I symbol. I can see a bunch of purple I symbols in your snapshot. Please refer to: http://software.broadinstitute.org/software/igv/AlignmentData for more info.

Insertions
In a gapped alignment, IGV indicates insertions with respect to the reference with a purple I, or a red I for insertions greater than a user-activated and user-specified cutoff. Hover over the insertion symbol to view the inserted bases.

ADD REPLY • link 6.8 years ago by Sej Modha 5.3k

0

Sorry, I forgot to mention that -- in order to differentiate the simulated sequence from the original -- I also generated random mutations in the sequence. The Is might simply show that. One might also argue that the coloured reads mark the insertion point, but there are other regions with such colouring (not shown in the figure), so it is not a specific marker.

ADD REPLY • link 6.8 years ago by marongiu.luigi ▴ 740

0

Sorry, I forgot to mention that -- in order to differentiate the simulated sequence from the original -- I also generated random mutations in the sequence. The Is might simply show that.

So is this real data salted with simulated reads or just plain simulated reads?

ADD REPLY • link 6.8 years ago by GenoMax 151k

0

The procedure was this: I split the mitochondrial FASTA file from GRCh38 into two pieces and merged the non-human sequence in between. Then I used EMBOSS to introduce random mutations and ART to generate paired-end FASTQ reads. I then used BWA MEM to align them to the mitochondrial index (prepared with BWA index from the original GRCh38 mitochondrial FASTA).
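The split-and-merge step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `insert_sequence` and the toy sequences are hypothetical, and it does not reproduce the EMBOSS, ART or BWA steps.

```python
def insert_sequence(host: str, insert: str, position: int) -> str:
    """Return `host` with `insert` placed after its 1-based base `position`.

    position=0 prepends the insert; position=len(host) appends it.
    """
    if not 0 <= position <= len(host):
        raise ValueError("position lies outside the host sequence")
    # Python slicing is 0-based and end-exclusive, so host[:position]
    # holds exactly the first `position` bases.
    return host[:position] + insert + host[position:]


# Toy stand-ins (hypothetical) for the mitochondrial genome and the viral fragment.
toy_host = "AAAACCCC"
toy_insert = "GGG"
chimera = insert_sequence(toy_host, toy_insert, 4)
print(chimera)
```

With the real data, `position` would be 8284 and the two strings would be read from the FASTA files, with header lines and line breaks stripped first.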

ADD REPLY • link 6.8 years ago by marongiu.luigi ▴ 740

0

merged the non human sequence in between.

What was the length of this sequence? When you refer to insertions, do you mean single-bp insertions or something longer, like the full non-human sequence you inserted?

ADD REPLY • link 6.8 years ago by GenoMax 151k

0

I placed a stretch of 4000 bases from Parvovirus B19 after base 8284 of the mitochondrion, then introduced 500 mutations with msbar.

ADD REPLY • link 6.8 years ago by marongiu.luigi ▴ 740

1

Take a look at the "Detecting structural variants" section on this IGV help page.

ADD REPLY • link 6.8 years ago by GenoMax 151k

0

The figure I get after colouring by INSERT SIZE (and INSERT SIZE AND PAIR ORIENTATION) is this: [image]

With a bit of imagination, one could argue that there is a purple blob in the centre of the genome, where the insertion point should be. This is the enlargement: [image] Would this be enough to say that IGV suggests a large insertion event?

ADD REPLY • link 6.8 years ago by marongiu.luigi ▴ 740

0

It depends on your context; if it's a "somatic" insertion, it could be ... By the way, your alignment is full of insertions (first picture). Are these still simulated reads?

ADD REPLY • link 6.8 years ago by Titus ▴ 910

0

Yes. Since there are 3 types of mutations in msbar (insertions, deletions, substitutions), there should in theory be about 500/3, i.e. roughly 167, insertion points.

ADD REPLY • link 6.8 years ago by marongiu.luigi ▴ 740

0

Your artificial insertion is too big to be picked up by IGV, and also too big to affect the insert size, as it is probably larger than the simulated insert size. In this scenario, what you would see is an increase in pairs with one mate mapped and the other unmapped, close to the insertion point. You could argue there is an insertion larger than your sequencing insert size, but without further data, you can't say how much larger.
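One rough way to look for that signal outside IGV is to count mapped reads whose mate is unmapped, binned along the reference. The sketch below reads SAM text lines and relies only on the standard SAM flag bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped). The function name `orphan_positions`, the bin size and the toy records are hypothetical, and nothing here was run on the data from this thread.

```python
from collections import Counter

PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8  # standard SAM flag bits


def orphan_positions(sam_lines, bin_size=100):
    """Count mapped reads with an unmapped mate, keyed by the 1-based bin start."""
    bins = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos = int(fields[1]), int(fields[3])
        if flag & PAIRED and not flag & UNMAPPED and flag & MATE_UNMAPPED:
            bins[(pos - 1) // bin_size * bin_size + 1] += 1
    return bins


# Toy SAM records (hypothetical): r1 and r3 are mapped with unmapped mates,
# r2 is a properly paired read, and the flag-133 record is r1's unmapped mate.
toy_sam = [
    "@SQ\tSN:chrM\tLN:16569",
    "r1\t73\tchrM\t8250\t60\t100M\t=\t8250\t0\t*\t*",
    "r1\t133\tchrM\t8250\t0\t*\t=\t8250\t0\t*\t*",
    "r2\t99\tchrM\t8100\t60\t100M\t=\t8300\t300\t*\t*",
    "r3\t89\tchrM\t8290\t60\t100M\t=\t8290\t0\t*\t*",
]
orphans = orphan_positions(toy_sam)
print(orphans.most_common(3))
```

A bin with far more such reads than its neighbours would be consistent with a large insertion nearby, although it says nothing about the insertion's length.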

ADD REPLY • link 6.8 years ago by h.mon 35k

1

Do you mean highlighting the insertion point in the coverage bar? Why don't you find the insertion point another way (based on the coverage or insertion rate with IGVtools, or with a variant caller) and then check it in IGV?

ADD REPLY • link 6.8 years ago by Titus ▴ 910

0

I thought IGV might show reads that have peculiar behaviour such as those with soft clips or a single mate mapped. If there are other tools, I will be happy to use them...

ADD REPLY • link 6.8 years ago by marongiu.luigi ▴ 740

0

You have to enable "show soft clipped bases" in IGV preferences.

ADD REPLY • link 6.8 years ago by GenoMax 151k

0

Yes, I did. The figure already includes the clipped reads.

ADD REPLY • link 6.8 years ago by marongiu.luigi ▴ 740

score 3 · Accepted Answer · 2018-08-08

I see evidence of the "transgene" insertion: all those identical soft-clipped bases centered at the position you inserted the non-human sequence. Pay attention: 1) all reads are soft-clipped at the same reference position, 2) as far as I can tell, all soft-clipped bases are identical between different reads.

Look at the picture below. The big red arrow indicates the insertion point, and the darkened rectangles indicate the inserted sequence (which I was able to determine as parvovirus by blasting them, even before you told us it was parvovirus).

However, keep in mind that this visual inspection works well because you have a simple, small reference genome with no duplications, and a simple, small insertion with no other copies of it elsewhere in the reference genome. As WouterDeCoster pointed out above, there are better methods to identify structural variation events in more complex scenarios.
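The first observation in this answer (many reads soft-clipped at the same reference position) can also be tallied from SAM records rather than judged by eye. The following is a minimal sketch, assuming plain SAM text input; `clip_breakpoints`, `min_clip` and the toy records are hypothetical names and data. It ignores hard clips that precede a soft clip and does not compare the clipped bases between reads.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoints(sam_lines, min_clip=10):
    """Tally (reference position, side) for soft clips of at least `min_clip` bases."""
    tally = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        pos, cigar = int(fields[3]), fields[5]
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
        if not ops:  # unmapped read: CIGAR is "*"
            continue
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        if ops[0][1] == "S" and ops[0][0] >= min_clip:
            tally[(pos, "left")] += 1  # first aligned base
        if ops[-1][1] == "S" and ops[-1][0] >= min_clip:
            tally[(pos + ref_len - 1, "right")] += 1  # last aligned base
    return tally


# Toy records (hypothetical) around an insertion placed after base 8284.
toy_sam = [
    "a\t0\tchrM\t8225\t60\t60M40S\t*\t0\t0\t*\t*",
    "b\t0\tchrM\t8245\t60\t40M60S\t*\t0\t0\t*\t*",
    "c\t16\tchrM\t8285\t60\t30S70M\t*\t0\t0\t*\t*",
    "d\t0\tchrM\t100\t60\t100M\t*\t0\t0\t*\t*",
]
breakpoints = clip_breakpoints(toy_sam)
print(breakpoints.most_common(5))
```

Right-side clips piling up at one base and left-side clips at the next base would match the pattern described above. Checking that the clipped bases agree between reads, and BLASTing them, would still be separate steps.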