Question

Approaches for SV calling from De Novo assembly

3

9.1 years ago

novice ★ 1.1k

Given a de novo assembly and a reference assembly, what methods have you tried / would you recommend for determining structural variations?

SV Assembly • 3.3k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.1 years ago by novice ★ 1.1k

0

5.2 years ago

Manish ▴ 10

Probably too late to answer, but we developed SyRI, which identifies structural differences between two assemblies. It identifies structural rearrangements (inversions, transpositions, translocations, segmental (distal) duplications, tandem duplications) between assemblies. It also identifies syntenic (conserved) regions, as well as local variations (SNPs, indels, CNVs) in both rearranged and conserved regions, to provide a hierarchy of variations. You can read more about SyRI here and download the method from GitHub.

ADD COMMENT • link 5.2 years ago by Manish ▴ 10

1

Hi Manish,

I suggest you make a Tool post about your tool, which should include a description, use cases and maybe some example code, unless there is an extensive manual on GitHub that you can link. That is probably a better way to make people aware of your tool than reviving years-old threads.

ADD REPLY • link 5.2 years ago by ATpoint 87k

Ram · Accepted Answer · 2016-05-10

6

8.9 years ago

QVINTVS_FABIVS_MAXIMVS ★ 2.6k

For the paper "An integrated map of structural variation in 2,504 human genomes", my tiny part was to validate complex structural variation in long-read TruSeq data.

It was about 3 years ago, so I'm not sure if there are better approaches now. But we created breakpoint contigs across putative SV breakpoints using Velvet.

Then I took the breakpoint contigs (there are many possible ones generated by Velvet) and I used BLAT to align them to the reference genome.

Using the BLAT results I was able to parse out the precise breakpoints.

Like I said it's rather labor intensive and I'm sure there's a better way of doing it. But this might be a good lead!
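The alignment step described above could be scripted roughly like this. It is only a minimal sketch: the file names are hypothetical, and the 95% identity cutoff is an assumption chosen for illustration, not a value from the paper. `-noHead`, `-minIdentity` and `-out=psl` are options of command-line BLAT.

```python
import subprocess
from typing import List


def build_blat_command(reference: str, contigs: str, out_psl: str,
                       min_identity: int = 95) -> List[str]:
    """Return the argument list for aligning breakpoint contigs with BLAT.

    min_identity is an illustrative cutoff, not a value from the thread.
    """
    return [
        "blat",
        "-noHead",                        # skip the PSL header lines
        f"-minIdentity={min_identity}",   # keep only high-identity hits
        "-out=psl",                       # PSL is BLAT's default format
        reference,                        # database (reference FASTA)
        contigs,                          # query (breakpoint contigs FASTA)
        out_psl,                          # output file
    ]


def run_blat(reference: str, contigs: str, out_psl: str) -> None:
    """Run BLAT; raises CalledProcessError if BLAT exits with an error."""
    subprocess.run(build_blat_command(reference, contigs, out_psl), check=True)
```

Calling `run_blat("ref.fa", "breakpoint_contigs.fa", "contigs.psl")` requires the `blat` binary to be on your PATH.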

ADD COMMENT • link 8.9 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.6k

3

Thank you for the answer. I really appreciate the detailed supplement accompanying your paper. However, it doesn't seem to go into how BLAT was used. Could you please explain how you inferred breakpoints from the BLAT alignment?

It's been a long time and I have already used a combination of different methods for my purpose (similar to those of your collaborators), but I'm definitely interested in learning your method.

ADD REPLY • link 8.9 years ago by novice ★ 1.1k

3

I attached this visual aid. [Figure: BLAT alignments]

For the contig you generated across a breakpoint, you align it to the reference genome and look for alignments with high percent identity. You expect the sequence to match at nearly 100% identity.

In this example the deletion on the right is evident, since the breakpoint contig aligns to two non-contiguous parts of the reference. The number of reference base pairs between the end of the first aligned segment and the start of the second is the size of the deletion.

Using command-line BLAT (download it from the UCSC Genome Browser under Tools) will give you output that makes parsing the alignments easy.
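To make the deletion-size arithmetic concrete, here is a small sketch that reads one line of BLAT's PSL output and reports the reference gap between consecutive aligned blocks. It assumes BLAT reported the contig as a single alignment with several blocks; if the two halves come out as separate hits, they would need to be paired up first. The example line is made up for illustration, and the function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class PslHit(NamedTuple):
    q_name: str
    t_name: str
    strand: str
    block_sizes: List[int]
    t_starts: List[int]


def _int_list(field: str) -> List[int]:
    # PSL list fields are comma-separated with a trailing comma.
    return [int(x) for x in field.strip(",").split(",")]


def parse_psl_line(line: str) -> PslHit:
    """Parse one tab-separated PSL alignment line (21 columns)."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError("expected 21 PSL columns, got %d" % len(f))
    return PslHit(q_name=f[9], t_name=f[13], strand=f[8],
                  block_sizes=_int_list(f[18]), t_starts=_int_list(f[20]))


def target_gaps(hit: PslHit) -> List[Tuple[int, int, int]]:
    """Return (left_end, right_start, gap_size) in 0-based reference
    coordinates for each pair of neighbouring blocks."""
    gaps = []
    for i in range(len(hit.block_sizes) - 1):
        left_end = hit.t_starts[i] + hit.block_sizes[i]
        right_start = hit.t_starts[i + 1]
        gaps.append((left_end, right_start, right_start - left_end))
    return gaps


# Made-up example: a 200 bp contig aligning as two 100 bp blocks.
EXAMPLE = "\t".join([
    "200", "0", "0", "0", "0", "0", "1", "500", "+",
    "contig1", "200", "0", "200", "chr1", "5000", "1000", "1700",
    "2", "100,100,", "0,100,", "1000,1600,",
])

print(target_gaps(parse_psl_line(EXAMPLE)))
```

In the made-up example the first block ends at reference position 1100 and the second starts at 1600, so the putative deletion spans 500 bp with breakpoints at those two positions.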

ADD REPLY • link 8.9 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.6k

3

Brilliant. Thank you for the explanation.

Quick question: could you use BLAST instead of BLAT? I'm wondering if there's a specific reason you chose BLAT.

ADD REPLY • link 8.9 years ago by novice ★ 1.1k

2

I think BLAT works better for short sequences? Also, you can download a command-line version of BLAT. My PI uses it for primers, but I don't see why BLAST wouldn't work.

I think BLAT is faster too, no?

Also, BLAT output from the command line includes the number of aligned segments. Anything equal to 1 is not an SV. Greater than 2 indicates a complex SV (DUP-INV-DUP, DEL-DUP, etc.). I found it really informative after writing a script to parse the BLAT output (assuming you have a lot of breakpoints to test).
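A rough sketch of what such a parsing script might look like, using the block count column of PSL output. This is not the original script. Treating exactly 2 blocks as a simple SV candidate is an assumption based on the deletion example earlier in the thread, and picking the hit with the most matching bases per contig is also an assumption. Small indels add blocks too, so a real script would probably also filter on gap size.

```python
from typing import Dict, Iterable, Tuple


def best_block_counts(psl_lines: Iterable[str]) -> Dict[str, int]:
    """Map each contig name to the block count of its best PSL hit,
    where 'best' (an assumption) means the most matching bases."""
    best: Dict[str, Tuple[int, int]] = {}
    for line in psl_lines:
        f = line.rstrip("\n").split("\t")
        # Skip header or malformed lines: data rows start with a number.
        if len(f) < 21 or not f[0].isdigit():
            continue
        matches, q_name, block_count = int(f[0]), f[9], int(f[17])
        if q_name not in best or matches > best[q_name][0]:
            best[q_name] = (matches, block_count)
    return {q: count for q, (_, count) in best.items()}


def classify(block_count: int) -> str:
    """Apply the segment-count heuristic from the post above."""
    if block_count <= 1:
        return "no SV"
    if block_count == 2:
        return "simple SV candidate"   # assumption, see lead-in
    return "complex SV candidate"


def classify_contigs(psl_lines: Iterable[str]) -> Dict[str, str]:
    return {q: classify(n) for q, n in best_block_counts(psl_lines).items()}
```

With a real file this could be called as `classify_contigs(open("contigs.psl"))`, where `contigs.psl` is a hypothetical file name.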

ADD REPLY • link updated 6.6 years ago by Ram 45k • written 8.9 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.6k

0

For my work (yeast), BLAST is actually much faster for some reason. But I didn't know that about the number of aligned segments. When I tried BLAT out, I just formatted the output like BLAST (-out=blast8). Interesting!

ADD REPLY • link 8.8 years ago by novice ★ 1.1k