Approaches for SV calling from De Novo assembly
2
3
8.8 years ago
novice ★ 1.1k

Given a de novo assembly and a reference assembly, what methods have you tried / would you recommend for determining structural variations?

SV Assembly • 3.0k views
6
8.5 years ago

For the paper "An integrated map of structural variation in 2,504 human genomes", my small part was to validate complex structural variants in long-read TruSeq data.

That was about 3 years ago, so I'm not sure whether better approaches exist now. We created breakpoint contigs across putative SV breakpoints using Velvet.

Then I took the breakpoint contigs (Velvet generates many candidates) and aligned them to the reference genome with BLAT.

Using the BLAT results I was able to parse out the precise breakpoints.

It's rather labor-intensive, and I'm sure there's a better way of doing it, but this might be a good lead!

3

Thank you for the answer. I really appreciate the detailed supplementary material accompanying your paper. However, it doesn't seem to go into how BLAT was used. Could you please explain how you inferred breakpoints from the BLAT alignments?

It's been a long time and I have already used a combination of different methods for my purpose (similar to those of your collaborators), but I'm definitely interested in learning your method.

3

I attached this visual aid (image: BLAT alignments).

Align the contig you generated across a breakpoint to the reference genome and look for alignments with high percent identity. You expect the aligned sequence to match at close to 100%.
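As a rough illustration of the identity check, here is a minimal sketch that computes percent identity from one line of BLAT's default PSL output. It assumes the standard 21-column PSL layout; the function name and the example line are made up, and the formula is a simplification (matching bases over aligned bases), not the exact score the UCSC browser reports.

```python
def psl_percent_identity(psl_line: str) -> float:
    """Approximate percent identity of one PSL alignment line.

    Simplification: matching bases over all aligned bases, ignoring
    gaps. This is not the UCSC browser's exact identity score.
    """
    fields = psl_line.rstrip("\n").split("\t")
    matches = int(fields[0])      # matching bases outside repeats
    mismatches = int(fields[1])   # aligned bases that do not match
    rep_matches = int(fields[2])  # matching bases inside repeats
    aligned = matches + mismatches + rep_matches
    if aligned == 0:
        return 0.0
    return 100.0 * (matches + rep_matches) / aligned


# Hypothetical PSL line: 548 matches and 2 mismatches across two blocks.
example = "\t".join([
    "548", "2", "0", "0", "0", "0", "1", "500", "+",
    "contig_1", "550", "0", "550", "chr1", "100000", "1000", "2050",
    "2", "300,250,", "0,300,", "1000,1800,",
])
print(round(psl_percent_identity(example), 2))  # 99.64
```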

In this example the deletion on the right is evident because the breakpoint contig aligns to two non-contiguous parts of the reference. The number of reference base pairs between the end of the first aligned segment and the start of the second is the size of the deletion.
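The arithmetic can be sketched as follows, assuming the segment coordinates come from the blockSizes and tStarts columns of a PSL line (0-based, half-open reference coordinates). The function name and the numbers are hypothetical.

```python
def target_gaps(block_sizes: str, t_starts: str):
    """Return (left_breakpoint, right_breakpoint, gap_size) for each pair
    of consecutive aligned blocks, in reference coordinates."""
    sizes = [int(x) for x in block_sizes.strip(",").split(",")]
    starts = [int(x) for x in t_starts.strip(",").split(",")]
    gaps = []
    for i in range(len(starts) - 1):
        left_end = starts[i] + sizes[i]   # first base after block i
        right_start = starts[i + 1]       # first base of block i + 1
        gaps.append((left_end, right_start, right_start - left_end))
    return gaps


# Hypothetical contig: 300 bp aligned at 1000, then 250 bp aligned at 1800.
print(target_gaps("300,250,", "1000,1800,"))  # [(1300, 1800, 500)]
```

Here the two segments are 500 reference bases apart, so the putative deletion spans positions 1300 to 1800. To call it a deletion you would also want to check that the two segments are adjacent on the contig itself.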

Command-line BLAT (downloadable from the UCSC Genome Browser under Tools) gives you output that makes parsing alignments easy.
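For anyone scripting this, a minimal sketch of calling command-line BLAT from Python might look like the following. The file names and the 95% identity cutoff are placeholders, not values from the original analysis; `-noHead` drops the PSL header and `-minIdentity` sets the minimum percent identity.

```python
import subprocess


def build_blat_command(reference, contigs, output_psl, min_identity=95):
    """Build a BLAT command line: blat [options] database query output.psl"""
    return [
        "blat",
        "-noHead",                        # plain rows, no PSL header
        f"-minIdentity={min_identity}",   # placeholder cutoff
        reference,
        contigs,
        output_psl,
    ]


def run_blat(reference, contigs, output_psl, min_identity=95):
    """Run BLAT; requires the blat binary on PATH."""
    cmd = build_blat_command(reference, contigs, output_psl, min_identity)
    subprocess.run(cmd, check=True)


# Show the command without running it (file names are hypothetical).
print(" ".join(build_blat_command("reference.fa", "contigs.fa", "contigs.psl")))
```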

3

Brilliant. Thank you for the explanation.

Quick question: could you use BLAST instead of BLAT? I'm wondering if there's a specific reason you chose BLAT.

2

I think BLAT works better for short sequences? Also, you can download a command-line version of BLAT. My PI uses it for primers, but I don't see why BLAST wouldn't work.

I think BLAT is faster too, no?

Also, command-line BLAT output includes the number of aligned segments. A value of 1 is not an SV. A value of 2 is a simple SV, such as the deletion described above, and a value greater than 2 indicates a complex SV (DUP-INV-DUP, DEL-DUP, etc.). I found it really informative once I had written a script to parse the BLAT output (worth doing if you have a lot of breakpoints to test).
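A parsing script along those lines could be sketched like this. It is not the original script: it assumes "number of aligned segments" corresponds to the blockCount column of default PSL output, keeps only the alignment with the most matches per contig, and uses made-up labels and example rows.

```python
def classify_contigs(psl_text: str) -> dict:
    """Map each contig name to a rough call based on PSL blockCount."""
    best = {}  # contig name -> (matches, blockCount) of its best alignment
    for line in psl_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip PSL header lines and blank lines
        matches, q_name, block_count = int(fields[0]), fields[9], int(fields[17])
        if q_name not in best or matches > best[q_name][0]:
            best[q_name] = (matches, block_count)

    calls = {}
    for q_name, (_, block_count) in best.items():
        if block_count == 1:
            calls[q_name] = "no SV"
        elif block_count == 2:
            calls[q_name] = "simple SV"
        else:
            calls[q_name] = "complex SV"
    return calls


# Hypothetical PSL rows (tab-separated, 21 columns each).
rows = [
    ["550", "0", "0", "0", "0", "0", "0", "0", "+", "contig_ref", "550", "0", "550",
     "chr1", "100000", "1000", "1550", "1", "550,", "0,", "1000,"],
    ["548", "2", "0", "0", "0", "0", "1", "500", "+", "contig_del", "550", "0", "550",
     "chr1", "100000", "1000", "2050", "2", "300,250,", "0,300,", "1000,1800,"],
]
psl_text = "\n".join("\t".join(r) for r in rows)
print(classify_contigs(psl_text))
```

One caveat with this sketch: any gap, including a 1 bp indel, starts a new block in PSL, so in practice you would probably also require a minimum gap size before counting a block boundary as a breakpoint.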

0

For my work (yeast) BLAST is actually much faster for some reason. But I didn't know that about alignment segments. When I tried BLAT out, I just formatted the output like BLAST (-out=blast8). Interesting!

0
4.8 years ago
Manish ▴ 10

Probably too late to answer, but we developed SyRI, which identifies structural differences between two assemblies. It identifies structural rearrangements (inversions, transpositions, translocations, segmental (distal) duplications, tandem duplications) between assemblies. It also identifies syntenic (conserved) regions, as well as local variations (SNPs, indels, CNVs) in both rearranged and conserved regions, to provide a hierarchy of variations. You can read more about SyRI here and download it from GitHub.

1

Hi Manish,

I suggest you make a Tool post about your tool, which should include a description, use cases, and maybe some example code, unless there is an extensive manual on GitHub that you can link. That is probably a better way to make people aware of your tool than refreshing years-old threads.
