Hi all,
I extracted the reads from a region of interest (~200 kb), including the unmapped ones, together with all the pairs of unmapped reads from my WGS BAM, and assembled them with the SPAdes assembler (http://cab.spbu.ru/files/release3.12.0/manual.html). I'd like to know whether it is possible to automatically detect structural variants in this region of interest from that assembly (using vg?).
The output directory looks like:
assembly_graph_after_simplification.gfa
assembly_graph.fastg
assembly_graph_with_scaffolds.gfa
before_rr.fasta
contigs.fasta
contigs.paths
corrected
dataset.info
input_dataset.yaml
K21
K33
K55
K77
misc
params.txt
pipeline_state
run_spades.sh
run_spades.yaml
scaffolds.fasta
scaffolds.paths
spades.log
tmp
Is there any way to automatically detect large inversions, insertions, deletions, etc. from the assembly graph?
Do you want to identify SVs by comparing your assembly with the reference genome?
Yes. I wonder whether the unmapped reads can give some new insights compared to the usual algorithms (Manta, etc.).
I don't know what you have already tried, but Assemblytics was developed for that purpose. It should work, although it is not very recent. It might be okay for the length of contigs you have, but it is more problematic with contigs from long-read sequencing. Other alternatives are SVIM-asm (which I haven't tried, but SVIM is good) and dipcall (which I know only identifies SVs that are spanned by an alignment). Please let me know what works for you :)
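To make "SVs spanned by an alignment" concrete, here is a minimal Python sketch, assuming you have already aligned contigs.fasta to the reference with an aligner that reports CIGAR strings (for example minimap2). It is not how Assemblytics, SVIM-asm or dipcall are implemented. The function name `large_indels` and the 50 bp cutoff are my own illustrative choices. It only sees insertions and deletions contained in one alignment, so inversions or events that split a contig into several alignments would be missed.

```python
import re

# Operations that advance the reference position (per the SAM specification).
REF_OPS = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def large_indels(cigar, ref_start, min_len=50):
    """Return (type, ref_pos, length) for every I/D operation >= min_len.

    ref_start is the 0-based reference position where the alignment begins.
    min_len=50 is an assumed size cutoff, not a value from any tool.
    """
    pos = ref_start
    calls = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "ID" and length >= min_len:
            calls.append(("INS" if op == "I" else "DEL", pos, length))
        if op in REF_OPS:
            pos += length  # insertions and clips do not move along the reference
    return calls


# Hypothetical contig alignment starting at reference position 1000.
print(large_indels("100M60D50M5I20M120I30M", 1000))
# prints [('DEL', 1100, 60), ('INS', 1230, 120)]
```

The 5 bp insertion is skipped because it is below the cutoff. A dedicated caller is still the better choice for real data.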
Very interesting, I'll try it tomorrow.