Question

Find breakpoints using long reads

2.1 years ago

kirillkirilenko ▴ 40

Hello everyone! I want to determine the precise positions of breakpoints in sp1 (the assembled species). I have a number of long nanopore reads (.fastq) from sp2 (an unassembled species). sp1 and sp2 are closely related. I know the approximate coordinates of the breakpoints (coord2 - coord1 ≈ 1 Mb, coord4 - coord3 ≈ 1 Mb; see the image).

I adopted the following strategy: I cut out the left and right regions and aligned the long nanopore reads to each of these .fasta files separately. I expected only a few long reads to be shared between the two alignments, and I assumed those shared reads would be the ones containing the breakpoints. But I found that the two alignments have about 40k reads in common.
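For reference, the "shared reads" step can be sketched in plain Python over SAM text lines. This is only an illustration of the strategy described above, not a tested pipeline: the function names and the `min_mapq=20` cutoff are assumptions of mine. Note also that MAPQ is probably more informative when reads are aligned to the whole sp1 assembly rather than to excised regions, because a read then has to compete against every other locus.

```python
def confident_read_names(sam_lines, min_mapq=20):
    """Names of reads with at least one mapped alignment at MAPQ >= min_mapq.

    sam_lines: iterable of SAM text lines (header lines are skipped).
    min_mapq:  illustrative cutoff, not a recommended value.
    """
    names = set()
    for line in sam_lines:
        if line.startswith("@"):          # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        name, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & 0x4:                    # 0x4 = read is unmapped
            continue
        if mapq >= min_mapq:
            names.add(name)
    return names


def shared_reads(left_sam_lines, right_sam_lines, min_mapq=20):
    """Reads that align confidently to both the left and the right region."""
    left = confident_read_names(left_sam_lines, min_mapq)
    right = confident_read_names(right_sam_lines, min_mapq)
    return left & right
```

Typical use would be something like `shared_reads(open("left.sam"), open("right.sam"))`, where the two file names are placeholders for the two separate alignments.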

Maybe someone has a better idea (or better tools), or could improve mine! I would appreciate it.

[Image not reproduced here]

alignment nanopore samtools • 679 views

score 0 · Answer 1 · 2022-10-03

If I understand correctly, you want to do what is called "structural variant calling". You can align reads of sp2 to the genome of sp1 and then use Sniffles (https://github.com/fritzsedlazeck/Sniffles). Sniffles will give you a list of structural differences between these two genomes.

An even simpler strategy is to align the reads and then visualise the alignment in a program like Tablet (https://ics.hutton.ac.uk/tablet/). You should be able to spot the breakpoints by eye.
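If you would rather count than eyeball, what you look for in a viewer (many reads whose alignment stops at the same place, with the rest of the read clipped) can be tallied directly from the SAM records. The following is a minimal sketch, assuming reads of sp2 were aligned to the sp1 genome and written as SAM text; the function name and the `min_clip`, `min_mapq` and `bin_size` defaults are illustrative assumptions, not values taken from Sniffles or Tablet.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clip_breakpoints(sam_lines, min_clip=500, min_mapq=20, bin_size=100):
    """Count reference positions where alignments end in a long clip.

    Returns a Counter keyed by (reference name, binned 1-based position).
    Positions are binned because nanopore alignment ends are noisy.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):                  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
        mapq, cigar = int(fields[4]), fields[5]
        if flag & 0x4 or cigar == "*" or mapq < min_mapq:
            continue
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
        if not ops:
            continue
        # Reference bases consumed by the alignment (M, D, N, =, X).
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        # Long soft/hard clip on the left: alignment starts at a candidate breakpoint.
        if ops[0][1] in "SH" and ops[0][0] >= min_clip:
            counts[(chrom, pos // bin_size * bin_size)] += 1
        # Long clip on the right: alignment ends at a candidate breakpoint.
        if ops[-1][1] in "SH" and ops[-1][0] >= min_clip:
            end = pos + ref_len - 1
            counts[(chrom, end // bin_size * bin_size)] += 1
    return counts
```

Calling `clip_breakpoints(open("sp2_vs_sp1.sam")).most_common(10)` (the file name is a placeholder) would list the bins supported by the most clipped reads; bins near the approximate coordinates you already know are the ones worth inspecting in the viewer.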