How to extract all ONT reads that connect/overlaps two or more contigs ?
1
1
Entering edit mode
6.7 years ago
BioGeek ▴ 170

I want to focus on all those reads that connect two or more contigs. How to extract ONLY those nanopore reads ?

overlaps nanopore reads scaffolding • 1.9k views
ADD COMMENT
0
Entering edit mode

Did you map your reads? How?

ADD REPLY
0
Entering edit mode

mapping is done by minimap2

ADD REPLY
0
Entering edit mode
4.9 years ago
lagartija ▴ 160

I wanted to do the same and this turned out to be actually much harder than expected (because the SAM format is quite cryptic I think). I chose the simpler option which is : mapping with minimap 2 (no need to use the option splice) extract reads ids :

 
 #!/usr/bin/env python3

import sys
samfile =  open(sys.argv[1], 'r')
read_dic = {}

for line in samfile: 
 line=line.rstrip('\n')
 tab_line = line.split("\t")
 read=tab_line[0]
 contig=tab_line[2]

 if read not in read_dic:
     read_dic[read]=list()
     read_dic[read].append(contig)
 else :
     read_dic[read].append(contig)

for i in read_dic :

 if len(list(set(read_dic[i]))) == 2:
     print(i)

extract with seqtk subseq reads list and map again then visualize

I'm sure there is a more elegant option but in the mid time this was useful for me

ADD COMMENT

Login before adding your answer.

Traffic: 2879 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6