Entering edit mode
6.7 years ago
BioGeek
▴
170
I want to focus on all those reads that connect two or more contigs. How to extract ONLY those nanopore reads ?
I want to focus on all those reads that connect two or more contigs. How to extract ONLY those nanopore reads ?
I wanted to do the same and this turned out to be actually much harder than expected (because the SAM format is quite cryptic I think). I chose the simpler option which is : mapping with minimap 2 (no need to use the option splice) extract reads ids :
#!/usr/bin/env python3 import sys samfile = open(sys.argv[1], 'r') read_dic = {} for line in samfile: line=line.rstrip('\n') tab_line = line.split("\t") read=tab_line[0] contig=tab_line[2] if read not in read_dic: read_dic[read]=list() read_dic[read].append(contig) else : read_dic[read].append(contig) for i in read_dic : if len(list(set(read_dic[i]))) == 2: print(i)
extract with seqtk subseq reads list and map again then visualize
I'm sure there is a more elegant option but in the mid time this was useful for me
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Did you map your reads? How?
mapping is done by minimap2