How to use minimap2 with mappy within python script
1
0
Entering edit mode
5.3 years ago
Rox ★ 1.4k

Hello Biostar,

I'm trying to use minimap2 within a python script using it's python binding mappy ( https://github.com/lh3/minimap2/tree/master/python ).

A bit of context first, my ultimate goal with this script is to split a fastq into 4 files by Nanopore quality range (0-4 ,4-8, 8-12, 12+) and to align those sub-fastq to a reference and produce a bam file on each. Then work on those bam with samtools to calculate the error rate depending of each. I want to be able to compare raw Nanopore reads error rate accross different quality categories, and compare on different sequencing experiment depending on the basecallers used (flip-flop versus non flip-flop).

I've got the first part to split into sub-fastq. Now I'm trying to use mappy to produce a BAM output file. But from what I can see from the help, I feel like mappy was more designed to work on the alignement result "on the go" rather than producing a big output file.

This is the line (from usage example) that make me think of this :

for name, seq, qual in mp.fastx_read("test/MT-orang.fa"): # read a fasta/q sequence
     for hit in a.map(seq): # traverse alignments
     print("{}\t{}\t{}\t{}".format(hit.ctg, hit.r_st, hit.r_en, hit.cigar_str))

I can't find an example on how to just print the alignment result into a file...

Anyone can give me advices on this ?

Thank you !

Roxane

alignment nanopore • 4.1k views
ADD COMMENT
3
Entering edit mode
5.3 years ago
Rox ★ 1.4k

Well, figured it out myself !

In case anyone wonder, here is the portion of code I have made :

 #Build index is not existing or just load ref
aligner = mp.Aligner(ref, preset="map-ont")
logging.info("Loaded/built index for reference file.")
if not aligner:
    logging.error("Failed to load/build index")
    raise Exception("ERROR: failed to load/build index")
    sys.exit("ERROR: failed to load/build index")
#Calculate an error rate per aligned read
for name, seq, qual in mp.fastx_read(fastq_file):
    logging.info("Aligning sequence "+name+" : ")
    for hit in aligner.map(seq):
        #Count and store every occurence of a specific letter from the CIGAR string
        M = sum([int(i) for i in re.findall(r'(\d+)M',hit.cigar_str)])
...

You first have to import mappy as mp. Then you produce an aligner that contains several arguments, it is described on the mappy page, but I only used hit.cigar_str.

ADD COMMENT

Login before adding your answer.

Traffic: 2288 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6