Hi all,
I have sequenced the mRNA of a heterologous library expressed in human cells using nanopore sequencing. I then mapped the reads from the FASTQ files to the library's reference database using minimap2.
Next, I filtered the SAM file to retain only the primary alignments with high mapping quality. After this, I visualised the BAM file in IGV:
As you can see, some reads have mismatches to the reference database. I checked, and even the reads with the highest mapping quality have mismatches. I suspect these reads map to more than one variant in the library dataset, so I am looking for a command line that I could use to discard such alignments from the SAM/BAM file. Unlike the BWA aligner, minimap2 does not generate the MD (mismatching positions/bases) and XM (number of mismatches in the alignment) tags in the SAM file. It generates the NM:i tag, which reports mismatches and indels together, so it is not really helpful, as I only want to discard alignments with mismatches.
I would really appreciate help with discarding alignments with mismatches from a SAM/BAM file (ideally with a mismatch threshold that I can set). Thanks a lot.
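Since NM counts mismatches plus inserted and deleted bases, one way to recover a mismatch-only count is to subtract the total length of the I and D operations in the CIGAR string from NM. Below is a minimal Python sketch of that idea, assuming uncompressed SAM text with the NM:i tag present on each alignment. The function names `count_mismatches` and `filter_sam` and the `max_mismatches` parameter are made up for illustration and are not part of minimap2 or samtools.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_mismatches(sam_line):
    """Return NM minus indel bases for one SAM alignment line.

    Returns None when the read is unmapped (CIGAR "*") or has no NM:i tag.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]
    if cigar == "*":
        return None
    nm = None
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("NM:i:"):
            nm = int(tag[5:])
            break
    if nm is None:
        return None
    # Bases in insertions (I) and deletions (D) are counted in NM.
    indel_bases = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID")
    return nm - indel_bases


def filter_sam(lines, max_mismatches=0):
    """Yield header lines and alignments with at most max_mismatches mismatches."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the SAM header untouched
            continue
        mismatches = count_mismatches(line)
        if mismatches is not None and mismatches <= max_mismatches:
            yield line


# Example use (file names are placeholders):
# with open("in.sam") as src, open("out.sam", "w") as dst:
#     dst.writelines(filter_sam(src, max_mismatches=2))
```

Note that this keeps alignments that contain indels but no substitutions. If indels should also be excluded, filtering on NM directly would be enough.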
Also, this is cross-posted on the minimap2 GitHub repo. Please don't cross-post without a reference.
Hi Rob,
Thanks, I will take care of it next time.
Hi Geno,
Could you please help me with a command line that I can use with the MD tag to retain perfect matches? Sorry, I am new to data analysis. Many thanks.
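A possible sketch for the MD-based approach, in Python. It assumes the SAM file already carries MD tags, which can be produced by running minimap2 with its `--MD` option or by adding them afterwards with `samtools calmd`. In an MD string, a bare letter marks a mismatch and `^` followed by letters marks a deletion, so an MD value made up only of digits means no mismatches and no deletions. Insertions do not appear in MD, so the CIGAR is checked as well. The function names `md_mismatches` and `is_perfect` are hypothetical, and soft-clipped ends are ignored here, which may or may not match what you mean by a perfect match.

```python
import re

# MD tokens: a run of matches, a deletion (^ plus bases), or one mismatched base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def md_mismatches(md):
    """Count mismatched bases in an MD string (deletions are not counted)."""
    return sum(1 for _, _, base in MD_TOKEN.findall(md) if base)


def is_perfect(sam_line):
    """True if the alignment has no mismatches, insertions, or deletions."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11 or fields[5] == "*":
        return False  # malformed or unmapped record
    md = next((t[5:] for t in fields[11:] if t.startswith("MD:Z:")), None)
    if md is None:
        return False  # no MD tag, so nothing can be said about mismatches
    has_indel = re.search(r"\d+[ID]", fields[5]) is not None
    # An all-digit MD value means every aligned base matched the reference.
    return not has_indel and md.isdigit()


# Example use (file names are placeholders):
# with open("in.sam") as src, open("perfect.sam", "w") as dst:
#     dst.writelines(l for l in src if l.startswith("@") or is_perfect(l))
```

For a threshold rather than strictly perfect matches, `md_mismatches` can be compared against a chosen cutoff instead of requiring an all-digit MD value.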