I am trying to filter out mitochondrial-specific nanopore reads from whole genome reads using minimap2 and samtools view. In addition I want to retain reads with an alignment score of 5000 or more, following a methodology suggested on a paper. However I don't know what alignment score is and how to adjust it to 5000 to achieve this goal and at what stage. I would appreciate to have your thoughts. Also, I would believe alignment score is different from mapping quality? (samtools view -q) Thanks.
So far my commands go like this:
minimap2 -ax map-ont -t 24 mito_genome.fa nanopore_reads.fastq.gz > mito.sam
samtools view -F 4 mito.sam > mito_mapped.sam
Minimap2 reports mapping qualities which should be no more than 60. I don't think this is the same as the alignment score you mentioned. If you want to follow the methodology suggested on a paper, you need to find out how to calculate the alignment score from that paper and filter the reads accordingly.
Right, mapping qualities (MAPQ) are different than alignment scores (AS). In my sam file I found that the AS values are located in the 14th field (e.g. AS:i:5228). In my case I want to retain reads with an AS of 5000 or more, I would appreciate if someone could share a script that would allow me to do this filtering based on AS. Thanks.
Big warning - align all reads - WGS - to the whole genome, otherwise you might be forcing reads from other locations to map to the mito, which might cause false positives. This shouldn't be too bad with long reads but can cause big issues with short reads.
Thanks for your observation. Do you mean mapping the reads to the nuclear genome sequence first, and then filtering out the mito reads from the unmapped reads? To note, there is no whole genome reference available for the species I am working on yet.
Did you find out what alignment score means? Did you run Qiongyi's script?