How can I select reads based on the best MAPQ and exclude reads with the same name and MAPQ value?
8.4 years ago
jleandroj • 0

I have a SAM file and want to select the reads with the best MAPQ (mapping quality).

First, I can have, for example, three reads with the same name but different MAPQ values. In that case I want to keep the read with the highest MAPQ and exclude the other two.

Second, if there are three reads with the same name and the same MAPQ value, I want to exclude all three.

Thanks

sequence • 2.0k views
8.4 years ago

For the first requirement, it is simpler to exclude secondary alignments (bit 256 in the flag; see the -f and -F options of samtools view).
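As a rough illustration of what that flag filter does, here is a minimal Python sketch that works on plain SAM text lines. The helper names `is_secondary` and `drop_secondary` are made up for this example. On the command line, `samtools view -h -F 256 in.sam` applies the same bit test (`in.sam` is a placeholder file name).

```python
SECONDARY = 0x100  # bit 256 in the SAM FLAG field: secondary alignment


def is_secondary(sam_line: str) -> bool:
    """Return True if an alignment line has the secondary bit set."""
    flag = int(sam_line.split("\t")[1])  # FLAG is the second column
    return bool(flag & SECONDARY)


def drop_secondary(lines):
    """Yield header lines and every alignment line that is not secondary."""
    for line in lines:
        if line.startswith("@") or not is_secondary(line):
            yield line
```

Note that this only removes alignments the aligner itself marked as secondary; it does not compare MAPQ values.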

For the second requirement, you'll either need to know which MAPQ values your aligner assigns in this situation and exclude those, or write a script to do it (e.g., in Python with pysam).
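A minimal sketch of such a script, written against plain SAM text lines so it runs without pysam. It makes two assumptions that are not stated in the question: reads are single-end (mates of a pair share a name and would need to be separated first), and "same MAPQ" means the top MAPQ for a name is shared by two or more alignments, in which case the whole name is dropped. The function name `select_best_mapq` is hypothetical.

```python
from collections import defaultdict


def select_best_mapq(lines):
    """Return header lines plus, per read name, the one alignment with the top MAPQ.

    Read names whose top MAPQ is shared by two or more alignments are dropped.
    """
    headers = []
    by_name = defaultdict(list)  # read name -> list of (mapq, line)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            headers.append(line)
            continue
        fields = line.split("\t")
        # QNAME is column 1, MAPQ is column 5
        by_name[fields[0]].append((int(fields[4]), line))

    kept = []
    for alignments in by_name.values():
        best = max(mapq for mapq, _ in alignments)
        winners = [line for mapq, line in alignments if mapq == best]
        if len(winners) == 1:  # a tie at the top means the name is excluded
            kept.append(winners[0])
    return headers + kept
```

This version holds every alignment in memory and does not require sorted input, which is convenient for small files but may not scale to large ones.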


Thanks, Devon. Can I do it in the shell (awk, perl)? Is there any way to do it using samtools?


For the second requirement you can certainly do that with perl. If the file is name-sorted then you might be able to put something together with awk, but it'd be more trouble than it's worth. Stick to perl if that's what you know (or play with the code that Pierre posted if you're OK with JavaScript).
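To show why name-sorted input helps, here is a hedged Python sketch (not awk or perl) that streams the file one read-name group at a time, so only one group is held in memory. It assumes the input was name-sorted beforehand (for example with `samtools sort -n`), that reads are single-end, and that a tie at the top MAPQ drops the whole name. The function name `stream_best_mapq` is made up for this example.

```python
from itertools import groupby


def stream_best_mapq(lines):
    """Yield kept lines from name-sorted SAM text, one read-name group at a time."""
    records = (ln.rstrip("\n") for ln in lines if ln.strip())

    def key(line):
        # Header lines share the key None so they pass through together
        return None if line.startswith("@") else line.split("\t", 1)[0]

    for name, group in groupby(records, key=key):
        group = list(group)
        if name is None:
            yield from group
            continue
        mapqs = [int(rec.split("\t")[4]) for rec in group]  # MAPQ is column 5
        best = max(mapqs)
        if mapqs.count(best) == 1:  # unique best alignment: keep only that one
            yield group[mapqs.index(best)]
```

If the input is not actually name-sorted, alignments of the same read end up in separate groups and the result will be wrong, so the sort step is not optional here.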

8.4 years ago

For fun, using Nashorn (the Java-based JavaScript engine) and htsjdk:

(Not tested; it needs real data...)
