I am wanting to align reads to a database of reference sequences. This database contains thousands of reference sequences. I would like to only see the reference sequences that the reads cover a certain percentage, eg. only see the reference sequence if reads cover >50% of it.
Currently, I am doing my alignment with BWA MEM, and the output is really hard to analyse, because I'll have hundreds of reference sequences with reads aligning to them, but 80% of these only have reads aligning to one little spot of the reference sequence, which doesn't really make it a match.
Is there an alignment program that will allow me align reads to multiple sequences, and then sort results by the amount of the reference sequence that is covered? Or is there a way that I can this information out of a SAM or BAM file?
I'm sorry if this question has been answered before but I couldn't find a post that answered this for me.
Thank you for your help.
that is information you could retrieve from a blast search but if you are working with really big sets it might not be the most efficient approach. MegaBlast might already be better suited.