Entering edit mode
2.4 years ago
João
▴
10
So I have a reference DNA sequence, and I have many query sequences. I want to do a pair-wise alignment of every query sequence, one at a time, to the reference sequence.
And I am looking for the best way to get, as output, the starting and ending positions (of the reference sequence) of where the alignment was done.
I have tried using some R functions to do the pair-wise alignment, but I haven't found any that tells me the alignment positions in relation to the reference DNA sequence.
Does anyone know a good way to do this?
Thank you very much in advance!
Please post example input and expected output.
Have you tried doing a local blast? From this page - if you perform local blast with
-outmft 6
you could get the coordinates of the position to where the query sequence has aligned to the reference. If you exclusively want to do this in R then you could also give internal system calls to blast from R (for the purpose of automating this over all the query sequences):Thanks for the answer!
How long are your query sequences in relation to your reference, and do you expect gapped alignments?
Depending on this, you can use general aligners like minimap2, transcriptome aligners like exonerate or whole-genome aligners like MUMmer.
Personally, I'd probably start with cactus or GSAlign unless your sequences are very short.
Thanks for the answer! My query sequences are from 60bp to 300bp, and the reference sequences are around 950bp. They are fish 12S sequences, so they usually introduce a lot of gaps in the alignment
60 to 300bp are well a range to use any regular short read aligner capable of gapped alignments such as bbmap or STAR. You will need to try out a few parameter combinations depending on your exact data properties: