How can I get the positions where a sequence was aligned as output of a pair-wise alignment?
0
0
Entering edit mode
2.4 years ago
João ▴ 10

So I have a reference DNA sequence, and I have many query sequences. I want to do a pair-wise alignment of every query sequence, one at a time, to the reference sequence.

And I am looking for the best way to get, as output, the starting and ending positions (of the reference sequence) of where the alignment was done.

I have tried using some R functions to do the pair-wise alignment, but I haven't found any that tells me the alignment positions in relation to the reference DNA sequence.

Does anyone know a good way to do this?

Thank you very much in advance!

R pair-wise alignment • 1.4k views
ADD COMMENT
0
Entering edit mode

Please post example input and expected output.

ADD REPLY
0
Entering edit mode

Have you tried doing a local blast? From this page - if you perform local blast with -outmft 6 you could get the coordinates of the position to where the query sequence has aligned to the reference. If you exclusively want to do this in R then you could also give internal system calls to blast from R (for the purpose of automating this over all the query sequences):

blastn_command = paste("blastn  -query genes.fasta  -subject genome.fasta  -outfmt 6 > out")
system(blastn_command)
ADD REPLY
0
Entering edit mode

Thanks for the answer!

ADD REPLY
0
Entering edit mode

How long are your query sequences in relation to your reference, and do you expect gapped alignments?

Depending on this, you can use general aligners like minimap2, transcriptome aligners like exonerate or whole-genome aligners like MUMmer.

Personally, I'd probably start with cactus or GSAlign unless your sequences are very short.

ADD REPLY
0
Entering edit mode

Thanks for the answer! My query sequences are from 60bp to 300bp, and the reference sequences are around 950bp. They are fish 12S sequences, so they usually introduce a lot of gaps in the alignment

ADD REPLY
0
Entering edit mode

60 to 300bp are well a range to use any regular short read aligner capable of gapped alignments such as bbmap or STAR. You will need to try out a few parameter combinations depending on your exact data properties:

bbmap.sh in=query.fa ref=reference.fa nodisk local=t minid=0.4 minhits=2 slow 
ADD REPLY

Login before adding your answer.

Traffic: 1882 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6