Question

How can I get the positions where a sequence was aligned as output of a pair-wise alignment?

0

Entering edit mode

2.7 years ago

João ▴ 10

So I have a reference DNA sequence, and I have many query sequences. I want to do a pair-wise alignment of every query sequence, one at a time, to the reference sequence.

And I am looking for the best way to get, as output, the starting and ending positions (of the reference sequence) of where the alignment was done.

I have tried using some R functions to do the pair-wise alignment, but I haven't found any that tells me the alignment positions in relation to the reference DNA sequence.

Does anyone know a good way to do this?

Thank you very much in advance!

R pair-wise alignment • 1.6k views

ADD COMMENT • link updated 2.7 years ago by Matthias Zepper 5.1k • written 2.7 years ago by João ▴ 10

0

Entering edit mode

Please post example input and expected output.

ADD REPLY • link 2.7 years ago by cpad0112 21k

0

Entering edit mode

Have you tried doing a local blast? From this page - if you perform local blast with -outmft 6 you could get the coordinates of the position to where the query sequence has aligned to the reference. If you exclusively want to do this in R then you could also give internal system calls to blast from R (for the purpose of automating this over all the query sequences):

blastn_command = paste("blastn  -query genes.fasta  -subject genome.fasta  -outfmt 6 > out")
system(blastn_command)

ADD REPLY • link 2.7 years ago by manaswwm ▴ 560

0

Entering edit mode

Thanks for the answer!

ADD REPLY • link 2.7 years ago by João ▴ 10

0

Entering edit mode

How long are your query sequences in relation to your reference, and do you expect gapped alignments?

Depending on this, you can use general aligners like minimap2, transcriptome aligners like exonerate or whole-genome aligners like MUMmer.

Personally, I'd probably start with cactus or GSAlign unless your sequences are very short.

ADD REPLY • link 2.7 years ago by Matthias Zepper 5.1k

0

Entering edit mode

Thanks for the answer! My query sequences are from 60bp to 300bp, and the reference sequences are around 950bp. They are fish 12S sequences, so they usually introduce a lot of gaps in the alignment

ADD REPLY • link 2.7 years ago by João ▴ 10

0

Entering edit mode

60 to 300bp are well a range to use any regular short read aligner capable of gapped alignments such as bbmap or STAR. You will need to try out a few parameter combinations depending on your exact data properties:

bbmap.sh in=query.fa ref=reference.fa nodisk local=t minid=0.4 minhits=2 slow

ADD REPLY • link 2.7 years ago by Matthias Zepper 5.1k