Question

Pairwise sequence alignment

0

Entering edit mode

7.7 years ago

jitendoshi2006 • 0

Could anyone suggest me an easiest way to perform pairwise sequence alignment wherein my query sequence is the sequencing reads obtained from NGS and the subject sequence is a constant part expected to be found on the reads. So, i need to have a score and length of alignment as parameters to screen and filter. I would like to execute Smith-Waterman algorithm. I tried using python but it is getting complicated for me to find EMBOSS suite (which I finally downloaded) but then cannot execute it from python (tried WaterCommandline). And I am not even sure that this will give me a score and alignment length if I successfully manage to execute it. I am looking up in R now to see if it is possible and reiterate all the reads against a subject sequence. Then, I would like to splice the query sequence and store it in a file in a fasta format following the sequence alignment. Any ideas on how to conveniently do this will be appreciated.

alignment • 4.0k views

ADD COMMENT • link updated 7.7 years ago by ALchEmiXt ★ 1.9k • written 7.7 years ago by jitendoshi2006 • 0

0

Entering edit mode

It is unclear what you are exactly trying to do? Calling SNP? Instead of doing pairwise alignments one option could be to do NGS alignments as usual and then pull the reads out in the region you are interested in followed by converting them to fasta format and then do a multiple sequence alignment (MSA).

ADD REPLY • link 7.7 years ago by GenoMax 147k

0

Entering edit mode

Firstly, I am searching for the sequence (subject) that I know is going to be present in the reads. I am interested in the reads that possess the subject sequence. I thought the best way was to do pairwise sequence alignment but I do not find a good platform for for performing these alignments. I also tried bio.pairwise2 in python but it does not align well. I hope i explained it well. What do you mean by NGS alignments? Are there any programs to do so?

Then, I remove the subject sequence part from the reads. Additionally, I would like to align the remaining part of reads to each other so that the redundant reads can be counted as '1' and then align the sorted reads to the genome. I plan to use BLAT/Bowtie for genome alignment

ADD REPLY • link 7.7 years ago by jitendoshi2006 • 0

2

Entering edit mode

What I meant was alignment with a regular NGS data aligner like bbmap, bwa, bowtie2 etc. These are always going to be very efficient aligning millions of reads rather than blast other similarity search techniques. Once you have the alignment to "genome" (which could be your reduced region of interest though you may get some misalignments) you can extract reads that aligned to the region of interest.

Before you do that it sounds to me like you are expecting duplication/redundancy in this dataset. Actually clumpify.sh from BBMap suite may be the best way to handle that case. Take a look at this thread for the various use cases for clumpify: A: Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files

ADD REPLY • link 7.7 years ago by GenoMax 147k

score 1 · Answer 1 · 2017-02-28

1

Entering edit mode

7.7 years ago

fishgolden ▴ 520

I think the easiest way to perform Smith-Waterman is SSEARCH in FASTA package. Please check. https://github.com/wrpearson/fasta36/releases

ADD COMMENT • link 7.7 years ago by fishgolden ▴ 520

score 1 · Answer 2 · 2017-02-28

1

Entering edit mode

7.7 years ago

ALchEmiXt ★ 1.9k

MUMmer is fast and fit for purpose. http://mummer.sourceforge.net/

ADD COMMENT • link 7.7 years ago by ALchEmiXt ★ 1.9k

score 0 · Answer 3 · 2017-02-28

Hi,

I used blast+ for a similar task and it worked fairly well. You'll need to convert the .bam/.sam file to a fasta file first but the results in tabulated format are really easy to filter based on score or alignment length.

Not sure to understand what you mean by "splicing" the query.

Best,

Carlo