Query Two Files To The Program Fasta (Performs Pairwise Alignments)
2
0
Entering edit mode
12.0 years ago
martinkara7 ▴ 20

Hello, I am rather very new to programming and am trying to design a perl script. I have two very large fasta files(one file conain 4000 sequences and the other 3500). I want to conduct a pairwise alignment on the the files with every possible combination. I am using the software FASTA to conduct the pairwise so i can tailor the outfile. The thing is query the file against each other to do this. I am having great difficult with the loops and submit to FASTA part. Below is how I am trying to do it but the loops are being very difficult. Could any one advise on this or offer alternative idea i could try please

Open file 1
While reading file 1
  Extract 1st sequence to align
  Open file 2
  While reading file 2
    Extract seq from file2
     submit to FASTA
  Close file 2
Close file 1
 system call FASTA
fasta alignment • 3.3k views
ADD COMMENT
0
Entering edit mode

Why are the loops difficult to you? Post your code to see if it could be improved.

Do you need reciprocal combinations? I mean, if you already compared A vs B, do you need B vs A too?

Why do you need to do one by one, it's simple to use a multifasta as target and query, then parse the output to obtain the pairs information.

ADD REPLY
0
Entering edit mode

I think you should add this as an answer. I suspect the OP was not aware that one could use the multifasta for both target and query.

ADD REPLY
0
Entering edit mode

I was expecting some additional info, but I wil add that part as answer.

ADD REPLY
5
Entering edit mode
12.0 years ago

You can do the whole thing (align all possible non-redundant pairs) in Python (assuming you have BioPython installed)::

from itertools import product
from Bio import SeqIO
from Bio import pairwise2

seqs1 = SeqIO.to_dict(SeqIO.parse(open('file1.fasta'),'fasta'))
seqs2 = SeqIO.to_dict(SeqIO.parse(open('file2.fasta'),'fasta'))

for sr1, sr2 in product(seqs1,seqs2):
    for a in pairwise2.align.localxx(str(seqs1[sr1].seq), str(seqs2[sr2].seq)):
        print format_alignment(*a)
ADD COMMENT
1
Entering edit mode
12.0 years ago
JC 13k

Why do you need to do one by one, it's simple to use a multifasta as target and query, then parse the output to obtain the pairs information.

ADD COMMENT

Login before adding your answer.

Traffic: 2382 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6