Hetero-dimerization analysis with two FASTA files
1
0
Entering edit mode
3.7 years ago
Apex92 ▴ 300

I made a post previously (How to loop on two fasta files for hetero-dimerization analysis) but I think I could not explain my problem clearly.

Let's assume I have two fast files as below:

file1.fa

>seq20
NNNNNNN
>seq11
NNACCGGNN
>seq2
NNNNNNNN

file2.fa

>seq3
AAACCGGTAAA
>seq9
GGGGGGGGGGG

I want to make a loop using two files - for each sequence in file1 the loop should go through all the sequences in file2.

What I want to do is to start with the first sequence in file1 - make 5mers - save the 5mer's reverse complements in a list and look for any item in the list if exists in ANY sequence in file2. Then loop goes to the second sequence in file2 and makes a NEW list (not adding up to the previous list).

Based on the dummy sequences - one of the possible 5mers from the seq11 in the file1.fa is ACCGG which it's reverse complement becomes (CCGGT) - if you look at the seq3 in the file2 you could see that CCGGT exists in that sequence then these sequence ids should be reported.

Base on my examples only seq11 hetero-dimerizes with seq3 so the desired output should look like a table as below:

     file2     seq3     seq9
file1 
seq20          0         0
seq11          1         0
seq2           0         0

I have written this script but it does not give any output - also I have not been able to set the rest of the script to make the output in a table format. Any help is appreciated - Many thanks.

from Bio import SeqIO

with open('test1.fa', 'r') as fw, open('test2.fa', 'r') as rv:
    for fw_record in SeqIO.parse(fw, 'fasta'):
        lst = []
        for i in range(len(fw_record.seq)):
            kmer = str(fw_record.seq[i:i + 5])
            if len(kmer) == 5:
                C_kmer = Seq(kmer).complement()
                lst.append(C_kmer[::-1])
        print(lst)

        for rv_record in SeqIO.parse(rv, 'fasta'):
            if any(items in rv_record.seq for items in lst):
                print(fw_record.id)
                print(fw_record.seq)
                print(rv_record.id)
                print(rv_record.seq)
sequence FASTA python awk bash • 867 views
ADD COMMENT
0
Entering edit mode

Please stop asking the same question over and over again.

ADD REPLY
0
Entering edit mode
ADD COMMENT

Login before adding your answer.

Traffic: 1974 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6