I made a post previously (How to loop on two fasta files for hetero-dimerization analysis) but I think I could not explain my problem clearly.
Let's assume I have two fast files as below:
file1.fa
>seq20
NNNNNNN
>seq11
NNACCGGNN
>seq2
NNNNNNNN
file2.fa
>seq3
AAACCGGTAAA
>seq9
GGGGGGGGGGG
I want to make a loop using two files - for each sequence in file1 the loop should go through all the sequences in file2.
What I want to do is to start with the first sequence in file1 - make 5mers - save the 5mer's reverse complements in a list and look for any item in the list if exists in ANY sequence in file2. Then loop goes to the second sequence in file2 and makes a NEW list (not adding up to the previous list).
Based on the dummy sequences - one of the possible 5mers from the seq11 in the file1.fa is ACCGG
which it's reverse complement becomes (CCGGT)
- if you look at the seq3
in the file2 you could see that CCGGT
exists in that sequence then these sequence ids should be reported.
Base on my examples only seq11 hetero-dimerizes with seq3 so the desired output should look like a table as below:
file2 seq3 seq9
file1
seq20 0 0
seq11 1 0
seq2 0 0
I have written this script but it does not give any output - also I have not been able to set the rest of the script to make the output in a table format. Any help is appreciated - Many thanks.
from Bio import SeqIO
with open('test1.fa', 'r') as fw, open('test2.fa', 'r') as rv:
for fw_record in SeqIO.parse(fw, 'fasta'):
lst = []
for i in range(len(fw_record.seq)):
kmer = str(fw_record.seq[i:i + 5])
if len(kmer) == 5:
C_kmer = Seq(kmer).complement()
lst.append(C_kmer[::-1])
print(lst)
for rv_record in SeqIO.parse(rv, 'fasta'):
if any(items in rv_record.seq for items in lst):
print(fw_record.id)
print(fw_record.seq)
print(rv_record.id)
print(rv_record.seq)
Please stop asking the same question over and over again.