Hello everybody,
Using Python, I'm trying to create 30-mers from DNA sequences in a FASTA file in order to run a local BLAST analysis on them later. The end goal is a list containing many lists of these 30-mer sequences that I can turn into a FASTA file. To do this, I need to keep all the FASTA information throughout the subsequent code, but when I try to run it, I get the error "TypeError: unhashable type: 'SeqRecord'".
The weird thing is that when I run the code on my home computer, it works just fine. It's only when I run it over ssh on a Linux server that it gives me this error. Both use Biopython 1.67, but the server uses Python 3.5 while my home computer runs Python 2.7.
For practical reasons, I cannot run this code on my home computer, so I need to find a way to circumvent or fix this error.
Here is the code that I used and the error message that pops up when running the code from ssh. Thanks!
from Bio import SeqIO

def find_kmers(string, k):
    kmers = []
    n = len(string)
    for i in range(0, n-k+1):
        kmers.append(string[i:i+k])
    return list(set(kmers))

ltr_seq = SeqIO.parse(open('HIV_Align_5\'LTR_no_gaps_nor_high_gaps.fasta'), "fasta")
all_kmer_list = []
for i in ltr_seq:
    all_kmer_list.append(find_kmers(i, 30))
  File "HIV_5'LTR_30_mer_BLAST.py", line 24, in <module>
    all_kmer_list.append(find_kmers(i, 30))
  File "HIV_5'LTR_30_mer_BLAST.py", line 18, in find_kmers
    return list(set(kmers))
TypeError: unhashable type: 'SeqRecord'
I would like to do that, but unfortunately I need all the information in the SeqRecord object to funnel the output to a FASTA file. On my home computer, slicing the SeqRecord object directly (without using the .seq attribute) applies the function to the sequence it contains and keeps the rest of the FASTA information while the sequence gets turned into the various 30-mers.
Oh, looks like the problem is in the set() call. I didn't look long enough at the traceback earlier. Perhaps creating a set works differently in Python 3 than in Python 2.7; otherwise I wouldn't know how to explain the problem.
Do you want each k-mer originating from the same sequence to have the same FASTA identifier?
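One possible explanation, offered as an assumption rather than something checked against the Biopython source: in Python 3, a class that defines `__eq__` without also defining `__hash__` becomes unhashable, while Python 2 keeps the default identity-based hash. If SeqRecord defines `__eq__`, that would account for the difference between the two machines. A minimal sketch using a hypothetical stand-in class `Record` (not Biopython's SeqRecord):

```python
class Record(object):
    """Hypothetical stand-in for a record class; NOT Biopython's SeqRecord."""

    def __init__(self, seq):
        self.seq = seq

    # In Python 3, defining __eq__ without __hash__ sets __hash__ to None.
    def __eq__(self, other):
        return self.seq == other.seq


def is_hashable(obj):
    """Return True if obj can be placed in a set or used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(Record("ACGT")))  # False on Python 3
print(is_hashable("ACGT"))          # True: plain strings are hashable
```

Under Python 2.7 the same `Record` would still be hashable, so `set()` would accept it there.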
Just extract the fields you want to save into a tuple.
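A rough sketch of what that could look like, assuming the only fields needed are the record identifier and the k-mer string. The `find_kmers` signature here is my own variation, not the original poster's final code, and it keeps the first occurrence of each k-mer in order instead of going through `set()`:

```python
def find_kmers(record_id, sequence, k):
    """Return the unique k-mers of `sequence` as (record_id, kmer) tuples.

    `sequence` is a plain string, so every k-mer is hashable.
    """
    seen = set()
    kmers = []
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer not in seen:  # keep the first occurrence, preserve order
            seen.add(kmer)
            kmers.append((record_id, kmer))
    return kmers


print(find_kmers("seq1", "ACGTACGT", 4))
# [('seq1', 'ACGT'), ('seq1', 'CGTA'), ('seq1', 'GTAC'), ('seq1', 'TACG')]
```

With Biopython, the call inside the loop over the parsed records would be something like `find_kmers(i.id, str(i.seq), 30)`, and the tuples can later be turned back into FASTA entries.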
You're right! I modified the end of the find_kmers function and it worked wonderfully! Thank you :)