Let's consider these values
If you want to understand it better, see this link: http://codonw.sourceforge.net/DataRecoding.html
How can I solve this problem?
That equation is a custom hash function that someone came up with to ensure that codons hash to unique sequential values for memory efficiency. A modern language like Python has built-in hashing (it is what a dict uses), so one could instead just do:
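someSequence = 'atgatg'
d = dict()
# Step through the sequence one codon (3 bases) at a time
for idx in range(0, len(someSequence), 3):  # use xrange on Python 2
    codon = someSequence[idx:idx+3].upper()
    if codon not in d:
        d[codon] = 0
    d[codon] += 1

# Print each codon with its count
for codon, cnt in d.items():
    print('{}: {}'.format(codon, cnt))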
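The same idea can be written a bit more compactly with collections.Counter from the standard library. This is only a sketch, assuming Python 3; the function name count_codons is made up for illustration, and it drops a trailing partial codon if the sequence length is not a multiple of 3.

```python
from collections import Counter


def count_codons(sequence):
    """Count non-overlapping codons in frame 0, ignoring a trailing partial codon."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # length rounded down to a multiple of 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


counts = count_codons('atgatggta')
for codon, cnt in sorted(counts.items()):
    print('{}: {}'.format(codon, cnt))
```

A Counter behaves like a dict, so the printing loop is unchanged.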
If for some reason you absolutely HAD to use this custom hashing function, then you would use a vector (a list) of counts indexed by the hash value:
valueConversion = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}

def customHash(codon):
    # Map each base of the codon to its value (1..4)
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return ((P1 - 1)*16) + P2 + ((P3-1)*4) - 1 # Note the conversion to 0-based indexing!

someSequence = 'atgatg'
v = [0] * 64 # N.B., python uses 0-based indexing
for idx in range(0, len(someSequence), 3):  # use xrange on Python 2
    codon = someSequence[idx:idx+3].upper()
    v[customHash(codon)] += 1  # call the function on the codon
Now v contains the counts, starting from TTT (index 0) and ending at GGG (index 63). There are probably some typos in there. Finding and correcting these errors can be an exercise for you.
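To see which codon each slot of v belongs to, the hash can be inverted. The sketch below is my own illustration, not part of CodonW: the names BASES, codon_to_index and index_to_codon are hypothetical, and codon_to_index is just the formula above rewritten with 0-based base values (T=0, C=1, A=2, G=3).

```python
BASES = 'TCAG'  # position in this string + 1 gives the values 1..4 used above


def codon_to_index(codon):
    """0-based form of the formula above; U is treated as T."""
    p1, p2, p3 = (BASES.index(b) for b in codon.upper().replace('U', 'T'))
    return p1 * 16 + p2 + p3 * 4


def index_to_codon(index):
    """Invert codon_to_index: recover the three bases from a 0..63 index."""
    p1, rest = divmod(index, 16)
    p3, p2 = divmod(rest, 4)
    return BASES[p1] + BASES[p2] + BASES[p3]


v = [0] * 64
seq = 'atgatggta'
for i in range(0, len(seq) - len(seq) % 3, 3):
    v[codon_to_index(seq[i:i + 3])] += 1

# Print only the codons that were actually seen
for index, cnt in enumerate(v):
    if cnt:
        print('{:2d} {}: {}'.format(index, index_to_codon(index), cnt))
```

Note that in this ordering the second base varies fastest, so the first slots are TTT, TCT, TAT, TGT, and not TTT, TTC, TTA, TTG.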
As Devon says, you don't need that equation to count codons. If the fasta sequence is serialized (that is, the whole sequence is on a single line), you can just use this Perl one-liner:
echo "atgatggtagtacatcatcat" | perl -lne '$cs{$1}++ while /(...)/g;
END { foreach $c (sort keys %cs) { print uc($c).": $cs{$c}" } }'
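If the input is instead a multi-line FASTA file, possibly with several records, something along these lines might work in Python. It is a rough sketch under assumptions: records start with a '>' header line, every sequence is read in frame 0, and the function name codon_counts_per_record and the example records are invented for illustration. Records that share the same header would overwrite each other.

```python
from collections import Counter


def codon_counts_per_record(lines):
    """Return {header: Counter of codons} for an iterable of FASTA lines."""
    sequences = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            header = line[1:]
            sequences[header] = []
        elif header is not None:
            sequences[header].append(line.upper())
    counts = {}
    for name, chunks in sequences.items():
        seq = ''.join(chunks)  # join wrapped lines before splitting into codons
        usable = len(seq) - len(seq) % 3
        counts[name] = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    return counts


# Made-up example input; for a real file, pass an open file object instead
example = """>seq1
atgatg
gta
>seq2
catcatcat
""".splitlines()

for name, counts in codon_counts_per_record(example).items():
    for codon, cnt in sorted(counts.items()):
        print('{}\t{}\t{}'.format(name, codon, cnt))
```

Joining the lines of a record before cutting it into triplets matters, because a codon can span a line break in wrapped FASTA.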
I have several sequences that were joined into one file with the Linux cat command. I would like to feed that file to this code and generate the frequencies for each of the sequences.
Further help comes at a cost of 500 euro per hour (or fraction thereof). I imagine you'll want to code the last bit yourself...
ok. Thanks! :)
Did you check my answer? [sigh]
It is:
What do these errors mean? I can't see the exit/output...
What I wrote in my comment was exactly what you would see on your screen, so the $ and the > characters at the beginning of each line are not meant to be copied and pasted. If you want to do so, then you have to copy and paste this:
Thanks! ;)
Can you give me a hint of what I'm doing wrong? Or what can I do or study to fix it? What should I learn?