How to count the number of codons in a frame or order using the equation?
4.1 years ago
USER • 0

Let's consider these values

If you want to understand better you can access this link: http://codonw.sourceforge.net/DataRecoding.html

How can I solve this problem?

biopython python code • 2.9k views
4.1 years ago

That equation is a custom hash function that someone came up with so that codons map to unique, sequential values for memory efficiency. A modern language like Python has built-in hash functions (and dictionaries that use them), so one could instead just do:

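someSequence = 'atgatg'
d = dict()
# Step through the sequence one codon (three bases) at a time
for idx in range(0, len(someSequence), 3):  # range replaces Python 2's xrange
    codon = someSequence[idx:idx+3].upper()
    if codon not in d:
        d[codon] = 0
    d[codon] += 1

# Print each codon with its count
for codon, cnt in d.items():
    print('{}: {}'.format(codon, cnt))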
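
The same tally can be sketched with `collections.Counter` from the standard library. The helper name `count_codons` is made up for this illustration, and dropping a trailing fragment shorter than three bases is an assumption, not something stated in the thread.

```python
from collections import Counter

def count_codons(sequence):
    """Count non-overlapping codons in frame 0 (hypothetical helper)."""
    seq = sequence.upper()
    # Assumption: ignore a trailing fragment shorter than three bases.
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

counts = count_codons('atgatggta')
for codon, cnt in sorted(counts.items()):
    print('{}: {}'.format(codon, cnt))
```

To count a different frame, one option is to slice the sequence first, for example `count_codons(seq[1:])` for the second frame.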

If for some reason you absolutely HAD to use this custom hashing function, then you have to use a vector of values:

valueConversion = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}

def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return ((P1 - 1)*16) + P2 + ((P3-1)*4) - 1  # Note the conversion to 0-based indexing!

someSequence = 'atgatg'
v = [0] * 64  # N.B., Python uses 0-based indexing
for idx in range(0, len(someSequence), 3):  # range replaces Python 2's xrange
    codon = someSequence[idx:idx+3].upper()
    v[customHash(codon)] += 1  # call the function to get the index

Now v contains the counts, indexed from TTT (index 0) to GGG (index 63), since T has the lowest value and G the highest. There are probably some typos in there. Finding and correcting these errors can be an exercise for you.
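
To read `v`, it helps to map each index back to its codon. Here is a minimal sketch, assuming the value table from the answer (T/U = 1, C = 2, A = 3, G = 4). The names `codon_index` and `index_to_codon` are made up for this illustration. As a worked case, ATG gives (3-1)*16 + 1 + (4-1)*4 - 1 = 44.

```python
VALUE = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}
BASES = 'TCAG'  # the base at position i has value i + 1

def codon_index(codon):
    """0-based index from the formula used in the answer."""
    p1, p2, p3 = (VALUE[b] for b in codon.upper())
    return (p1 - 1) * 16 + p2 + (p3 - 1) * 4 - 1

def index_to_codon(index):
    """Invert codon_index (hypothetical helper; returns DNA letters)."""
    p1, rest = divmod(index, 16)  # first base carries weight 16
    p3, p2 = divmod(rest, 4)      # third base carries weight 4, second weight 1
    return BASES[p1] + BASES[p2] + BASES[p3]

# One label per slot of the 64-element count vector
labels = [index_to_codon(i) for i in range(64)]
print(codon_index('ATG'), labels[44])
```

Note that the second base varies fastest in this ordering, so the first slots are TTT, TCT, TAT, TGT rather than TTT, TTC, TTA, TTG.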


I have some sequences that were concatenated into one file with the Linux cat command. I would like to feed this file to that code and generate the codon frequencies for each sequence.
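
One way this could be approached, as a rough sketch only: it assumes the concatenated file is FASTA-formatted, with a `>` header line before each sequence, which the thread does not confirm. The helpers `read_fasta` and `codon_counts` and the demo records are hypothetical. Biopython's `Bio.SeqIO.parse(handle, 'fasta')` could stand in for the hand-written parser.

```python
import io
from collections import Counter

def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA text (hypothetical helper)."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)

def codon_counts(sequence):
    """Count frame-0 codons, ignoring a trailing partial codon (assumption)."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))

# Stand-in for the concatenated file; a real file handle from open() would work too.
demo = io.StringIO('>seq1\natgatg\ngta\n>seq2\ncccaaa\n')
per_record = {name: codon_counts(seq) for name, seq in read_fasta(demo)}
for name, counts in per_record.items():
    print(name, dict(counts))
```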


Further help comes at a cost of 500 euro per hour (or fraction thereof). I imagine you'll want to code the last bit yourself...


ok. Thanks! :)


Did you check my answer? [sigh]

$ echo "atguuucccggggtataaggcaaaa" | perl -ne '$cs{$1}++ while /(...)/g;
> END { foreach $c (sort keys %cs) { $final .= "\"".uc($c)."\" = $cs{$c}, " }
> $final =~s/, $//; print "[$final]" }'
["AAA" = 1, "ATG" = 1, "CCC" = 1, "GGC" = 1, "GGG" = 1, "GTA" = 1, "TAA" = 1, "UUU" = 1]

It is:

syntax error at -e line 2, near ">"
syntax error at -e line 3, near ""[$final]" }"
Execution of -e aborted due to compilation errors.

What do these errors mean? I can't see the exit/output...


What I wrote in my comment was exactly what you would see on your screen, so the $ and > characters at the beginning of each line are shell prompts and are not meant to be copied and pasted. If you want to copy and paste the command, use this:

echo "atguuucccggggtataaggcaaaa" | perl -ne '$cs{$1}++ while /(...)/g; END {
foreach $c (sort keys %cs) { $final .= "\"".uc($c)."\" = $cs{$c}, " }
$final =~ s/, $//; print "[$final]" }'

Thanks! ;)


Can you give me a hint of what I'm doing wrong? Or what can I do or study to fix it? What should I learn?

4.1 years ago

As Devon says, you don't need that equation to count codons. If the FASTA sequence is serialized, you can just use this Perl one-liner:

echo "atgatggtagtacatcatcat" | perl -lne '$cs{$1}++ while /(...)/g;
END { foreach $c (sort keys %cs) { print uc($c).": $cs{$c}"  } }'