Randomize CDS While Maintaining Amino Acid Sequence
9.9 years ago
sheinsch ▴ 10

I am trying to remove promoter recognition sites and transcription factor binding sites from several coding sequences. Is there a tool that will randomize the nucleotide sequence while maintaining the amino acid sequence?

EDIT:

I will be expressing the proteins under a variety of promoters. What I am trying to do now is remove any sites within the CDS that could potentially bind transcription factors or RNA polymerase.


Why do you want to randomize the nucleotide sequence of the protein-coding region if your goal is to remove promoter recognition sites and TF binding sites, which are usually located upstream of the coding region?

9.9 years ago
Sam ★ 4.8k

I played around with Perl, and this should give you a different randomized sequence each time you run it:

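#!/usr/bin/perl
# aminoRand.pl: replace each codon with a randomly chosen synonymous codon.
use strict;
use warnings;

if (@ARGV != 2) {
    print STDERR "\nUsage: aminoRand.pl <Codon Table File> <nucleotide sequence>\n";
    exit 1;
}
my ($table_file, $sequence) = @ARGV;

# Each table line: an amino acid followed by its codons, whitespace-separated.
open(my $codon_fh, '<', $table_file) or die "Cannot open $table_file: $!";

my %codon     = ();    # amino acid => array ref of its synonymous codons
my %translate = ();    # codon => amino acid
while (my $line = <$codon_fh>) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip blank lines
    my ($key, @codes) = split ' ', $line;
    @codes = map { uc $_ } @codes;
    $codon{$key} = \@codes;
    $translate{$_} = $key for @codes;
}
close($codon_fh);

# Normalize the input so lowercase sequences match the table.
$sequence = uc $sequence;
$sequence =~ s/\s+//g;

for (my $i = 0; $i < length($sequence); $i += 3) {
    my $current = substr $sequence, $i, 3;
    if (exists $translate{$current}) {
        # Pick one of the codons for the same amino acid at random.
        my @possible = @{ $codon{ $translate{$current} } };
        print($possible[rand @possible]);
    }
    else {
        # Unknown or incomplete codon: report on STDERR and keep it as-is,
        # so the message does not end up inside the output sequence.
        warn "Can't find: $current (left unchanged)\n";
        print($current);
    }
}
print("\n");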

You will need to provide a codon file of the following format:

I  ATT  ATC  ATA
L  CTT  CTC  CTA  CTG  TTA  TTG

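and then the sequence as the second argument. The script will then randomly generate a nucleotide sequence that differs from the input but encodes the same amino acid sequence. The table needs a line for every amino acid (and for the stop codons) that occurs in your sequence.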
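
For reference, a complete table in that format for the standard genetic code (NCBI translation table 1) could look like the one below. Labelling the stop codons with `*` is my own assumption; the script only uses the first column as a lookup key, so any label should work. If your organism uses a non-standard code, adjust the table accordingly.

```
A  GCT  GCC  GCA  GCG
R  CGT  CGC  CGA  CGG  AGA  AGG
N  AAT  AAC
D  GAT  GAC
C  TGT  TGC
Q  CAA  CAG
E  GAA  GAG
G  GGT  GGC  GGA  GGG
H  CAT  CAC
I  ATT  ATC  ATA
L  CTT  CTC  CTA  CTG  TTA  TTG
K  AAA  AAG
M  ATG
F  TTT  TTC
P  CCT  CCC  CCA  CCG
S  TCT  TCC  TCA  TCG  AGT  AGC
T  ACT  ACC  ACA  ACG
W  TGG
Y  TAT  TAC
V  GTT  GTC  GTA  GTG
*  TAA  TAG  TGA
```

Note that M and W have only one codon each, so those positions can never change, and every codon is drawn with equal probability regardless of the codon usage of your expression host.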
