Rearranging Dna Sequences
9
1
Entering edit mode
14.7 years ago
Jess ▴ 10

I'm having trouble rearranging a DNA sequence. I need to rearrange randomly a given DNA sequence so the G/C content remains the same and so does the A/T content and therefore the length. I can generate random sequences but I cannot rearrange a given sequence randomly.

Any help would be great thanks.

python dna perl r • 8.9k views
ADD COMMENT
0
Entering edit mode

homework ?.....

ADD REPLY
0
Entering edit mode

Sounds like . . .

ADD REPLY
0
Entering edit mode
ADD REPLY
8
Entering edit mode
14.7 years ago

shuffleseq from EMBOSS shuffles a set of sequences maintaining composition.

ADD COMMENT
1
Entering edit mode

One can even use web-based shuffleseq

ADD REPLY
1
Entering edit mode

I second the use of shuffleseq.

ADD REPLY
4
Entering edit mode
14.7 years ago
brentp 24k

here's a function in python that "mutates" the original sequence, maintaining gc, at content.

import random

def seq_shuffler(original_seq="ACCAACXTGGGGTTTCCGGGGCCCCC"):
    original_seq = list(original_seq)
    while True:
        random.shuffle(original_seq)
        yield "".join(original_seq)

random_seq_gen = seq_shuffler()
print random_seq_gen.next()
print random_seq_gen.next()
print random_seq_gen.next()
print random_seq_gen.next()

# or loop.
for k in random_seq_gen:
    print k
ADD COMMENT
0
Entering edit mode

Nice example. I haven't really done much with Python yet, but the various examples I've seen on this site have convinced me to take a look at it. For the things I don't do with R I tend to use Perl.

ADD REPLY
0
Entering edit mode

uShuffle can produce a Perl module, and a Python too. I generally add it to my bioperl/biopython stuff. And it is time and memory efficient.

ADD REPLY
4
Entering edit mode
14.7 years ago
Ian Simpson ▴ 960

OK I got a bit obsessed with doing this in R because I thought you could do it in one line, which you can !! (not including the input that is)

#input string of choice
a <- 'agcactacgactacgacagcata';

#shuffle it
paste(sample(unlist(strsplit(a,split=''))),collapse='');

and to do this 100 times and print out the answer:-

for(i in 1:100){
    print(paste(sample(unlist(strsplit(a,split=''))),collapse=''),q=F);
}
ADD COMMENT
0
Entering edit mode

not bad! though that's a large value of 1. ;) the python version could also be "1" line.

ADD REPLY
0
Entering edit mode

five concatenated functions that's what R lives on !!! ;)

ADD REPLY
4
Entering edit mode
14.7 years ago
Rob Syme ▴ 540

While the EMBOSS solution is probably the best, if it needs to be incorporated into a script, the Bioruby library gives you the very convenient 'randomize' method:

require 'bio'
s = Bio::Sequence::NA.new("ACCAACXTGGGGTTTCCGGGGCCCCC")
s.randomize         # ==> "tagccggcctxgatcactgcgcgccg"
ADD COMMENT
3
Entering edit mode
14.7 years ago

What language are you using? Here's something in Ruby:

class Array
  #fisher-yates/knuth shuffle
  def shuffle!
    n = length
    for i in 0...n
      r = Kernel.rand(n-i)+i
      self[r], self[i] = self[i], self[r]
    end
    self
  end

  # Return a shuffled copy of the array
  def shuffle
    dup.shuffle!
  end
end

string = "AAATTTGGGCCC"
string.split(//).shuffle.join("")

> "AACGCTTTCAGG"

As always, there may be a more concise way to do this, but this will get the job done.

ADD COMMENT
0
Entering edit mode

I think the EMBOSS shuffleseq is the better solution, but if you really want to use ruby, the bioruby library gives you a convenient 'randomize' method:

require 'bio'
s = Bio::Sequence::NA.new("ACCAACXTGGGGTTTCCGGGGCCCCC")
s.randomize    #=> "tagccggcctxgatcactgcgcgccg"
ADD REPLY
2
Entering edit mode
14.7 years ago

Hi Jess,

You can use Sean Eddy's Squid lib from Sean Eddy to do this. It's will generate a set of command-line application able to shuffle you sequence in several ways. Additionally, you can use uShuffle which will do a similar job. Both can shuffle preserving the base counts and preserving n-base (dibase, tribase, etc.) counts too.

ADD COMMENT
2
Entering edit mode
14.2 years ago

also in perl without modules

my $seq = "AAAAAGTATACAACATCA"; #input seq
my @seqarray = split(//,$seq); #put seq in array
my @randarray = sort {rand() <=> rand()} @seqarray; #suffle indexes
my $outseq = join("",@randarray); #join shuffled sequence
print "$outseq\n"; #output

note that the perl sort function compares 2 numbers by <=> and returns -1, 0 or 1 depending which one is larger. if you sort by rand()<=>rand() then the sorting is random.

ADD COMMENT
1
Entering edit mode
14.7 years ago
Ian Simpson ▴ 960

If you fancy doing it in Perl there are three different ways you can try listed here

ADD COMMENT
1
Entering edit mode
14.2 years ago

There are a whole set of web-based tools available for this at http://www.bioinformatics.org/sms2/ This would be fine for the one-off or small set of sequences or for one who does not run perl or have access to tools found at a large institution. Nonetheless, the code examples above are also a way to learn...

ADD COMMENT

Login before adding your answer.

Traffic: 2512 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6