Question

Rearranging DNA Sequences

1

14.7 years ago

Jess ▴ 10

I'm having trouble rearranging a DNA sequence. I need to randomly rearrange a given DNA sequence so that the G/C content, the A/T content, and therefore the length all stay the same. I can generate random sequences, but I cannot randomly rearrange a given sequence.

Any help would be great thanks.

python dna perl r • 9.0k views

ADD COMMENT • link updated 14.7 years ago by Larry_Parnell 16k • written 14.7 years ago by Jess ▴ 10

0

homework ?.....

ADD REPLY • link 14.7 years ago by Pierre Lindenbaum 164k

0

Sounds like . . .

ADD REPLY • link 14.7 years ago by Jarretinha 3.4k

0

Duplicate of How To Scramble A Sequence Using An Existing Script Or A Python Method?

ADD REPLY • link updated 5.3 years ago by Ram 44k • written 13.6 years ago by Martin A Hansen 3.0k

Ram · Answer 1 · 2010-03-30

8

14.7 years ago

Marcos De Carvalho ▴ 310

shuffleseq from EMBOSS shuffles a set of sequences maintaining composition.
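For anyone who wants the same idea inside a script, here is a minimal Python sketch that shuffles every record in a FASTA-formatted string while keeping each record's composition. It does not call shuffleseq and is not how EMBOSS implements it; the function name `shuffle_fasta_records` and the `seed` parameter are made up for this illustration.

```python
import random
from collections import Counter

def shuffle_fasta_records(fasta_text, seed=None):
    """Shuffle each sequence in a FASTA-formatted string, keeping its header."""
    rng = random.Random(seed)  # seed is optional, only for reproducible runs
    records = []
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))

    shuffled = []
    for name, seq in records:
        letters = list(seq)
        rng.shuffle(letters)  # in-place permutation; base counts are unchanged
        shuffled.append((name, "".join(letters)))
    return shuffled

example = ">seq1\nAAATTTGGGCCC\n>seq2\nACGTACGTAC\n"
for name, seq in shuffle_fasta_records(example, seed=1):
    print(name, seq, dict(Counter(seq)))
```

Printing the `Counter` next to each shuffled sequence is just a quick way to confirm that the base counts match the input.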

ADD COMMENT • link updated 5.6 years ago by Ram 44k • written 14.7 years ago by Marcos De Carvalho ▴ 310

1

One can even use web-based shuffleseq

ADD REPLY • link updated 5.6 years ago by Ram 44k • written 14.7 years ago by Darked89 4.7k

1

I second the use of shuffleseq.

ADD REPLY • link 14.7 years ago by Neilfws 49k

Ram · Answer 2 · 2010-03-29

4

14.7 years ago

brentp 24k

Here's a generator function in Python that shuffles the original sequence, so the GC and AT content (and the length) stay the same.

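import random

def seq_shuffler(original_seq="ACCAACXTGGGGTTTCCGGGGCCCCC"):
    """Yield an endless stream of random permutations of original_seq."""
    bases = list(original_seq)  # strings are immutable, so shuffle a list copy
    while True:
        random.shuffle(bases)  # in-place shuffle; every base count is preserved
        yield "".join(bases)

random_seq_gen = seq_shuffler()
print(next(random_seq_gen))
print(next(random_seq_gen))
print(next(random_seq_gen))
print(next(random_seq_gen))

# or loop (the generator never ends, so this runs until you interrupt it).
for k in random_seq_gen:
    print(k)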

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 14.7 years ago by brentp 24k

0

Nice example. I haven't really done much with Python yet, but the various examples I've seen on this site have convinced me to take a look at it. For the things I don't do with R I tend to use Perl.

ADD REPLY • link 14.7 years ago by Ian Simpson ▴ 960

0

uShuffle can produce a Perl module, and a Python one too. I generally add it to my bioperl/biopython stuff. It is also time and memory efficient.

ADD REPLY • link 14.7 years ago by Jarretinha 3.4k

Ram · Answer 3 · 2010-03-30

4

14.7 years ago

Ian Simpson ▴ 960

OK, I got a bit obsessed with doing this in R because I thought you could do it in one line, which you can! (Not counting the input, that is.)

#input string of choice
a <- 'agcactacgactacgacagcata';

#shuffle it
paste(sample(unlist(strsplit(a,split=''))),collapse='');

and to do this 100 times and print out the answer:

for(i in 1:100){
    print(paste(sample(unlist(strsplit(a,split=''))),collapse=''),quote=FALSE);
}

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 14.7 years ago by Ian Simpson ▴ 960

0

not bad! though that's a large value of 1. ;) the python version could also be "1" line.
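For reference, a one-line Python version could look something like this. It is only a sketch: `random.sample` with a sample size equal to the sequence length returns every character exactly once in random order, so the composition cannot change.

```python
import random

seq = "ACCAACXTGGGGTTTCCGGGGCCCCC"
# Sampling len(seq) items without replacement is a random permutation of seq
shuffled = "".join(random.sample(seq, len(seq)))
print(shuffled)
```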

ADD REPLY • link 14.7 years ago by brentp 24k

0

five concatenated functions that's what R lives on !!! ;)

ADD REPLY • link 14.7 years ago by Ian Simpson ▴ 960

Ram · Answer 4 · 2010-03-31

4

14.7 years ago

Rob Syme ▴ 540

While the EMBOSS solution is probably the best, if it needs to be incorporated into a script, the Bioruby library gives you the very convenient 'randomize' method:

require 'bio'
s = Bio::Sequence::NA.new("ACCAACXTGGGGTTTCCGGGGCCCCC")
s.randomize         # ==> "tagccggcctxgatcactgcgcgccg"

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 14.7 years ago by Rob Syme ▴ 540

Ram · Answer 5 · 2010-03-29

3

14.7 years ago

Chris Miller 22k

What language are you using? Here's something in Ruby:

class Array
  #fisher-yates/knuth shuffle
  def shuffle!
    n = length
    for i in 0...n
      r = Kernel.rand(n-i)+i
      self[r], self[i] = self[i], self[r]
    end
    self
  end

  # Return a shuffled copy of the array
  def shuffle
    dup.shuffle!
  end
end

string = "AAATTTGGGCCC"
string.split(//).shuffle.join("")

> "AACGCTTTCAGG"

As always, there may be a more concise way to do this, but this will get the job done.
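For comparison, the same Fisher-Yates idea can be sketched in Python roughly as follows. The function name `fisher_yates_shuffle` and the `rng` parameter are made up for this example; in practice `random.shuffle` already does this for you.

```python
import random

def fisher_yates_shuffle(seq, rng=random):
    """Return a shuffled copy of seq using a forward Fisher-Yates pass."""
    letters = list(seq)
    n = len(letters)
    for i in range(n - 1):
        # Pick r uniformly from i..n-1, mirroring Kernel.rand(n-i)+i in the Ruby code
        r = rng.randrange(i, n)
        letters[i], letters[r] = letters[r], letters[i]
    return "".join(letters)

print(fisher_yates_shuffle("AAATTTGGGCCC"))
```

Because each step only swaps two positions, the result is always a permutation of the input, so the base counts and the length stay the same.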

ADD COMMENT • link updated 6.3 years ago by Ram 44k • written 14.7 years ago by Chris Miller 22k

0

I think the EMBOSS shuffleseq is the better solution, but if you really want to use ruby, the bioruby library gives you a convenient 'randomize' method:

require 'bio'
s = Bio::Sequence::NA.new("ACCAACXTGGGGTTTCCGGGGCCCCC")
s.randomize    #=> "tagccggcctxgatcactgcgcgccg"

ADD REPLY • link updated 6.3 years ago by Ram 44k • written 14.7 years ago by Rob Syme ▴ 540

score 2 · Answer 6 · 2010-03-29

Hi Jess,

You can use Sean Eddy's Squid library to do this. It will generate a set of command-line applications able to shuffle your sequence in several ways. Additionally, you can use uShuffle, which will do a similar job. Both can shuffle while preserving the base counts, and while preserving n-base (dibase, tribase, etc.) counts too.
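To see why preserving n-base counts is a separate feature, here is a small Python sketch (not Squid or uShuffle code; `kmer_counts` is a helper invented for this example). A plain shuffle always keeps the single-base counts, but it generally changes the dinucleotide counts.

```python
import random
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers (k=1 for bases, k=2 for dinucleotides, ...)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

original = "ACGTACGTACGTACGTACGT"
letters = list(original)
random.shuffle(letters)  # plain shuffle: only the single-base counts are guaranteed
shuffled = "".join(letters)

# Single-base counts always match after a plain shuffle
print(kmer_counts(original, 1) == kmer_counts(shuffled, 1))
# Dinucleotide counts will usually differ, which is what the dedicated tools address
print(kmer_counts(original, 2) == kmer_counts(shuffled, 2))
```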

Ram · Answer 7 · 2010-09-29

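Also in Perl, without modules:

my $seq = "AAAAAGTATACAACATCA"; # input sequence
my @bases = split(//, $seq);    # one base per array element
# Fisher-Yates shuffle: swap each position with a randomly chosen earlier-or-equal one
for (my $i = $#bases; $i > 0; $i--) {
    my $j = int(rand($i + 1));
    @bases[$i, $j] = @bases[$j, $i];
}
my $outseq = join("", @bases);  # join the shuffled bases
print "$outseq\n";              # output

Note that the Perl sort function compares two values with <=>, which returns -1, 0 or 1 depending on which one is larger. Sorting with {rand() <=> rand()} also rearranges the bases, but the comparison result changes from call to call, so the outcome is not guaranteed to be a uniform shuffle. The swap loop above avoids that problem.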
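The sort-with-random-numbers idea is usually made reliable by drawing one random key per base first and then sorting on those fixed keys, so every comparison is consistent. A minimal Python sketch of that variant (the function name `shuffle_by_random_keys` is made up for this example):

```python
import random

def shuffle_by_random_keys(seq, rng=random):
    """Shuffle seq by tagging each base with one random key and sorting on it."""
    keyed = [(rng.random(), base) for base in seq]  # one fixed key per base
    keyed.sort(key=lambda pair: pair[0])            # sort on the keys only
    return "".join(base for _, base in keyed)

print(shuffle_by_random_keys("AAAAAGTATACAACATCA"))
```

This costs a sort rather than a single pass, so for long sequences a swap-based shuffle is the cheaper choice.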

Ram · Answer 8 · 2010-03-29

1

14.7 years ago

Ian Simpson ▴ 960

If you fancy doing it in Perl there are three different ways you can try listed here

ADD COMMENT • link updated 5.6 years ago by Ram 44k • written 14.7 years ago by Ian Simpson ▴ 960

score 1 · Answer 9 · 2010-09-29

There is a whole set of web-based tools available for this at http://www.bioinformatics.org/sms2/. This would be fine for a one-off or a small set of sequences, or for someone who does not run Perl or have access to the tools found at a large institution. Nonetheless, the code examples above are also a way to learn...