Generate A Large Set Of Sequence Permutations
14.0 years ago
Yang ▴ 190

I need to permute 10,000 DNA sequences whose lengths range from 200 bp to 1,000 bp. For each sequence, I only need 5,000 permutations for a Monte Carlo simulation. However, when I used the Perl modules List::Permutor or Algorithm::Permute, the 5,000 permutations of one sequence were very similar to one another, which defeats the purpose of random sampling. Generating the complete set of permutations and then sampling from it at random is too time-consuming. So my question is: is there a way to generate random permutations directly?

In fact, all I need to do is search for enriched motifs (5-mers to 10-mers) in the sequence sets. So if there is a program for this, please tell me :)

Thank you!


What do you mean by permutation? Do you have to generate random sequences? Random sequences from the same set of bases? Or do you need to change the order of a set of sequences you have? The question would be clearer with an example.


Yes, permutation means changing the order of the original sequence. Say I have the sequence:

CAGCTAGCATCGATCGTA

I want to reorder it randomly 5,000 times. However, when I applied the permutation modules, they generated:

CAGCTAGCATCGATCGAT
CAGCTAGCATCGATCAGT
CAGCTAGCATCGATACGT
...

So for a long sequence (e.g. 500 bp), the 5,000 permutations are all very similar, which is no use for a Monte Carlo simulation. And if I enumerate all the possibilities, it takes a prohibitive amount of time.
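The behaviour described here can be reproduced with a small Python sketch. It assumes that `itertools.permutations` is a fair stand-in for the Perl permutation modules, in the sense that it also walks through orderings systematically rather than at random. The helper `distinct_permutations` is a hypothetical name introduced only for this illustration; it counts the distinct orderings of a sequence as n! divided by the factorial of each base count, which shows why enumerating everything is impractical.

```python
import itertools
import math
from collections import Counter

seq = "CAGCTAGCATCGATCGTA"

# Systematic enumeration: the first results share a long prefix with the
# original sequence and differ only in the last few positions.
first_three = ["".join(p) for p in itertools.islice(itertools.permutations(seq), 3)]
for s in first_three:
    print(s)


def distinct_permutations(sequence):
    """Count distinct orderings: n! / (product of count! for each base)."""
    total = math.factorial(len(sequence))
    for count in Counter(sequence).values():
        total //= math.factorial(count)
    return total


# Even this 18-base example has far more distinct orderings than could be
# usefully enumerated and then sampled; a 500 bp sequence has vastly more.
print(distinct_permutations(seq))
```

The count grows so quickly with length that drawing random shuffles directly is the practical option.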

14.0 years ago
Neilfws 49k

How about the EMBOSS program, shuffleseq?

Description: This takes a sequence as input and outputs one or more sequences whose order has been randomly shuffled. No bases or residues are changed, only their order.

You can run it with something like:

shuffleseq -sequence myseqs.fa -outseq myshuffledseqs.fa -shuffle 5000

That's it! Thanks!!


You're welcome. An "accept answer" is always appreciated :-)

14.0 years ago

The tools that you tried are designed to give you all permutations without repeating the same sequence, so they step through them methodically, one by one. If all you want is a few random shuffles, you can easily generate them directly in most programming languages. A Python solution would look like this:

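import random

# Split the sequence into a list so that it can be shuffled in place
seq = list("ATAAAAGGATCCCCAC")

for step in range(10):
    # Each call rearranges the same bases into a new random order
    random.shuffle(seq)
    print(''.join(seq))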

---------- Python ----------

CTAAACCGCATAGACA
CCAACTAAGATGACCA
GACCACTCGTCAAAAA
ACAAAGCTATAGACCC
CAAGGATCCACCATAA
ACATCAAGACAATCGC
AGCTCGCCAACTAAAA
CAATGCCAACACATAG
TCAAAGAACCGATCCA
ATAGCCCACAAAACTG

Theoretically this could generate duplicate sequences, but if your sequences are long the chance of that happening is very low.
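For the Monte Carlo use case in the question, the same shuffle can feed an empirical enrichment test. The following is only a minimal sketch under stated assumptions: `count_motif` and `empirical_p_value` are hypothetical helper names, the statistic is a plain overlapping motif count, and the add-one correction on the p-value is one common convention rather than anything prescribed in this thread. Note also that a simple shuffle preserves base composition but not dinucleotide frequencies.

```python
import random


def count_motif(seq, motif):
    """Count overlapping occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def empirical_p_value(seq, motif, n_shuffles=5000, rng=None):
    """Fraction of shuffled sequences with at least as many motif hits."""
    if rng is None:
        rng = random.Random()
    observed = count_motif(seq, motif)
    bases = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # new random order, same base composition
        if count_motif("".join(bases), motif) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    p = empirical_p_value("ATAAAAGGATCCCCAC", "CCCC", n_shuffles=1000)
    print(p)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.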


Thanks!! Do you know the Perl equivalent of random.shuffle?

14.0 years ago
Dave Lunt ★ 2.0k

It's perhaps not exactly what you want (I'm not entirely clear on the requirements), but Seq-Gen might be useful. It is particularly good if you want changes that are not entirely random but are instead constrained within a biologically realistic model of mutation.

"Seq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented including the general reversible model. State frequencies and other parameters of the model may be given and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus large sets of replicate simulations can be easily created. It has been designed to be a general purpose simulator that incorporates most of the commonly used (and computationally tractable) models of molecular sequence evolution."

14.0 years ago
Yang ▴ 190

Thanks for all the answers! I've just found a way for Perl coders: the Algorithm::Numerical::Shuffle module.

e.g.

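use Algorithm::Numerical::Shuffle qw /shuffle/;
my $seq = "CGATCGATCGATCGATCGTAGCTAGCTAGCT";
# Split the sequence into individual bases and shuffle them
my $shuf = shuffle [split(//, $seq)];

for my $i (1..5000) {
    # Reshuffle the bases and print each permutation as a string
    my @shuffled_seq = shuffle $shuf;
    print join('', @shuffled_seq), "\n";
}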