For example, I have a DNA sequence and I want to generate a randomized sequence with the same DNA composition. Which program can I use to do this?
You should also consider preserving the dinucleotide composition by applying the Altschul-Erickson algorithm ("Significance of nucleotide sequence alignments: A method for random sequence permutation that preserves dinucleotide and codon usage", S.F. Altschul and B.W. Erickson, Mol. Biol. Evol., 2(6):526-538, 1985).
P. Clote already implemented it in Python for the community:
http://bioinformatics.bc.edu/clotelab/RNAdinucleotideShuffle/dinucleotideShuffle.html
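To give a feel for how a dinucleotide-preserving shuffle can work, here is a minimal Python sketch of the usual Eulerian-walk formulation of the idea. It is not P. Clote's code and has not been checked against it; the names dinucleotide_shuffle and _reaches_last are made up for this illustration, and I have not verified that it samples all valid permutations with equal probability.

```python
import random
from collections import defaultdict


def _reaches_last(start, last, final_exit):
    """Follow the chosen final exits from start; True if we arrive at last."""
    seen = set()
    x = start
    while x != last:
        if x in seen:          # ran into a cycle that never reaches last
            return False
        seen.add(x)
        x = final_exit[x]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Return a permutation of seq with the same dinucleotide counts.

    Hypothetical helper for illustration; first and last symbols stay fixed.
    """
    if len(seq) < 3:
        return seq
    first, last = seq[0], seq[-1]

    # successors[x] holds one entry per dinucleotide "x?" found in seq
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)

    # Choose one final exit per symbol (except the last symbol) and retry
    # until every symbol is connected to the last symbol through them.
    while True:
        final_exit = {x: rng.choice(s) for x, s in successors.items() if x != last}
        if all(_reaches_last(x, last, final_exit) for x in final_exit):
            break

    # Shuffle the other exits of each symbol and keep the final exit at the end
    order = {}
    for x, s in successors.items():
        rest = list(s)
        if x in final_exit:
            rest.remove(final_exit[x])
        rng.shuffle(rest)
        if x in final_exit:
            rest.append(final_exit[x])
        order[x] = rest

    # Walk through the exits in the chosen order, starting at the first symbol
    out = [first]
    used = defaultdict(int)
    current = first
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        out.append(nxt)
        current = nxt
    return "".join(out)


print(dinucleotide_shuffle("ATATTCATGAGTACCGTA"))
```

Because every dinucleotide of the input is used exactly once in the walk, the output has the same dinucleotide counts (and therefore the same single-base counts) as the input.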
There are countless ways to do this using pretty much any programming language.
From perlfaq4, "How do I shuffle an array randomly?":
For example, if you have Perl 5.8.0 or later installed, or Scalar-List-Utils 1.03 or later, you can run the following from the command line (as a Perl one-liner):
perl -e 'use List::Util qw(shuffle);
my @original_dna = split(//, "ATATTCATGAGTACCGTA");
my @random_dna = shuffle(@original_dna);
print "\nOriginal: ", @original_dna, "\nRandom: ", @random_dna, "\n\n";'
Based on answers from Stack Overflow, "shuffle string in python" (written for Python 3):
python3 -c "import random;
dna=list('ATATTCATGAGTACCGTA');
print('Original:',''.join(dna));
random.shuffle(dna);
print('Random: ',''.join(dna));"
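If you would rather have this as a reusable function than a one-liner, something like the sketch below should do. The function name shuffle_sequence and the optional seed argument are my own additions for illustration; the Counter comparison at the end just confirms that the single-base composition is unchanged.

```python
import random
from collections import Counter


def shuffle_sequence(seq, seed=None):
    """Return a random permutation of seq with the same single-base counts."""
    rng = random.Random(seed)  # seed is optional, only useful for reproducibility
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


original = "ATATTCATGAGTACCGTA"
shuffled = shuffle_sequence(original)
print("Original:", original)
print("Random:  ", shuffled)
# Shuffling only reorders the bases, so the base counts must match
print("Same composition:", Counter(original) == Counter(shuffled))
```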
You can use shuffleseq from the EMBOSS suite. It has online documentation, and a web interface is also available.
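R version:
dna = unlist(strsplit("acctg", split=""))
# index dna with a random permutation of its positions
rand_dna = dna[sample(1:length(dna))]
# join the shuffled bases back into a single string
paste(rand_dna, collapse="")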
I encourage you to also consider broadening your search beyond this thread because a web search will turn up a lot of related pages. I'm certain you will find a variety of helpful approaches to this question on this site alone because I recall answering similar questions in the past.
By "DNA composition", do you just mean single-base frequency?
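The distinction matters because two sequences can share single-base frequencies and still differ in dinucleotide frequencies. A small Python sketch of the difference, where kmer_counts is a hypothetical helper written just for this illustration:

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq (k=1: bases, k=2: dinucleotides)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


a, b = "ATAT", "AATT"
# Both sequences contain two A and two T
print(kmer_counts(a, 1) == kmer_counts(b, 1))
# ATAT has AT, TA, AT while AATT has AA, AT, TT
print(kmer_counts(a, 2) == kmer_counts(b, 2))
```

The first comparison is true and the second is false, so a plain shuffle that only preserves base counts will generally not preserve dinucleotide counts.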