For short(ish) amino acid sequences, you could write a brief R script to do this for you. For example, for the amino acid sequence "VARY":
library(Deducer) # this includes a whole bunch of other libraries
#get input
input <- "VARY"
#split to individual characters and get the first vector of the resulting list
sp.input <- strsplit( input, split='')[[1]]
#use deducer package to make all permutations
perms <- perm(sp.input)
#print results
print(perms)
The permutation matrix will rapidly get quite large as the number of input characters increases (a sequence of n residues has up to n! orderings), so generating all alternatives quickly becomes impractical. Alternatively, you can call sample() repeatedly, however many times you need:
sample(sp.input,length(sp.input), replace=FALSE)
# repeated calls for the iterative mind
num.samples <- 10
for (i in 1:num.samples)
{
  # get a random sample of equal length to input
  random.sample <- sample(sp.input, size=length(sp.input), replace=FALSE)
  # paste the letters back together, collapse to single string, and print
  random.sample <- paste(random.sample, sep='', collapse='')
  print(random.sample)
}
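The loop above can also be written without an explicit for loop. The following is only a sketch using base R's replicate(); the variable name shuffled is made up for illustration.

```r
# Split the example peptide into single residues
sp.input <- strsplit("VARY", split = "")[[1]]

# Number of shuffled sequences to draw
num.samples <- 10

# sample() with no size argument returns a random permutation of its input;
# replicate() repeats the expression and collects the resulting strings
shuffled <- replicate(num.samples, paste(sample(sp.input), collapse = ""))
print(shuffled)
```

Each element of shuffled is one random reordering of the input, so the residue composition is preserved exactly.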
Another solution could be to sort your peptide sequence, do a run length encoding, and divide each count by the total number of residues. This would give you the probability of each individual amino acid. You could pass this vector of probabilities to the prob parameter of sample(). In this way, the vector you sample from never gets beyond a length of about 22 (one entry per distinct amino acid), no matter how long the input is. The replace parameter would have to be TRUE and the size parameter would have to be set independently. Note that sequences generated this way match the input composition only on average; they are not exact permutations of the input.
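A minimal sketch of the composition-based approach described above, using only base R. The example sequence and the names counts, probs, n.residues and random.peptide are assumptions made for illustration.

```r
# Hypothetical input peptide with some repeated residues
input <- "VARYVAAY"
sp.input <- strsplit(input, split = "")[[1]]

# Sort the residues, then run-length encode to count each distinct one
counts <- rle(sort(sp.input))

# Turn the counts into probabilities
probs <- counts$lengths / length(sp.input)

# Draw a new sequence of any chosen length from the observed composition;
# replace must be TRUE because we sample from the distinct residues only
n.residues <- 12
random.peptide <- paste(
  sample(counts$values, size = n.residues, replace = TRUE, prob = probs),
  collapse = ""
)
print(random.peptide)
```

Because size is set independently of the input length, this can generate sequences longer or shorter than the original, at the cost of only approximating its composition.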
How about enzyme families? Is there any method to create random sequences from a set of aligned sequences? I will update the question now. Sorry about that.
Oh, that's a completely different problem. What is your goal? To provide a control for multiple alignment algorithms?
Not really. I am making simulated data as a proof of concept for a method. The method is a predictive model for function prediction of protein sequences. Should I open a new question?