Simulation Protein Sequences
12.5 years ago
User 1933 ▴ 360

Is there any software or script to generate random (artificial) amino acid sequences based on a given sequence?

Also, is there any method to create a random sequence from a set of aligned sequences? For example, one might want to generate a random sequence within an enzyme family.

12.5 years ago
JC 13k

The Sequence Manipulation Suite (SMS) at Bioinformatics.org has a "Shuffle Protein" tool: http://www.bioinformatics.org/sms2/shuffle_protein.html

EMBOSS shuffleseq can also do the job: http://emboss.sourceforge.net/apps/release/6.3/emboss/apps/shuffleseq.html

Writing a simple script in Perl, Ruby or Python to do this is also easy.

Edit: my original answer addressed random (shuffled) protein sequences; the question has since been changed to a more elaborate problem.

How about enzyme families? Is there any method to create a random sequence from a set of aligned sequences? I will update the question now. Sorry about that.

Oh, that's a completely different problem. What is your goal? To provide a control for multiple alignment algorithms?

Not really. I am making simulated data as a proof of concept for a method: a predictive model for function prediction from protein sequences. Should I open a new question?

12.5 years ago
grosscol ▴ 40

For short(ish) amino acid sequences, you could write a brief R script to do this for you. For the amino acid sequence "VARY":

```r
library(Deducer) # this loads a whole bunch of other libraries as well

# get input
input <- "VARY"

# split into individual characters and take the first (only) element of the resulting list
sp.input <- strsplit(input, split = '')[[1]]

# use Deducer's perm() to make all permutations
perms <- perm(sp.input)

# print results
print(perms)
```

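The permutation matrix grows very quickly with the number of input characters: a sequence of n distinct residues has n! orderings, so a 10-residue peptide already has 10! = 3,628,800 of them. Generating all alternatives is therefore impractical for real proteins. Instead, you can call sample() repeatedly, as many times as you need.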

```r
# one random shuffle of the input residues
sample(sp.input, length(sp.input), replace = FALSE)

# repeated calls for the iterative mind
num.samples <- 10
for (i in 1:num.samples)
{
  # get a random sample of equal length to the input
  random.sample <- sample(sp.input, size = length(sp.input), replace = FALSE)

  # paste the letters back together, collapse to a single string, and print
  random.sample <- paste(random.sample, collapse = '')
  print(random.sample)
}
```

Another solution is to sort your peptide sequence, run-length encode it and divide each run length by the total number of residues. This gives you the frequency of each amino acid, which you can pass as the prob vector to sample(). This way the vector you sample from never has more than 22 elements (one per amino acid type, counting selenocysteine and pyrrolysine), however long your input sequence is. The replace parameter then has to be TRUE, and the size parameter can be set independently of the input length.
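
A minimal sketch of that composition-based approach in base R. The function name random_by_composition and the example input are hypothetical; rle(sort(...)) yields one run per amino acid type, and the run lengths divided by the sequence length are used as sample()'s prob argument.

```r
# Composition-based random protein sequences: residues are drawn
# independently WITH replacement, using the input's amino acid frequencies.
# (random_by_composition is a hypothetical helper, not from any package.)
random_by_composition <- function(input, size, n = 1) {
  residues <- strsplit(input, split = "")[[1]]

  # sort + run-length encoding: one run per distinct amino acid
  runs <- rle(sort(residues))

  # run length / total residues = frequency of each amino acid
  probs <- runs$lengths / length(residues)

  # draw n sequences of the requested length, independent of the input length
  vapply(seq_len(n), function(i) {
    paste(sample(runs$values, size = size, replace = TRUE, prob = probs),
          collapse = "")
  }, character(1))
}

set.seed(1)
random_by_composition("VARYVARYAAG", size = 20, n = 3)
```

Because residues are drawn independently, each output matches the input's amino acid composition only on average, unlike the shuffles above, which preserve it exactly.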

Thanks for your comprehensive response. Is there any method to create a random sequence from a set of aligned sequences? Say I am interested in generating a random sequence within an enzyme family. Thanks.

12.5 years ago
Botond Sipos ★ 1.7k

To answer your second question: you can use hmmbuild from the HMMER package to build a profile HMM from your multiple alignment, and then use hmmemit to sample sequences from that model. The emitted sequences will have the characteristics of your protein family.
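
For a rough idea of what sampling from an alignment involves, here is a much cruder base-R sketch. It is not what hmmemit does: it treats every alignment column as independent, estimates residue (and gap) frequencies per column without pseudocounts or priors, samples one symbol per column and strips gaps from the result. It assumes equal-length aligned strings with "-" as the gap character; the function name sample_from_alignment and the toy alignment are hypothetical. A profile HMM remains the better model, since it handles insertions and deletions explicitly.

```r
# Crude independent-column sampler for an alignment (NOT a profile HMM).
# Assumes `aln` is a character vector of aligned sequences of equal length,
# with "-" as the gap character. Positions are treated as independent and
# no pseudocounts are added, so unseen residues are never emitted.
sample_from_alignment <- function(aln, n = 1, gap = "-") {
  stopifnot(length(unique(nchar(aln))) == 1)

  # one row per sequence, one column per alignment position
  m <- do.call(rbind, strsplit(aln, split = ""))

  # symbol counts for each column (gaps are counted like residues)
  col_counts <- lapply(seq_len(ncol(m)), function(j) table(m[, j]))

  vapply(seq_len(n), function(i) {
    # draw one symbol per column, weighted by its count in that column
    cols <- vapply(col_counts, function(tab) {
      sample(names(tab), size = 1, prob = as.numeric(tab))
    }, character(1))
    # drop sampled gaps to get an ungapped sequence
    paste(cols[cols != gap], collapse = "")
  }, character(1))
}

# toy alignment (hypothetical)
aln <- c("MKV-LA",
         "MRVALA",
         "MKI-LG")
set.seed(42)
sample_from_alignment(aln, n = 5)
```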
