Hello,
BioJava seems to be buggy even in its simplest functions, which makes it very hard to consider using it for complex problems.
I recently tried to build an alignment with an HMM, so I ran the example from the BioJava cookbook.
Here is the example:
import org.biojava.bio.dist.*;
import org.biojava.bio.dp.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.*;
import org.biojava.bio.symbol.*;

ProfileHMM hmm = new ProfileHMM(DNATools.getDNA(),
        12,
        DistributionFactory.DEFAULT,
        DistributionFactory.DEFAULT,
        "my profilehmm");
//create the Dynamic Programming matrix for the model.
DP dp = DPFactory.DEFAULT.createDP(hmm);
//Database to hold the training set
SequenceDB db = new HashSequenceDB();
//code here to load the training set
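The cookbook leaves the loading step as a placeholder. Since the training data here is a FASTA file, this is a minimal plain-Java sketch of the parsing half of that step, assuming each record starts with a ">" header line and its residues may be wrapped over several lines. FastaSketch and parseFasta are hypothetical names, not BioJava API; turning each residue string into a BioJava Sequence and adding it to db is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FastaSketch {
    // Hypothetical helper (not BioJava API): parses FASTA text into header -> residues,
    // keeping records in file order. Residues are upper-cased and line breaks removed.
    static Map<String, String> parseFasta(String text) {
        Map<String, String> records = new LinkedHashMap<>();
        String name = null;
        StringBuilder residues = new StringBuilder();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                if (name != null) {
                    records.put(name, residues.toString()); // close the previous record
                }
                name = line.substring(1).trim();
                residues.setLength(0);
            } else if (name != null) {
                residues.append(line.toUpperCase()); // ignore text before the first header
            }
        }
        if (name != null) {
            records.put(name, residues.toString()); // close the last record
        }
        return records;
    }

    public static void main(String[] args) {
        String fasta = ">seq1\nACGT\nacgg\n\n>seq2\nTTGA\n";
        parseFasta(fasta).forEach((name, seq) -> System.out.println(name + "\t" + seq));
    }
}
```

To parse a file rather than a string literal, read it into a String first, for example with new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8). In this sketch a repeated header overwrites the earlier record with the same name.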
Now initialize all of the model parameters to a uniform value. Alternatively, the parameters could be set randomly, or set to a guess at what the best model might be. Then use the Baum-Welch algorithm to optimise the parameters.
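Uniform and random initialization are easy to picture for a single emission distribution over the DNA alphabet. The plain-Java sketch below fills a four-symbol weight vector either with equal values or with random positive values normalised to sum to 1. InitSketch, uniform and random are hypothetical names; this is not how BioJava's SimpleModelTrainer or DistributionFactory work internally.

```java
import java.util.Arrays;
import java.util.Random;

public class InitSketch {
    // Hypothetical example alphabet; the profile HMM above uses BioJava's DNA alphabet instead.
    static final char[] DNA = {'A', 'C', 'G', 'T'};

    // Uniform start: every symbol gets weight 1 / alphabetSize.
    static double[] uniform(int alphabetSize) {
        double[] w = new double[alphabetSize];
        Arrays.fill(w, 1.0 / alphabetSize);
        return w;
    }

    // Random start: strictly positive random weights, normalised to sum to 1.
    static double[] random(int alphabetSize, Random rng) {
        double[] w = new double[alphabetSize];
        double total = 0.0;
        for (int i = 0; i < alphabetSize; i++) {
            w[i] = rng.nextDouble() + 1e-9; // offset keeps a weight from being exactly 0
            total += w[i];
        }
        for (int i = 0; i < alphabetSize; i++) {
            w[i] /= total;
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println("uniform: " + Arrays.toString(uniform(DNA.length))); // [0.25, 0.25, 0.25, 0.25]
        System.out.println("random:  " + Arrays.toString(random(DNA.length, new Random(42))));
    }
}
```

Keeping every weight strictly positive matters because Baum-Welch re-estimation cannot move a parameter away from exactly zero. A fixed seed makes the random start reproducible when comparing training runs.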
//train the model to have uniform parameters
ModelTrainer mt = new SimpleModelTrainer();
//register the model to train
mt.registerModel(hmm);
//as no other counts are being used the null weight will cause everything to be uniform
mt.setNullModelWeight(1.0);
mt.train();
//create a BW trainer for the dp matrix generated from the HMM
BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
//anonymous implementation of the stopping criteria interface to stop after 20 iterations
StoppingCriteria stopper = new StoppingCriteria() {
    public boolean isTrainingComplete(TrainingAlgorithm ta) {
        return (ta.getCycle() > 20);
    }
};
/*
* optimize the dp matrix to reflect the training set in db using a null model
* weight of 1.0 and the Stopping criteria defined above.
*/
bwt.train(db,1.0,stopper);
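The stopping criterion is simply a predicate checked between training cycles. The plain-Java sketch below models that control flow with a hypothetical Stopper interface standing in for StoppingCriteria and a caller-supplied function standing in for one Baum-Welch re-estimation. It is not BioJava's training loop, whose cycle counting may differ: in this loop the counter starts at 0 and the check runs before each update, so a "cycle > 20" test allows 21 updates. A second criterion adds a convergence test on the change in score.

```java
import java.util.function.IntToDoubleFunction;

public class StoppingSketch {
    // Hypothetical stand-in for a stopping-criteria callback (not the BioJava interface):
    // it sees the number of completed cycles and the last two scores.
    interface Stopper {
        boolean isComplete(int cycle, double previousScore, double score);
    }

    // Generic iterative-training loop: ask the stopper first, then run one update.
    // 'update' stands in for one Baum-Welch re-estimation and returns the new score.
    static int train(Stopper stopper, IntToDoubleFunction update) {
        int cycle = 0; // completed cycles so far
        double previous = Double.NEGATIVE_INFINITY;
        double score = Double.NEGATIVE_INFINITY;
        while (!stopper.isComplete(cycle, previous, score)) {
            previous = score;
            score = update.applyAsDouble(cycle);
            cycle++;
        }
        return cycle;
    }

    public static void main(String[] args) {
        // Same test as the cookbook's anonymous class: stop once the cycle count exceeds 20.
        Stopper maxCycles = (cycle, prev, score) -> cycle > 20;
        System.out.println(train(maxCycles, k -> 0.0)); // prints 21

        // Cycle cap plus convergence: also stop when the score changes by less than 1e-3.
        Stopper capOrConverged = (cycle, prev, score) ->
                cycle > 20 || Math.abs(score - prev) < 1e-3;
        // Toy scores 0.5, 0.75, 0.875, ... approach 1.0, halving the change each cycle.
        System.out.println(train(capOrConverged, k -> 1.0 - Math.pow(0.5, k + 1))); // prints 10
    }
}
```

Combining a cycle cap with a convergence threshold is a common pattern for EM-style training: the cap bounds the run time, and the threshold avoids wasted cycles once the score stops improving.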
Below is an example of scoring a sequence and outputting the state path.
SymbolList test = null;
//code here to initialize the test sequence
/*
* put the test sequence in an array, an array is used because for pairwise
* alignments using an HMM there would need to be two SymbolLists in the
* array
*/
SymbolList[] sla = {test};
//decode the most likely state path and produce an 'odds' score
StatePath path = dp.viterbi(sla, ScoreType.ODDS);
System.out.println("Log Odds = "+path.getScore());
//print state path (BioJava SymbolList positions run from 1 to length)
for (int i = 1; i <= path.length(); i++) {
    System.out.println(path.symbolAt(StatePath.STATES, i).getName());
}
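To make the viterbi call less of a black box, here is a minimal plain-Java Viterbi decoder in log space for a toy two-state HMM (GC-rich vs AT-rich). All names and probabilities are hypothetical and unrelated to the profile HMM above. It returns the most likely state path and its log probability; unlike ScoreType.ODDS in BioJava, which scores against a null model, this sketch reports a plain log probability, and its arrays are 0-based.

```java
public class ViterbiSketch {
    // Toy two-state model (hypothetical numbers, not taken from the cookbook).
    static final String[] STATE_NAMES = {"GC-rich", "AT-rich"};
    static final String ALPHABET = "ACGT";
    static final double[] START = {0.5, 0.5};
    static final double[][] TRANS = {{0.9, 0.1}, {0.1, 0.9}};
    static final double[][] EMIT = {
        {0.1, 0.4, 0.4, 0.1}, // GC-rich: A, C, G, T
        {0.4, 0.1, 0.1, 0.4}  // AT-rich: A, C, G, T
    };

    static final class Result {
        final int[] path;     // most likely state index at each position
        final double logProb; // natural log of P(path, sequence)
        Result(int[] path, double logProb) { this.path = path; this.logProb = logProb; }
    }

    static Result viterbi(double[] start, double[][] trans, double[][] emit, int[] obs) {
        if (obs.length == 0) throw new IllegalArgumentException("empty sequence");
        int n = obs.length, k = start.length;
        double[][] v = new double[n][k]; // v[t][s]: best log score of a path ending in s at t
        int[][] back = new int[n][k];    // back[t][s]: predecessor state on that best path
        for (int s = 0; s < k; s++) {
            v[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double sc = v[t - 1][p] + Math.log(trans[p][s]);
                    if (sc > bestScore) {
                        bestScore = sc;
                        best = p;
                    }
                }
                v[t][s] = bestScore + Math.log(emit[s][obs[t]]);
                back[t][s] = best;
            }
        }
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (v[n - 1][s] > v[n - 1][last]) last = s;
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]]; // trace back through stored predecessors
        }
        return new Result(path, v[n - 1][last]);
    }

    // Convenience wrapper: decodes a DNA string with the toy model above.
    static Result decodeDna(String dna) {
        int[] obs = new int[dna.length()];
        for (int i = 0; i < dna.length(); i++) {
            obs[i] = ALPHABET.indexOf(Character.toUpperCase(dna.charAt(i)));
            if (obs[i] < 0) throw new IllegalArgumentException("not A/C/G/T: " + dna.charAt(i));
        }
        return viterbi(START, TRANS, EMIT, obs);
    }

    public static void main(String[] args) {
        Result r = decodeDna("GGCATT");
        System.out.println("log P = " + r.logProb);
        for (int i = 0; i < r.path.length; i++) {
            System.out.println((i + 1) + "\t" + STATE_NAMES[r.path[i]]); // 1-based, like the loop above
        }
    }
}
```

For GGCATT this toy model decodes GC-rich for the first three positions and AT-rich for the last three. Working in log space avoids the underflow that multiplying many small probabilities would cause on long sequences.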
Everything seemed to work up to the following line, where I got an exception (and it is not the first time this has happened since I started using BioJava):
StatePath path = dp.viterbi(sla, ScoreType.ODDS);
java.lang.ClassCastException: org.biojava.bio.seq.impl.SimpleSequence cannot be cast to java.lang.String
at org.biojava.bio.alignment.SimpleAlignment.<init>(SimpleAlignment.java:214)
at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:671)
at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:512)
at it.multimedia.hmm.TestHMM.main(TestHMM.java:149)
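The trace places the failure inside the SimpleAlignment constructor, called from SingleDP.viterbi, while a SimpleSequence is being cast to String; the calling code never performs that cast itself. One plausible reading, not verified against the BioJava source, is that the constructor treats the keys of a map as String labels but receives a map keyed by the sequence object instead. The plain-Java sketch below reproduces that failure mode; FakeSequence and labels are hypothetical stand-ins, not BioJava classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CastSketch {
    // Hypothetical stand-in for a sequence object (not a BioJava class).
    static final class FakeSequence {
        final String residues;
        FakeSequence(String residues) { this.residues = residues; }
    }

    // Mimics code that assumes every key of a raw Map is a String label.
    // The raw type means the compiler cannot catch a wrongly keyed map.
    @SuppressWarnings("rawtypes")
    static List<String> labels(Map labelToSeq) {
        List<String> out = new ArrayList<>();
        for (Object key : labelToSeq.keySet()) {
            out.add((String) key); // ClassCastException if the key is not a String
        }
        return out;
    }

    public static void main(String[] args) {
        FakeSequence seq = new FakeSequence("GGCATT");

        Map<Object, Object> keyedBySequence = new HashMap<>();
        keyedBySequence.put(seq, seq); // the sequence itself used as the key
        try {
            labels(keyedBySequence);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }

        Map<Object, Object> keyedByName = new HashMap<>();
        keyedByName.put("seq1", seq); // a String label as the key
        System.out.println(labels(keyedByName)); // prints [seq1]
    }
}
```

If that reading is right, the cast happens in library code on a path the caller does not control, so changing the arguments passed to viterbi would not by itself avoid it.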
The input I pass in is a file of FASTA sequences.
Could you help me to solve this problem?
If any of the BioJava authors read this post, I would like to ask how such a "buggy" framework could be published when not even the examples in the official documentation run correctly.
Thank you very much
You could consider opening this as an issue on their GitHub. Sorry I can't address the error itself.