I have been using biojava and was able to load fasta files.
Does BioJava (or something similar) offer a way to compare two sequences and simply report the percentage of similarity? Does that even make sense?
Here is the code I have been using for alignment, copy-pasted from the BioJava3 Cookbook:
package org.biojava3.alignment;

import java.net.URL;
import org.biojava3.alignment.Alignments;
import org.biojava3.alignment.Alignments.PairwiseSequenceAlignerType;
import org.biojava3.alignment.SimpleGapPenalty;
import org.biojava3.alignment.SimpleSubstitutionMatrix;
import org.biojava3.alignment.template.SequencePair;
import org.biojava3.alignment.template.SubstitutionMatrix;
import org.biojava3.core.sequence.ProteinSequence;
import org.biojava3.core.sequence.compound.AminoAcidCompound;
import org.biojava3.core.sequence.io.FastaReaderHelper;

public class CookbookAlignPairLocal {

    public static void main(String[] args) {
        // UniProt accessions of the proteins to compare
        String[] ids = new String[] {"Q21691", "Q21495", "O48771"};
        try {
            alignPairLocal(ids[0], ids[1]);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Runs a local pairwise alignment with the default substitution matrix and gap penalties
    private static void alignPairLocal(String id1, String id2) throws Exception {
        ProteinSequence s1 = getSequenceForId(id1), s2 = getSequenceForId(id2);
        SubstitutionMatrix<AminoAcidCompound> matrix = new SimpleSubstitutionMatrix<AminoAcidCompound>();
        SequencePair<ProteinSequence, AminoAcidCompound> pair = Alignments.getPairwiseAlignment(s1, s2,
                PairwiseSequenceAlignerType.LOCAL, new SimpleGapPenalty(), matrix);
        System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), pair.getTarget().getAccession(), pair);
    }

    // Downloads the FASTA record for a UniProt accession and parses it into a ProteinSequence
    private static ProteinSequence getSequenceForId(String uniProtId) throws Exception {
        URL uniprotFasta = new URL(String.format("http://www.uniprot.org/uniprot/%s.fasta", uniProtId));
        ProteinSequence seq = FastaReaderHelper.readFastaProteinSequence(uniprotFasta.openStream()).get(uniProtId);
        System.out.printf("id : %s %s%n%s%n", uniProtId, seq, seq.getOriginalHeader());
        return seq;
    }
}
Michael -- yes, BioJava 3 is where active development is happening now. The only downside is module coverage since it doesn't contain everything 1.8 did, but it already has quite a bit of functionality.
I think it makes perfect sense to get the percent sequence identity. I don't have BioPerl here, but isn't there a method like 'getPercentIdentity'?
Btw, BioJava doesn't do the alignments itself; you have to use an external program.
Michael, BioJava 3 does have support for global/local pairwise and multiple alignment: http://biojava.org/wiki/BioJava:CookBook3:PSA with examples in the Cookbook: http://biojava.org/wiki/BioJava:CookBook3.0
berlinbrowndev, you are looking for 'getNumIdenticals' or 'getNumSimilars' in the returned SequencePair http://www.biojava.org/docs/api/org/biojava3/alignment/template/SequencePair.html
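To make the "percent number" concrete, here is a minimal plain-Java sketch of the arithmetic: count the alignment columns where both rows carry the same residue and divide by the number of columns. It deliberately does not use BioJava, so it runs on its own. The class name `PercentIdentity`, the method `percentIdentity`, the `-` gap character, and the choice of dividing by the full alignment length are all assumptions made for illustration. With a BioJava 3 `SequencePair`, the counterpart would presumably be `pair.getNumIdenticals()` divided by `pair.getLength()`, but I have not run that here.

```java
public class PercentIdentity {

    /**
     * Percent identity of two already-aligned rows of equal length.
     * A column counts as identical when both rows hold the same
     * non-gap character. The denominator is the number of alignment
     * columns (gaps included), which is only one of several conventions.
     */
    static double percentIdentity(String alignedQuery, String alignedTarget) {
        if (alignedQuery.length() != alignedTarget.length()) {
            throw new IllegalArgumentException("Aligned rows must have equal length");
        }
        int columns = alignedQuery.length();
        if (columns == 0) {
            return 0.0; // nothing aligned, so report zero rather than divide by zero
        }
        int identicals = 0;
        for (int i = 0; i < columns; i++) {
            char q = alignedQuery.charAt(i);
            char t = alignedTarget.charAt(i);
            if (q != '-' && q == t) {
                identicals++;
            }
        }
        return 100.0 * identicals / columns;
    }

    public static void main(String[] args) {
        // Made-up aligned rows; '-' marks a gap in the query
        System.out.printf("%.1f%% identity%n", percentIdentity("MKT-AYIAK", "MKTQAYLAK"));
    }
}
```

Note that for a local alignment the percentage only describes the aligned region, not the full-length sequences, so two long proteins sharing one short conserved stretch can still score high.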
OK, I was looking at BioJava 1. Brad, do you know whether BioJava 3 is already 'in good shape' to be used?
Hey Berlin, I just saw that you copy-pasted the code from the example in the cookbook. Had you cited your source and pasted the complete code (the import part was missing), you could have saved us some time here. Please remember next time; sorry, no +1 here...
+1 anyway for bringing up the biojava3 topic