Hello everyone. I've recently started with BioJava and Maven, and I decided to try out the sequence alignment options. The official BioJava page says that an additional library is needed: forester.jar. I've downloaded it from the online Maven repository, yet my program still doesn't work: it keeps throwing an ArrayIndexOutOfBoundsException, although I copied the entire code from the official BioJava site. I am also not really sure how to install forester.jar as an additional library.
package testproj;

import java.net.URL;
import org.biojava3.alignment.Alignments;
import org.biojava3.alignment.SimpleGapPenalty;
import org.biojava3.alignment.SimpleSubstitutionMatrix;
import org.biojava3.core.sequence.ProteinSequence;
import org.biojava3.core.sequence.compound.AminoAcidCompound;
import org.biojava3.core.sequence.io.FastaReaderHelper;
import org.biojava3.alignment.Alignments.PairwiseSequenceAlignerType;
import org.biojava3.alignment.template.SequencePair;
import org.biojava3.alignment.template.SubstitutionMatrix;

public class CookbookMSA {

    public static void main(String[] args) {
        String[] ids = new String[] {"Q21691", "Q21495", "Q21693"};
        try {
            alignPairGlobal(ids[0], ids[1]);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void alignPairGlobal(String id1, String id2) throws Exception {
        ProteinSequence s1 = getSequenceForId(id1), s2 = getSequenceForId(id2);
        SubstitutionMatrix<AminoAcidCompound> matrix = new SimpleSubstitutionMatrix<AminoAcidCompound>();
        SequencePair<ProteinSequence, AminoAcidCompound> pair = Alignments.getPairwiseAlignment(s1, s2,
                PairwiseSequenceAlignerType.GLOBAL, new SimpleGapPenalty(), matrix);
        System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), pair.getTarget().getAccession(), pair);
    }

    private static ProteinSequence getSequenceForId(String uniProtId) throws Exception {
        URL uniprotFasta = new URL(String.format("http://www.uniprot.org/uniprot/%s.fasta", uniProtId));
        ProteinSequence seq = FastaReaderHelper.readFastaProteinSequence(uniprotFasta.openStream()).get(uniProtId);
        System.out.printf("id : %s %s%n%s%n", uniProtId, seq, seq.getOriginalHeader());
        return seq;
    }
}
Output:
id : Q21691 MDLLDKVMGEMGSKPGSTAKKPATSASSTPRTNVWGTAKKPSSQQQPPKPLFTTPGSQQGSLGGRIPKREHTDRTGPDPKRKPLGGLSVPDSFNNFGTFRVQMNAWNLDISKMDERISRIMFRATLVHTDGRRFELSLGVSAFSGDVNRQQRRQAQCLLFRAWFKRNPELFKGMTDPAIAAYDAAETIYVGCSFFDVELTEHVCHLTEADFSPQEWKIVSLISRRSGSTFEIRIKTNPPIYTRGPNALTLENRSELTRIIEAITDQCLHNEKFLLYSSGTFPTKGGDIASPDEVTLIKSGFVKTTKIVDRDGVPDAIMTVDTTKSPFYKDTSLLKFFTAKMDQLTNSGGGPRGHNGGRERRDGGGNSRKYDDRRSPRDGEIDYDERTVSHYQRQFQDERISDGMLNTLKQSLKGLDCQPIHLKDSKANRSIMIDEIHTGTADSVTFEQKLPDGEMKLTSITEYYLQRYNYRLKFPHLPLVTSKRAKCYDFYPMELMSILPGQRIKQSHMTVDIQSYMTGKMSSLPDQHIKQSKLVLTEYLKLGDQPANRQMDAFRVSLKSIQPIVTNAHWLSPPDMKFANNQLYSLNPTRGVRFQTNGKFVMPARVKSVTIINYDKEFNRNVDMFAEGLAKHCSEQGMKFDSRPNSWKKVNLGSSDRRGTKVEIEEAIRNGVTIVFGIIAEKRPDMHDILKYFEEKLGQQTIQISSETADKFMRDHGGKQTIDNVIRKLNPKCGGTNFLIDVPESVGHRVVCNNSAEMRAKLYAKTQFIGFEMSHTGARTRFDIQKVMFDGDPTVVGVAYSLKHSAQLGGFSYFQESRLHKLTNLQEKMQICLNAYEQSSSYLPETVVVYRVGSGEGDYPQIVNEVNEMKLAARKKKHGYNPKFLVICTQRNSHIRVFPEHINERGKSMEQNVKSGTCVDVPGASHGYEEFILCCQTPLIGTVKPTKYTIIVNDCRWSKNEIMNVTYHLAFAHQVSYAPPAIPNVSYAAQNLAKRGHNNYKTHTKLVDMNDYSYRIKEKHEEIISSEEVDDILMRDFIETVSNDLNAMTINGRNFWA
sp|Q21691|NRDE3_CAEEL Nuclear RNAi defective-3 protein OS=Caenorhabditis elegans GN=nrde-3 PE=1 SV=1
java.lang.ArrayIndexOutOfBoundsException: 0
    at org.biojava3.core.sequence.io.GenericFastaHeaderParser.parseHeader(GenericFastaHeaderParser.java:113)
    at org.biojava3.core.sequence.io.GenericFastaHeaderParser.parseHeader(GenericFastaHeaderParser.java:60)
    at org.biojava3.core.sequence.io.FastaReader.process(FastaReader.java:182)
    at org.biojava3.core.sequence.io.FastaReader.process(FastaReader.java:108)
    at org.biojava3.core.sequence.io.FastaReaderHelper.readFastaProteinSequence(FastaReaderHelper.java:100)
    at testproj.CookbookMSA.getSequenceForId(CookbookMSA.java:42)
    at testproj.CookbookMSA.alignPairGlobal(CookbookMSA.java:33)
    at testproj.CookbookMSA.main(CookbookMSA.java:26)
How do I fix this? Sorry if it's a stupid question. Thanks for any help.
You are right. When I use only:
ids = new String[] {"Q21691","Q21693"};
both sequences are read, but now there is a whole new list of errors. I am even more lost now :(
One of your variables is null somewhere. Don't use objects before checking that they're not null, close your open streams, use System.err, etc.
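To make that advice concrete, here is a minimal sketch in plain Java (no BioJava, so it runs without any extra jars). The names SafeFastaLookup, readFasta and lookup are hypothetical, invented for this illustration, and the tiny parser is only a stand-in for a real FASTA reader. It shows the three points: the stream is closed by try-with-resources, the lookup result is checked for null before it is used, and the problem is reported on System.err. In the posted code the analogous spot would be the result of .get(uniProtId), which, as far as I can tell, is a plain map lookup and returns null when no record was parsed under that key.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of BioJava.
class SafeFastaLookup {

    // Stand-in FASTA parser: maps an accession to its residues.
    // The accession is taken as the second '|'-separated header field
    // (as in "sp|ACCESSION|NAME ..."), or the whole header if there is no '|'.
    static Map<String, String> readFasta(InputStream in) throws IOException {
        Map<String, String> records = new LinkedHashMap<>();
        // try-with-resources closes the reader and the wrapped stream on every path
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String accession = null;
            StringBuilder residues = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.startsWith(">")) {
                    if (accession != null) {
                        records.put(accession, residues.toString());
                    }
                    // limit -1 keeps empty fields, so the array is never empty
                    String[] fields = line.substring(1).split("\\|", -1);
                    accession = fields.length > 1 ? fields[1] : fields[0];
                    residues.setLength(0);
                } else if (accession != null) {
                    residues.append(line);
                }
            }
            if (accession != null) {
                records.put(accession, residues.toString());
            }
        }
        return records;
    }

    // Returns the sequence for id, or null (after a message on System.err)
    // when the input contained no record with that accession.
    static String lookup(InputStream in, String id) throws IOException {
        String seq = readFasta(in).get(id);
        if (seq == null) {
            System.err.println("No FASTA record found for " + id);
        }
        return seq;
    }

    public static void main(String[] args) throws IOException {
        // Made-up record; DEMO1 is not a real accession.
        byte[] fasta = ">sp|DEMO1|DEMO_TEST made-up record\nMKV\nLAA\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(lookup(new ByteArrayInputStream(fasta), "DEMO1"));
        // An empty response yields null instead of an exception.
        System.out.println(lookup(new ByteArrayInputStream(new byte[0]), "DEMO1"));
    }
}
```

The second call in main mimics a download that comes back empty: the caller gets null plus a readable message, and can decide what to do before touching the sequence.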
Alright, I tried this and the error stays the same. One last thing, if you would: could you provide the simplest working example of a pairwise protein sequence alignment? You seem like you've done this a thousand times. It would really help me with debugging. Thank you very much.
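For debugging it can help to have something that runs with no network access and no extra jars. The following is only a sketch of global pairwise alignment (Needleman-Wunsch) in plain Java; it is not BioJava's implementation and does not use its API. The class name SimpleGlobalAlignment and the scores (match +1, mismatch -1, linear gap -2) are assumptions chosen for illustration, where a real protein alignment would normally use a substitution matrix such as BLOSUM62 and separate gap open/extend penalties. The two short sequences are made up.

```java
// Library-free sketch of global alignment; names and scores are illustrative.
class SimpleGlobalAlignment {

    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    // Score for aligning residue x with residue y.
    static int substitution(char x, char y) {
        return x == y ? MATCH : MISMATCH;
    }

    // Returns {alignedQuery, alignedTarget}, with '-' marking gaps.
    static String[] align(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] score = new int[n + 1][m + 1];

        // First row and column: a prefix aligned against gaps only.
        for (int i = 1; i <= n; i++) {
            score[i][0] = i * GAP;
        }
        for (int j = 1; j <= m; j++) {
            score[0][j] = j * GAP;
        }

        // Fill the dynamic-programming table.
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = score[i - 1][j - 1] + substitution(a.charAt(i - 1), b.charAt(j - 1));
                int up = score[i - 1][j] + GAP;
                int left = score[i][j - 1] + GAP;
                score[i][j] = Math.max(diag, Math.max(up, left));
            }
        }

        // Trace back from the bottom-right corner, preferring the diagonal.
        StringBuilder alignedA = new StringBuilder();
        StringBuilder alignedB = new StringBuilder();
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0
                    && score[i][j] == score[i - 1][j - 1]
                            + substitution(a.charAt(i - 1), b.charAt(j - 1))) {
                alignedA.append(a.charAt(--i));
                alignedB.append(b.charAt(--j));
            } else if (i > 0 && score[i][j] == score[i - 1][j] + GAP) {
                alignedA.append(a.charAt(--i));
                alignedB.append('-');
            } else {
                alignedA.append('-');
                alignedB.append(b.charAt(--j));
            }
        }
        return new String[] {alignedA.reverse().toString(), alignedB.reverse().toString()};
    }

    public static void main(String[] args) {
        String[] aligned = align("MKVLA", "MKLA");
        System.out.println(aligned[0]); // MKVLA
        System.out.println(aligned[1]); // MK-LA
    }
}
```

Once this runs, the same two hard-coded strings can be fed to the BioJava code in place of the downloaded sequences, which separates alignment problems from download or parsing problems.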