BioJava, installing forester correctly.
10.1 years ago
Bioaln ▴ 360

Hello everyone. I've recently started with BioJava and Maven and decided to try out the sequence alignment options. The official BioJava page says that an additional library is needed: forester.jar. I've downloaded it from the Maven online repository, yet my program still doesn't work. It keeps throwing an ArrayIndexOutOfBoundsException, although I copied the entire code from the official BioJava site. I am not really sure how to install forester.jar as an additional library, though.

package testproj;

import java.net.URL;

import org.biojava3.alignment.Alignments;
import org.biojava3.alignment.SimpleGapPenalty;
import org.biojava3.alignment.SimpleSubstitutionMatrix;

import org.biojava3.core.sequence.ProteinSequence;
import org.biojava3.core.sequence.compound.AminoAcidCompound;
import org.biojava3.core.sequence.io.FastaReaderHelper;


import org.biojava3.alignment.Alignments.PairwiseSequenceAlignerType;
import org.biojava3.alignment.template.SequencePair;
import org.biojava3.alignment.template.SubstitutionMatrix;


public class CookbookMSA {

    public static void main(String[] args) {
        String[] ids = new String[] {"Q21691", "Q21495","Q21693"};
        try {
            alignPairGlobal(ids[0], ids[1]);
        } catch (Exception e){
            e.printStackTrace();
        }
    }

    private static void alignPairGlobal(String id1, String id2) throws Exception {
        ProteinSequence s1 = getSequenceForId(id1), s2 = getSequenceForId(id2);
        SubstitutionMatrix<AminoAcidCompound> matrix = new SimpleSubstitutionMatrix<AminoAcidCompound>();
        SequencePair<ProteinSequence, AminoAcidCompound> pair = Alignments.getPairwiseAlignment(s1, s2,
                PairwiseSequenceAlignerType.GLOBAL, new SimpleGapPenalty(), matrix);
        System.out.printf("%n%s vs %s%n%s", pair.getQuery().getAccession(), pair.getTarget().getAccession(), pair);
    }

    private static ProteinSequence getSequenceForId(String uniProtId) throws Exception {
        URL uniprotFasta = new URL(String.format("http://www.uniprot.org/uniprot/%s.fasta", uniProtId));
        ProteinSequence seq = FastaReaderHelper.readFastaProteinSequence(uniprotFasta.openStream()).get(uniProtId);
        System.out.printf("id : %s %s%n%s%n", uniProtId, seq, seq.getOriginalHeader());
        return seq;
    }

}

Output:

id : Q21691 MDLLDKVMGEMGSKPGSTAKKPATSASSTPRTNVWGTAKKPSSQQQPPKPLFTTPGSQQGSLGGRIPKREHTDRTGPDPKRKPLGGLSVPDSFNNFGTFRVQMNAWNLDISKMDERISRIMFRATLVHTDGRRFELSLGVSAFSGDVNRQQRRQAQCLLFRAWFKRNPELFKGMTDPAIAAYDAAETIYVGCSFFDVELTEHVCHLTEADFSPQEWKIVSLISRRSGSTFEIRIKTNPPIYTRGPNALTLENRSELTRIIEAITDQCLHNEKFLLYSSGTFPTKGGDIASPDEVTLIKSGFVKTTKIVDRDGVPDAIMTVDTTKSPFYKDTSLLKFFTAKMDQLTNSGGGPRGHNGGRERRDGGGNSRKYDDRRSPRDGEIDYDERTVSHYQRQFQDERISDGMLNTLKQSLKGLDCQPIHLKDSKANRSIMIDEIHTGTADSVTFEQKLPDGEMKLTSITEYYLQRYNYRLKFPHLPLVTSKRAKCYDFYPMELMSILPGQRIKQSHMTVDIQSYMTGKMSSLPDQHIKQSKLVLTEYLKLGDQPANRQMDAFRVSLKSIQPIVTNAHWLSPPDMKFANNQLYSLNPTRGVRFQTNGKFVMPARVKSVTIINYDKEFNRNVDMFAEGLAKHCSEQGMKFDSRPNSWKKVNLGSSDRRGTKVEIEEAIRNGVTIVFGIIAEKRPDMHDILKYFEEKLGQQTIQISSETADKFMRDHGGKQTIDNVIRKLNPKCGGTNFLIDVPESVGHRVVCNNSAEMRAKLYAKTQFIGFEMSHTGARTRFDIQKVMFDGDPTVVGVAYSLKHSAQLGGFSYFQESRLHKLTNLQEKMQICLNAYEQSSSYLPETVVVYRVGSGEGDYPQIVNEVNEMKLAARKKKHGYNPKFLVICTQRNSHIRVFPEHINERGKSMEQNVKSGTCVDVPGASHGYEEFILCCQTPLIGTVKPTKYTIIVNDCRWSKNEIMNVTYHLAFAHQVSYAPPAIPNVSYAAQNLAKRGHNNYKTHTKLVDMNDYSYRIKEKHEEIISSEEVDDILMRDFIETVSNDLNAMTINGRNFWA
sp|Q21691|NRDE3_CAEEL Nuclear RNAi defective-3 protein OS=Caenorhabditis elegans GN=nrde-3 PE=1 SV=1
java.lang.ArrayIndexOutOfBoundsException: 0
    at org.biojava3.core.sequence.io.GenericFastaHeaderParser.parseHeader(GenericFastaHeaderParser.java:113)
    at org.biojava3.core.sequence.io.GenericFastaHeaderParser.parseHeader(GenericFastaHeaderParser.java:60)
    at org.biojava3.core.sequence.io.FastaReader.process(FastaReader.java:182)
    at org.biojava3.core.sequence.io.FastaReader.process(FastaReader.java:108)
    at org.biojava3.core.sequence.io.FastaReaderHelper.readFastaProteinSequence(FastaReaderHelper.java:100)
    at testproj.CookbookMSA.getSequenceForId(CookbookMSA.java:42)
    at testproj.CookbookMSA.alignPairGlobal(CookbookMSA.java:33)
    at testproj.CookbookMSA.main(CookbookMSA.java:26)

How do I fix this? Sorry if it's a stupid question. Thanks for any help.

Biojava forester • 2.8k views
10.1 years ago

This is not a Java error as such; it comes from the input data.

Your second ID is Q21495. Look at what UniProt returns for it: http://www.uniprot.org/uniprot/Q21495
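The trace in the question fails inside GenericFastaHeaderParser.parseHeader, that is, while parsing the header of whatever the second URL returned. A minimal sketch of a pre-check that could run before handing text to a FASTA parser, assuming the response may be empty or not FASTA at all. The names FastaGuard and looksLikeFasta are hypothetical and not part of BioJava.

```java
final class FastaGuard {

    /**
     * Returns true if the text starts with a non-empty '>' header line
     * that is followed by at least one non-blank sequence line.
     * (Hypothetical helper, not a BioJava API.)
     */
    static boolean looksLikeFasta(String body) {
        if (body == null) {
            return false;
        }
        // Split on any line terminator; an empty body yields a single empty line.
        String[] lines = body.trim().split("\\R");
        if (lines.length < 2) {
            return false;
        }
        String header = lines[0];
        return header.startsWith(">")
                && header.length() > 1
                && !lines[1].trim().isEmpty();
    }

    public static void main(String[] args) {
        // A header plus residues passes the check.
        System.out.println(looksLikeFasta(">sp|Q21691|NRDE3_CAEEL\nMDLLDKVMGE")); // prints true
        // An empty response does not.
        System.out.println(looksLikeFasta("")); // prints false
    }
}
```

If the check fails for an ID, it is probably worth printing the raw response for that ID before looking for a bug in the alignment code.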


You are right. When I use only ids = new String[] {"Q21691","Q21693"}; both sequences are read, but now there is a whole new list of errors:

Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.biojava3.alignment.SimpleAlignedSequence.setLocation(SimpleAlignedSequence.java:358)
    at org.biojava3.alignment.SimpleAlignedSequence.<init>(SimpleAlignedSequence.java:88)
    at org.biojava3.alignment.SimpleProfile.<init>(SimpleProfile.java:119)
    at org.biojava3.alignment.SimpleSequencePair.<init>(SimpleSequencePair.java:86)
    at org.biojava3.alignment.SimpleSequencePair.<init>(SimpleSequencePair.java:69)
    at org.biojava3.alignment.routines.AnchoredPairwiseSequenceAligner.setProfile(AnchoredPairwiseSequenceAligner.java:137)
    at org.biojava3.alignment.template.AbstractMatrixAligner.align(AbstractMatrixAligner.java:344)
    at org.biojava3.alignment.template.AbstractPairwiseSequenceAligner.getPair(AbstractPairwiseSequenceAligner.java:112)
    at org.biojava3.alignment.Alignments.getPairwiseAlignment(Alignments.java:208)
    at testproj.CookbookMSA.alignPairGlobal(CookbookMSA.java:36)
    at testproj.CookbookMSA.main(CookbookMSA.java:27)
Caused by: java.lang.NullPointerException
    at java.util.Collections$UnmodifiableCollection.<init>(Collections.java:1026)
    at java.util.Collections$UnmodifiableList.<init>(Collections.java:1302)
    at java.util.Collections.unmodifiableList(Collections.java:1287)
    at org.biojava3.core.sequence.location.template.AbstractLocation.<init>(AbstractLocation.java:111)
    at org.biojava3.core.sequence.location.template.AbstractLocation.<init>(AbstractLocation.java:85)
    at org.biojava3.core.sequence.location.SimpleLocation.<init>(SimpleLocation.java:57)
    at org.biojava3.core.sequence.location.SimpleLocation.<init>(SimpleLocation.java:53)
    at org.biojava3.core.sequence.location.template.Location.<clinit>(Location.java:48)
    ... 11 more

I am even more lost now :(


One of your variables is null somewhere. Don't use objects before checking that they're not null, close your open streams, use System.err for diagnostics, etc.

import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.util.LinkedHashMap;

private static ProteinSequence getSequenceForId(String uniProtId) throws Exception {
    URL uniprotFasta = new URL("http://www.uniprot.org/uniprot/" + URLEncoder.encode(uniProtId, "UTF-8") + ".fasta");
    InputStream in = uniprotFasta.openStream();
    LinkedHashMap<String, ProteinSequence> seqs;
    try {
        seqs = FastaReaderHelper.readFastaProteinSequence(in);
    } finally {
        in.close(); // close the stream even if parsing fails
    }
    ProteinSequence seq = seqs.get(uniProtId);
    if (seq == null) throw new RuntimeException("not found: " + uniProtId); // fail early with a clear message
    System.err.printf("id : %s %s%n%s%n", uniProtId, seq, seq.getOriginalHeader());
    return seq;
}
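A note on reading the second trace: the `<clinit>` frame at the bottom of the "Caused by" section is the static initializer of Location, so the NullPointerException is raised while that class is being loaded, and the JVM wraps it in ExceptionInInitializerError. The sketch below only reproduces that shape with a hypothetical class named Broken. It does not use BioJava and does not explain why the library's initializer fails.

```java
import java.util.Collections;
import java.util.List;

final class StaticInitDemo {

    /** Hypothetical class whose static initializer always fails. */
    static final class Broken {
        // Wrapping null in an unmodifiable list throws NullPointerException,
        // the same call that appears in the "Caused by" part of the trace.
        static final List<String> ITEMS = Collections.<String>unmodifiableList(null);

        static void touch() {
            // Calling any static member triggers class initialization.
        }
    }

    /**
     * Triggers initialization of Broken once and reports the wrapped cause.
     * Call it only once: later uses of Broken raise NoClassDefFoundError instead.
     */
    static String causeOfInitFailure() {
        try {
            Broken.touch();
            return "none";
        } catch (ExceptionInInitializerError e) {
            return e.getCause().getClass().getSimpleName();
        }
    }
}
```

Calling StaticInitDemo.causeOfInitFailure() the first time returns "NullPointerException", which matches the pairing of ExceptionInInitializerError and NullPointerException in the trace.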

Alright, I tried this and the error stays the same. One last thing, if you would: could you provide the simplest working example of pairwise protein sequence alignment? You seem like you've done this a thousand times. It would really help me with debugging. Thank you very much.
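As a library-free point of reference for debugging, global pairwise alignment can be sketched in plain Java. This is a basic Needleman-Wunsch with a toy scoring scheme (match +1, mismatch -1, linear gap -2), all of which are assumptions made for illustration. It is not BioJava's aligner: it uses no substitution matrix such as BLOSUM62 and no affine gap penalties, and the class name TinyGlobalAligner is hypothetical.

```java
final class TinyGlobalAligner {

    // Toy scores, chosen for illustration only.
    static final int MATCH = 1, MISMATCH = -1, GAP = -2;

    private static int sub(char x, char y) {
        return x == y ? MATCH : MISMATCH;
    }

    /** Returns {alignedA, alignedB, score} for a global alignment of a and b. */
    static String[] align(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] s = new int[n + 1][m + 1];

        // First row and column: aligning a prefix against nothing costs only gaps.
        for (int i = 1; i <= n; i++) s[i][0] = i * GAP;
        for (int j = 1; j <= m; j++) s[0][j] = j * GAP;

        // Fill the score matrix.
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = s[i - 1][j - 1] + sub(a.charAt(i - 1), b.charAt(j - 1));
                int up = s[i - 1][j] + GAP;
                int left = s[i][j - 1] + GAP;
                s[i][j] = Math.max(diag, Math.max(up, left));
            }
        }

        // Trace back from the bottom-right corner, preferring the diagonal on ties.
        StringBuilder ra = new StringBuilder(), rb = new StringBuilder();
        int i = n, j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0
                    && s[i][j] == s[i - 1][j - 1] + sub(a.charAt(i - 1), b.charAt(j - 1))) {
                ra.append(a.charAt(--i));
                rb.append(b.charAt(--j));
            } else if (i > 0 && s[i][j] == s[i - 1][j] + GAP) {
                ra.append(a.charAt(--i));
                rb.append('-');
            } else {
                ra.append('-');
                rb.append(b.charAt(--j));
            }
        }
        return new String[] {
            ra.reverse().toString(), rb.reverse().toString(), String.valueOf(s[n][m])
        };
    }

    public static void main(String[] args) {
        String[] r = align("MDLL", "MLL");
        System.out.println(r[0]); // prints MDLL
        System.out.println(r[1]); // prints M-LL
        System.out.println(r[2]); // prints 1
    }
}
```

If a tiny example like this runs in the same project while the BioJava call still fails during class initialization, the problem is more likely in the library setup or classpath than in the input sequences.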
