Reading .sam files in Java
8.3 years ago
torkel.loman ▴ 10

Hello,

I'm currently writing an algorithm whose input is a large set of reads in a .sam file. I previously wrote it in MATLAB and am now translating it into a Java program. However, I'm having trouble reading the .sam file in Java.

(I'm using Eclipse 4.6.0, Java 1.8 and the Windows 10 operating system.)

I've been looking around, and many people seem to recommend something called "Picard tools". Many of the (older) links people give point to http://picard.sourceforge.net/javadoc/net/sf/samtools/SAMFileReader, which no longer seems to work.

When I search, I find it at https://broadinstitute.github.io/picard/, but the instructions there seem to be about how to use the program from the command prompt. I need a package (or something similar) that I can import into my Java program and use.

Can you still use Picard like that in some way? (Some old examples with the non-working link seemed to take this approach.) Or is there some other program or package I could use for this?

I'd be happy if someone could help!

sequencing next-gen java sam Picard
8.3 years ago

A simple example using htsjdk, the Java library that Picard is built on:

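import java.io.*;
import htsjdk.samtools.*;

public class Biostar207657
    {
    public static void main(String[] args) throws Exception
        {
        // args[0]: path to the input SAM or BAM file
        File bamFile = new File(args[0]);
        // SILENT validation: don't fail on minor format problems in the input
        SamReader sr = SamReaderFactory.makeDefault().validationStringency(ValidationStringency.SILENT).open(bamFile);
        SAMRecordIterator r = sr.iterator();
        // print the name of every read in the file
        while(r.hasNext()) {
            System.out.println(r.next().getReadName());
        }
        // release the iterator and the underlying file
        r.close();
        sr.close();
        }

    }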

compile:

$ javac -cp /home/lindenb/src/htsjdk/dist/xz-1.5.jar:/home/lindenb/src/htsjdk/dist/commons-compress-1.4.1.jar:/home/lindenb/src/htsjdk/dist/apache-ant-1.8.2-bzip2.jar:/home/lindenb/src/htsjdk/dist/ngs-java-1.2.2.jar:/home/lindenb/src/htsjdk/dist/snappy-java-1.0.3-rc3.jar:/home/lindenb/src/htsjdk/dist/commons-logging-1.1.1.jar:/home/lindenb/src/htsjdk/dist/htsjdk-2.1.0.jar:/home/lindenb/src/htsjdk/dist/commons-jexl-2.1.1.jar Biostar207657.java 

execute:

$ java -cp /home/lindenb/src/htsjdk/dist/xz-1.5.jar:/home/lindenb/src/htsjdk/dist/commons-compress-1.4.1.jar:/home/lindenb/src/htsjdk/dist/apache-ant-1.8.2-bzip2.jar:/home/lindenb/src/htsjdk/dist/ngs-java-1.2.2.jar:/home/lindenb/src/htsjdk/dist/snappy-java-1.0.3-rc3.jar:/home/lindenb/src/htsjdk/dist/commons-logging-1.1.1.jar:/home/lindenb/src/htsjdk/dist/htsjdk-2.1.0.jar:/home/lindenb/src/htsjdk/dist/commons-jexl-2.1.1.jar:. Biostar207657 in.bam

Thank you, it works now.


Thank you!

Two questions: in your first and last links there's a reference to a class called "SAMFileReader", but I can't find it in the API documentation. However, there seems to be something called "ViewSam" that allows reading the SAM file. Is it supposed to be like this, or has SAMFileReader been removed? (Both of your links that reference it contain outdated links.)

Also: how do I get access to the packages? If I write

final SAMFileReader inputSam = new SAMFileReader(inputSamOrBamFile);

as in the first link, Eclipse tells me that SAMFileReader cannot be resolved to a type (and I guess it's natural that it isn't included from the start). I would need to write something like:

import java.util.*; (but for Picard)

to make the classes available, right?

Also, I don't really understand the link to the code: there are a lot of folders with different names, and even more folders inside them, and I don't really understand what it is all for.

(Thanks again for the help)


I have now managed to download Picard 1.119 as a zip file. It includes several .jar files, which I've managed to import into Eclipse.

But I've been unable to find any useful documentation. Through the provided link I can go to the specific classes and read about them, but for the ViewSam class it only says that it has a constructor (which I've managed to use) and three methods:

customCommandLineValidation()
doWork()
main(java.lang.String[] args)

The descriptions are short and hardly useful. E.g. for "doWork" it tells me "Do the work after command line has been parsed. RuntimeException may be thrown by this method, and are reported appropriately."

That doesn't really tell me anything. It doesn't tell me how to get a specific read, or how to get its sequence or CIGAR string.

In MATLAB it is very simple; I just write:

sam = samread("name.sam") % Can also specify a specific interval of the SAM file to fetch.
CigarString = sam(1).CigarString % Stores the CIGAR string of the first entry as a string in the variable CigarString.
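For comparison with the MATLAB snippet: a SAM file is plain tab-separated text, where header lines start with @ and each alignment line has eleven mandatory columns, the sixth being the CIGAR string. A minimal, library-free sketch of sam(1).CigarString in Java could look like the following. It is only an illustration (no BAM support, no validation, optional tags ignored), not a replacement for htsjdk, and the names SamTextSketch and firstCigar are hypothetical, made up for this example.

```java
import java.util.Arrays;
import java.util.List;

public class SamTextSketch {

    /**
     * Returns the CIGAR string (6th mandatory column) of the first alignment line,
     * or null if the input contains no alignment lines.
     */
    static String firstCigar(Iterable<String> lines) {
        for (String line : lines) {
            // Header lines start with '@'; skip them and any blank lines
            if (line.isEmpty() || line.startsWith("@")) {
                continue;
            }
            // Mandatory columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
            String[] fields = line.split("\t");
            return fields[5];
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> sam = Arrays.asList(
                "@HD\tVN:1.6",
                "read1\t0\tchr1\t100\t60\t8M2I4M\t*\t0\t0\tACGTACGTACGTAC\t*");
        // Same idea as CigarString = sam(1).CigarString in MATLAB
        System.out.println(firstCigar(sam)); // prints 8M2I4M
    }
}
```

On a real file this could be called as firstCigar(Files.readAllLines(Paths.get("reads.sam"))) from java.nio.file. That reads the whole file into memory, so for a large set of reads a streaming reader such as htsjdk's SamReader is the better fit.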

The Picard documentation doesn't mention how to do these fairly basic operations.

(Other classes have the same level of documentation. Most also seem to share the same methods, like doWork().)

Another thing: I'm still confused by the SAMFileWriter() class. It seems to be used in all the examples. However, when I look at the Picard documentation, it's nowhere to be found, and I can't find it in the downloaded package either. Is SAMFileWriter() actually part of the same Picard tools?
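SAMFileWriter is not a Picard command-line tool but an interface from the SAM/BAM library that Picard uses (htsjdk.samtools.SAMFileWriter in current htsjdk); instances are created with SAMFileWriterFactory. A minimal sketch that copies every record from one SAM/BAM file to another, roughly the shape of the linked examples that take an input and an output file, assuming htsjdk is on the classpath. CopySam and copy are hypothetical names made up for this example.

```java
import java.io.File;
import java.io.IOException;
import htsjdk.samtools.SAMFileWriter;
import htsjdk.samtools.SAMFileWriterFactory;
import htsjdk.samtools.SAMRecord;
import htsjdk.samtools.SamReader;
import htsjdk.samtools.SamReaderFactory;
import htsjdk.samtools.ValidationStringency;

public class CopySam {

    /** Copies every record from in to out; the output format (SAM or BAM) follows out's extension. */
    static void copy(File in, File out) throws IOException {
        try (SamReader reader = SamReaderFactory.makeDefault()
                .validationStringency(ValidationStringency.SILENT)
                .open(in)) {
            // presorted = true: records already follow the header's sort order, so the writer does not re-sort
            SAMFileWriter writer = new SAMFileWriterFactory()
                    .makeSAMOrBAMWriter(reader.getFileHeader(), true, out);
            try {
                for (SAMRecord rec : reader) {
                    writer.addAlignment(rec);
                }
            } finally {
                // flushes and closes the output file
                writer.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input SAM/BAM, args[1]: output SAM/BAM
        copy(new File(args[0]), new File(args[1]));
    }
}
```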


If you want to read a SAM file, you can use SAMFileReader as declared in the example, loop through the reads, and get each read's properties from the SAMRecord class. Why are you using ViewSam?
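To get more than the read name, the same loop can read other properties from each SAMRecord. A minimal sketch, assuming htsjdk is on the classpath as in the compile command of the first answer; it opens the file with SamReaderFactory rather than SAMFileReader, and iterates the SamReader directly with a for-each loop. ReadProperties and describe are hypothetical names made up for this example.

```java
import java.io.File;
import java.io.IOException;
import htsjdk.samtools.SAMRecord;
import htsjdk.samtools.SamReader;
import htsjdk.samtools.SamReaderFactory;
import htsjdk.samtools.ValidationStringency;

public class ReadProperties {

    /** Formats name, position, CIGAR and sequence of one alignment as a tab-separated line. */
    static String describe(SAMRecord rec) {
        return rec.getReadName() + "\t"
                + rec.getReferenceName() + ":" + rec.getAlignmentStart() + "\t" // 1-based start
                + rec.getCigarString() + "\t"
                + rec.getReadString();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a SAM or BAM file
        try (SamReader reader = SamReaderFactory.makeDefault()
                .validationStringency(ValidationStringency.SILENT)
                .open(new File(args[0]))) {
            // SamReader is Iterable<SAMRecord>, so for-each visits every read
            for (SAMRecord rec : reader) {
                System.out.println(describe(rec));
            }
        }
    }
}
```

For fetching only a region (like samread's interval option), SamReader also has query and queryOverlapping methods, but those need a coordinate-sorted, indexed BAM file.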


Thank you, sounds perfect!

The reason I tried to use ViewSam was that it was the only thing I managed to import into Eclipse (by downloading the Picard .zip file with a lot of .jar files, then going to Configure Build Path and choosing "Add External JARs").

I tried writing:

package net.sf.samtools.example;
import net.sf.samtools.*;
import java.io.File;

as in your example, but it would not work. I presume I would have to do something more (like importing the jars), but I could not find SAMFileReader anywhere in the Picard package, and there were no further instructions on what to do.


svn checkout svn://svn.code.sf.net/p/picard/code/trunk picard-code


I'm sorry, but I don't understand. I can't reach the webpage. Is it some kind of command? I tried it in my Windows 10 command prompt, but with no result.


OK, I think it is almost working now. I got something called TortoiseSVN and added it to the folder in my Eclipse library. I can now write

import net.sf.samtools.*;
import java.io.File;
import htsjdk.samtools.SAMFileReader;

without getting error messages. However, I still don't know how to open the SAM file. In the example, final inputSamOrBamFile and outputSamOrBamFile are inputs to a function, and I don't know how to create them. They seem to be File objects. I have a file called "reads.sam", which is the SAM file I want to read. I tried to create a File object from it using:

File samFile = new File("reads.sam");

But it doesn't work (the file reads.sam is in the same folder as the .java file). I've also tried copying the file path:

File samFile = new File("/Haplotyping/src/net/sf/samtools/example/reads.sam");

and:

File samFile = new File("C:/Users/admin/workspace/Haplotyping/src/net/sf/samtools/example/reads.sam");

but it doesn't work either.
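A quick way to see why new File("reads.sam") is not found is to print where Java actually looks. java.io.File resolves a relative path against the JVM's working directory (the user.dir system property), not against the folder that holds the .java source file. When a program is launched from Eclipse, the working directory is, by default, usually the project folder, so a file sitting under src/net/sf/samtools/example/ is not found by its bare name. On Windows, a path that starts with "/" but has no drive letter refers to the root of the current drive, not to the Eclipse workspace. WhereIsMyFile and check are hypothetical names made up for this sketch.

```java
import java.io.File;

public class WhereIsMyFile {

    /** Describes where a (possibly relative) path resolves and whether a file exists there. */
    static String check(String path) {
        File f = new File(path);
        // Relative paths are resolved against System.getProperty("user.dir"),
        // not against the directory containing the .java source file.
        return f.getAbsolutePath() + (f.isFile() ? " (found)" : " (not found)");
    }

    public static void main(String[] args) {
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println(check("reads.sam"));
    }
}
```

If the printed absolute path is not where reads.sam lives, the file can be moved to the working directory, referenced by its full path, or the working directory can be changed in the Eclipse Run Configuration (Arguments tab).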

There is another strange thing: When I write:

final SAMFileReader sam = new SAMFileReader(samFile);

There's an error, with red lines under both occurrences of SAMFileReader. Eclipse says I can fix it by adding:

import htsjdk.samtools.SAMFileReader;

but when I do so, Eclipse draws a line through the text SAMFileReader (both in the import statement and everywhere it occurs in the code). If I hover over it with my cursor, it says "The type SAMFileReader is deprecated". What does that mean? Is it a problem?
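In Java, "deprecated" means the class still compiles and runs, but its authors have marked it with @Deprecated as superseded, and it may be removed in a later release; Eclipse shows this with a strikethrough and a warning, not an error. In htsjdk, SAMFileReader is deprecated in favour of SamReaderFactory and SamReader, which is what the first answer uses. A minimal sketch of the replacement for new SAMFileReader(samFile), assuming htsjdk is on the classpath; NoDeprecatedReader and readNames are hypothetical names made up for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import htsjdk.samtools.SAMRecord;
import htsjdk.samtools.SamReader;
import htsjdk.samtools.SamReaderFactory;
import htsjdk.samtools.ValidationStringency;

public class NoDeprecatedReader {

    /** Collects all read names, opening the file without the deprecated SAMFileReader. */
    static List<String> readNames(File samFile) throws IOException {
        List<String> names = new ArrayList<>();
        // Deprecated form: final SAMFileReader sam = new SAMFileReader(samFile);
        try (SamReader sam = SamReaderFactory.makeDefault()
                .validationStringency(ValidationStringency.SILENT)
                .open(samFile)) {
            for (SAMRecord rec : sam) {
                names.add(rec.getReadName());
            }
        } // try-with-resources closes the reader
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readNames(new File(args[0])));
    }
}
```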

