How to quickly count the number of reads at a specific coordinate in a bam file
3
0
Entering edit mode
7.5 years ago
klin • 0

Hi all,

I am trying to count the number of reads (or alignments) for specific genomic locations in a bam file. I don't want reads with skipped region from the reference. I am using samtools and awk to do it now, but I hope to do it quicker.

samtools view input.bam chr10:18000-20000  | awk '($6 !~ /[N*]/)' | wc -l

Is there a way to make this process faster?

Instead of run samtools+awk for every location, should I just go through the bam files once? Would the alternative be faster?

Thanks, Woody

RNA-Seq sequencing SNP • 5.9k views
ADD COMMENT
1
Entering edit mode
7.5 years ago

bam-readcount does this: https://github.com/genome/bam-readcount

ADD COMMENT
0
Entering edit mode
7.5 years ago

usingt htsjdk (not tested):

import java.io.File;
import htsjdk.samtools.CigarOperator;
import htsjdk.samtools.SAMRecord;
import htsjdk.samtools.SAMRecordIterator;
import htsjdk.samtools.SamReader;
import htsjdk.samtools.SamReaderFactory;

public class Biostar256909 {
public static void main(String[] args) throws Exception{
    long n=0L;
    SamReader sr= SamReaderFactory.makeDefault().open(new File(args[0]));
    SAMRecordIterator iter=sr.query("chr10",18000,45500,false);
    while(iter.hasNext())
    {
        final SAMRecord rec=iter.next();
        if(!rec.getReadUnmappedFlag() && rec.getCigar().containsOperator(CigarOperator.N))
            {
            continue;
            }
        n++;
    }
    iter.close();
    sr.close();
    System.out.println(n);
}
}

compile using picard as a library:

java -cp /path/to/picard.jar Biostar256909.java

execute:

java -cp /path/to/picard.jar:. Biostar256909 input.bam
ADD COMMENT
0
Entering edit mode
7.5 years ago

Not tested, but this maybe faster:

samtools view input.bam chr10:18000-20000  | cut -f 6 | grep -v 'N' | wc -l

cut+grep maybe faster than awk.

But in practice I wonder how much difference it makes, if any...

ADD COMMENT

Login before adding your answer.

Traffic: 1492 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6