Question

How to quickly count the number of reads at a specific coordinate in a bam file

0

Entering edit mode

8.1 years ago

klin • 0

Hi all,

I am trying to count the number of reads (or alignments) for specific genomic locations in a bam file. I don't want reads with skipped region from the reference. I am using samtools and awk to do it now, but I hope to do it quicker.

samtools view input.bam chr10:18000-20000  | awk '($6 !~ /[N*]/)' | wc -l

Is there a way to make this process faster?

Instead of run samtools+awk for every location, should I just go through the bam files once? Would the alternative be faster?

Thanks, Woody

RNA-Seq sequencing SNP • 6.4k views

ADD COMMENT • link updated 8.1 years ago by dariober 15k • written 8.1 years ago by klin • 0

score 1 · Answer 1 · 2017-06-08

1

Entering edit mode

8.1 years ago

Chris Miller 22k

bam-readcount does this: https://github.com/genome/bam-readcount

ADD COMMENT • link 8.1 years ago by Chris Miller 22k

score 0 · Answer 2 · 2017-06-08

usingt htsjdk (not tested):

import java.io.File;
import htsjdk.samtools.CigarOperator;
import htsjdk.samtools.SAMRecord;
import htsjdk.samtools.SAMRecordIterator;
import htsjdk.samtools.SamReader;
import htsjdk.samtools.SamReaderFactory;

public class Biostar256909 {
public static void main(String[] args) throws Exception{
    long n=0L;
    SamReader sr= SamReaderFactory.makeDefault().open(new File(args[0]));
    SAMRecordIterator iter=sr.query("chr10",18000,45500,false);
    while(iter.hasNext())
    {
        final SAMRecord rec=iter.next();
        if(!rec.getReadUnmappedFlag() && rec.getCigar().containsOperator(CigarOperator.N))
            {
            continue;
            }
        n++;
    }
    iter.close();
    sr.close();
    System.out.println(n);
}
}

compile using picard as a library:

java -cp /path/to/picard.jar Biostar256909.java

execute:

java -cp /path/to/picard.jar:. Biostar256909 input.bam

score 0 · Answer 3 · 2017-06-08

0

Entering edit mode

8.1 years ago

dariober 15k

Not tested, but this maybe faster:

samtools view input.bam chr10:18000-20000  | cut -f 6 | grep -v 'N' | wc -l

cut+grep maybe faster than awk.

But in practice I wonder how much difference it makes, if any...

ADD COMMENT • link 8.1 years ago by dariober 15k