Can I Access Bam File Information Without Converting To Sam?
6
3
Entering edit mode
13.4 years ago
Bioscientist ★ 1.7k

Right now I need to grab some information from lines of SAM/BAM files. Well, I know we cannot directly view BAM files as SAM.

But do we have any ways of accessing the information of BAM without converting it to SAM? You know, SAM is three times larger.

thx

bam sam • 19k views
ADD COMMENT
3
Entering edit mode

Pipe. Practically we almost never decompress BAM or a gzip file.

ADD REPLY
0
Entering edit mode

what information are you trying to grab?

ADD REPLY
0
Entering edit mode

Distance between two mapped reads.

ADD REPLY
0
Entering edit mode

That presumes that parsing SAM is easier than BAM. But that's only true if you're stuck with a poor scripting language.

ADD REPLY
4
Entering edit mode
13.4 years ago

the following command line should print the distance (5' to 5', in the 1st column) between two mapped reads:

samtools view -f 2 path/to/your.bam |\
awk -F '    ' 'BEGIN  {
      pp=-1;
      pc="";
      }

     {
     if($3!=pc) { pc=$3; pp=int($4);next;}
     if(pp==-1) {pp=int($4);next;}
     printf("%d\t%s\n",int($4)-pp,$0);
     pp=int($4);
     }'

UPDATE: The following C program should print the distance between two mapped reads without any conversion to SAM:

/*
Motivation:
    distance between mapped reads
Author:
    Pierre Lindenbaum PhD
WWW:
    http://plindenbaum.blogspot.com
Contact:
    plindenbaum@yahoo.fr
Compilation:
    gcc -Wall -I $(SAMDIR) -L $(SAMDIR) prog.c -lbam -lz
*/
#include <stdio.h>
#include "bam.h"
#include "sam.h"

static void bamdistance(const char* filename)
 {
 /** previous genomic position in the reference */
 int32_t prev_pos=-1;
 /** previous reference index */
 int32_t prev_reference=-1;
 samfile_t *fp_in = NULL;
 bam1_t *b=NULL;

 fp_in = samopen(filename, "rb", 0);
 if(NULL == fp_in)
      {
      fprintf(stdout,"Could not open file.\n");
      return;
      }
 b = bam_init1();
 while(samread(fp_in, b) > 0) /* loop over the records */
    {
    if(b->core.tid<0 || b->core.pos<0 || (b->core.flag & BAM_FPROPER_PAIR)==0)
            {
            //ignore
            }
       else if(b->core.tid != prev_reference)
        {
        prev_reference=b->core.tid;
        prev_pos=b->core.pos;
        }
        else
            {
            printf("%d\n",(b->core.pos - prev_pos));
            prev_pos=b->core.pos;
            }
      bam_destroy1(b);
      b = bam_init1();
      }
 bam_destroy1(b);
 samclose(fp_in);
 }

int main(int argc, char *args[])
  {
  int optind=1;
  if(optind==argc)
      {
      fprintf(stderr,"Illegal number of arguments\n");
      return EXIT_FAILURE;
      }
  /* loop over the files */
  while(optind
ADD COMMENT
0
Entering edit mode

use -f66 (read mapped in proper pair & first in pair), so each pair is printed only once

ADD REPLY
3
Entering edit mode
13.4 years ago
Pasta ★ 1.3k

You can use SAMTOOLS. Using the command line:

samtools view MyFile.bam

will allow you to read your BAM file as a SAM without decompressing. Then, you can pipe this command and use grep/awk or whatever. For example:

samtools view -X MyFile.bam | less -S
ADD COMMENT
1
Entering edit mode
13.4 years ago
Vitis ★ 2.6k

I always use Bio::DB::Sam modules to access bam files. Although the name is 'sam', they provide very flexible functions to access almost everything from the bam files, too. The downside is that you have to know a bit perl programming to do your queries, but it's definitely worthwhile to learn. Plus, the manual page is very well-written and with a lot useful examples: http://search.cpan.org/~lds/Bio-SamTools/lib/Bio/DB/Sam.pm

ADD COMMENT
1
Entering edit mode
13.4 years ago
Arun 2.4k

It does help to know what you want it for. And of course perl and Bio::DB::Sam are certainly irreplaceable. However, you could also try samtools view (and maybe tview) option to output a bam file on to the screen (and pipe to a file). There seems to be an option for specifying Chromosome, start and end values as well but I haven't tried that part. http://samtools.sourceforge.net/samtools.shtml

ADD COMMENT
0
Entering edit mode
13.3 years ago
Marvin ▴ 900

The format of BAM is documented in the same document as SAM, currently http://samtools.sourceforge.net/SAM1.pdf. Without your being more specific, the only answer I can give is "Write a binary parser for BAM. It's not that friggin' hard."

I may have more detailed answers if you specify what information you look for, what language you're using, or what it is you didn't understand about the BAM documentation.

ADD COMMENT

Login before adding your answer.

Traffic: 2068 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6