How to generate a .sam/.bam file of a particular chromosome or even a particular region?
3
2
Entering edit mode
10.0 years ago
Chenglin ▴ 260

I wanted to see a particular sequence in our re-sequencing data using IGV. However the .bam file is too big (10Gb) to handle. Since I am interested in only a small region in the genome, I wondering if there is any way to generate a .sam/.bam file only for that region. Thank you very much in advance!

Chenglin

alignment genome next-gen • 11k views
ADD COMMENT
5
Entering edit mode
10.0 years ago

This has been answered many times before. You can specify a small region of your interest and create a small bam file as shown.

samtools view -bh chr1:100-200 > small.bam
ADD COMMENT
2
Entering edit mode

FYI, -b will always write the header if it's present (this isn't documented, but see here), so samtools view -b chr1:100-200 > small.bam will do the same thing.

ADD REPLY
0
Entering edit mode

Thanks Devon. Didn't know that.

ADD REPLY
0
Entering edit mode

Thank you very much for your answer. Do I need to put the reference genome index file (My_genome.fasta.fai) in the command? Actually I typed something like this samtools view -bh chr1:100-200 My_genome.fasta.fai small.sam > small.bam, but I got error information [main_samview] fail to open "My_genome.fasta.fai" for reading. My index file is there, why can't it be read? Thanks!

ADD REPLY
0
Entering edit mode

You won't need the fai file at all.

The syntax is:

samtools view [OPTIONS] sorted_and_indexed.bam region

Edit: Just to make things clearer, in your case the command would be samtools view -b small.bam chr1:100-200 > small.subset.bam. The input must be a sorted and indexed BAM or CRAM file.

ADD REPLY
0
Entering edit mode

Thank you very much for your help. I tried and made it.

ADD REPLY
0
Entering edit mode

Thank you, Ashutosh. I just made it.

ADD REPLY
1
Entering edit mode
6.6 years ago
gsr9999 ▴ 310

You can also use sambamba tool to slice the existing bam to specific regions. It is much faster than samtools.

Example : $sambamba slice -o my_output_bam_file.bam Input_bam_file.bam chr1:100-200

Or you can extract reads for an entire chromosome by just specifying the chromosome name in option

Example : $sambamba slice -o chr1_bam_file.bam Input_bam_file.bam chr1

ADD COMMENT
1
Entering edit mode
3.4 years ago
etiennedanis ▴ 20
samtools view -b in.bam chr1:100000-200000 > out.bam

did not work for me.

But

samtools view -b in.bam 1:100000-200000 > out.bam

worked. It depends on the genome used for the alignment. Using

samtools idxstats in.bam

can be useful to figure out how to name the chromosomes.

ADD COMMENT

Login before adding your answer.

Traffic: 1833 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6