Question

Multiple Bam To Multiple Sam

4

Entering edit mode

11.7 years ago

venks ▴ 740

Hi,

I have 100's of BAM files in a directory. Is there a way with which we can convert all the BAM files in a directory into sam files. I know that we can use

samtools view -h <path/.bam>  > <path/.sam>

to convert individual files.

Thanks

bam samtools sequencing • 7.3k views

ADD COMMENT • link updated 21 months ago by Ram 44k • written 11.7 years ago by venks ▴ 740

score 8 · Answer 1 · 2013-04-02

8

Entering edit mode

11.7 years ago

matted 7.8k

This is more of a [shell] scripting question. There are a lot of ways to do this, of varying elegance. Here's one try:

ls *bam | xargs -I {} -n 1 samtools view -h {} -o {}.sam

This will call this command on every bam file in the directory:

samtools view -h in.bam -o in.bam.sam

Out of curiosity I have to ask why you want to do this... I can't imagine any good reasons to leave the binary format behind, unless you're trying to fill up your drives.

ADD COMMENT • link 11.7 years ago by matted 7.8k

1

Entering edit mode

I am planning to use htseq-count to generate reads per gene. Do U think there is another efficient way of doing this.

Thanks in Advance

ADD REPLY • link 11.7 years ago by venks ▴ 740

1

Entering edit mode

There are a lot of tools that do this; you can search this forum for others. Extracting Read Count For Each Gene/Exon From Rna-Seq Bam Files discussion a few options, including HTseq. It also discusses bedtools multicov, which might be better for you since it works on BAMs natively and can handle multiple at once. If you want to use HTseq, since it looks like it doesn't support BAMs directly, I would pipe things in instead of writing to disk:

samtools view -h in.bam | htseq-count /dev/stdin features.gff

ADD REPLY • link 11.7 years ago by matted 7.8k

1

Entering edit mode

Sounds Good and Thank You very much for the suggestion

ADD REPLY • link 11.7 years ago by venks ▴ 740

0

Entering edit mode

Hi matted,

An extension to the original question, how would one pipe the data if we need to sort the bam file first then use htseq-count. For example I tried

samtools view -h in.bam | samtools sort /dev/stdin /dev/stdout | htseq-count -m intersection-strict -s no /dev/stdin /eatures.gtf > test.count

which does not seem to work..and gives the error [sort_blocks] fail to create file /dev/stdout.bam.

Any ideas would be much appreciated! many thanks

ADD REPLY • link 11.6 years ago by Zack • 0

0

Entering edit mode

you need to teel "view" to output BAM (option -bu ) , not SAM. "sort" needs a file suffix , not a stream/device ( /dev/stdout), replace /dev/stdin by '-'.

ADD REPLY • link 11.6 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Exactly. Also note that if a downstream tool doesn't support reading from BAM and really needs SAM you can always just use samtools view for piping.

ADD REPLY • link 11.7 years ago by Andreas ★ 2.5k

2

Entering edit mode

I am planning to use htseq-count to generate reads per gene. Do U think there is another efficient way of doing this.

Thanks in Advance

ADD REPLY • link 11.7 years ago by venks ▴ 740

Ram · Answer 2 · 2013-04-03

6

Entering edit mode

11.7 years ago

Pierre Lindenbaum 164k

Using make with option -j (num-parallel-tasks)

SAM_SOMEWHERE=$(shell find dir1/dir2 -name "*.sam")
%.bam:%.sam
    samtools view -S -b -o $@ $<     

.PHONY: all

all:$(SAM_SOMEWHERE:%.sam=%.bam) another1.bam another2.bam
    echo "Done: $^"

ADD COMMENT • link 11.7 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Oops; I just realized that you want BAM->SAM. I wrote SAM->BAM. Change the pattern according to your needs :-)

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 11.7 years ago by Pierre Lindenbaum 164k

Ram · Answer 3 · 2013-04-03

5

Entering edit mode

11.7 years ago

Sukhi Singh 11k

Using GNU Parallel, if you have free processors

# replace the number is `j` with the number of free processors
parallel -j 5 'samtools view -h {} -o {}.sam' ::: *bam

This will do the conversion parallely on different processors assigned.

ADD COMMENT • link 11.7 years ago by Sukhi Singh 11k

1

Entering edit mode

I don't have clusters. I am trying to run it in just one machine.

ADD REPLY • link 11.7 years ago by venks ▴ 740

2

Entering edit mode

Parallel can assign jobs to different cores/cpus on one machine.

ADD REPLY • link 11.7 years ago by Zhen Sun ▴ 60

1

Entering edit mode

With GNU parallel 20130222 this should use all CPUs on the local machine that are not busy:

parallel --load 100% samtools view -h {} -o {}.sam ::: *bam

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 11.6 years ago by ole.tange ★ 4.5k