Merge .bam files by groups of lanes
6.4 years ago
elb ▴ 260

Hi guys, I have a folder with around 300 .bam files. Each .bam file is one lane of a sample, and four lanes make up a sample. I would like to merge the four lane files of each sample into a single .bam, grouping by the _S*_ part of the file name, where S is followed by the sample number (e.g. my_experimet_xxx__L001_S1_stimulated_Aligned.bam). Suppose I have 75 samples, i.e. S{1..75}.

Can anyone help me please?

The command I normally use to merge is the following:

samtools merge S1_merged.bam *bam

Thank you in advance

bam samtools RNA-Seq • 3.2k views

This is not a job posting. The "job" tag is meant for career jobs. Most posts here are about a computational "job", so that is implicit. Just use tags related to the type of computational task you want help with.


drkennetz is correct. Plus, the type says "Job Ad", not "job". Please be more mindful in the future, elb.


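*bam matches every .bam file in the folder: all lanes of all samples, plus any merged output bam left over from an earlier run. Maybe use a narrower glob pattern, or the -b option (which reads the list of input files from a file)?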
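A minimal sketch of the narrower-glob idea, assuming the file names follow the pattern given in the question (`..._L00<lane>_S<sample>_...bam`). The helper name `lane_files_for_sample` and the list file `S1_lanes.txt` are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lane files of one sample, one per line.
# The glob requires "_L00<1-4>_S<sample>_" in the name, so an output file
# such as S1_merged.bam never matches, and S1 does not pick up S11.
lane_files_for_sample() {
    local sample="$1"
    local f
    for f in *_L00[1-4]_S"${sample}"_*.bam; do
        # Skip the literal pattern when nothing matched
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Usage (not executed here): write the list, then hand it to samtools with -b.
#   lane_files_for_sample 1 > S1_lanes.txt
#   samtools merge -b S1_lanes.txt S1_merged.bam
```

Looking at the printed list before merging is a cheap way to confirm that exactly four files were picked up for the sample.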


Try it on a few files first (edit the seq range accordingly):

 $ parallel --dry-run 'samtools merge  my_experimet_xxx___S{}_stimulated_Aligned.bam  my_experimet_xxx__L00{1..4}_S{}_stimulated_Aligned.bam' ::: $(seq 1 75)

input format: my_experimet_xxx__L00{1..4}_S{1..75}_stimulated_Aligned.bam

output format: my_experimet_xxx___S{1..75}_stimulated_Aligned.bam

First check that brace expansion works on your machine (it is done by the bash shell, not by samtools), with something like: samtools merge my_experimet_xxx___S75_stimulated_Aligned.bam my_experimet_xxx__L00{1..4}_S75_stimulated_Aligned.bam
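One way to run that check without touching any data is to let bash expand the braces and print the result. This is only a sketch; the sample number 75 and the file name pattern are taken from the example above.

```shell
#!/usr/bin/env bash
# Brace expansion is performed by bash before the command runs, so printing
# the expanded words shows exactly the arguments samtools would receive.
sample=75
lanes=( my_experimet_xxx__L00{1..4}_S"${sample}"_stimulated_Aligned.bam )

printf '%s\n' "${lanes[@]}"
echo "count: ${#lanes[@]}"
```

If the count is 4 and the names look right, the same pattern can be passed to samtools merge. A shell without brace expansion (for example plain sh on some systems) would print a single unexpanded word instead.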

6.4 years ago
drkennetz ▴ 560

I think this should work for your issue:

Create a file named samtools_merge.sh:

mkdir -p merged

for L1 in *_L001_*.bam
do
    echo "$L1"
    # Derive the other lane names and the output name from the L001 name
    L2=$(echo "$L1" | sed 's/_L001_/_L002_/')
    L3=$(echo "$L1" | sed 's/_L001_/_L003_/')
    L4=$(echo "$L1" | sed 's/_L001_/_L004_/')
    merged=$(echo "$L1" | sed 's/_L001_/_merged_/')
    samtools merge "./merged/${merged}" "${L1}" "${L2}" "${L3}" "${L4}"
done

This iterates over every file with _L001_ in its name (one per sample), derives the names of the other three lanes by replacing L001 with L002, L003 and L004, and runs samtools merge on the four lanes before moving on to the next sample. The output file name is the same as the sample's file name, with the lane field replaced by "merged".

Just run this (bash samtools_merge.sh) in the directory with all the bams and you should have merged bams in the dir "merged".
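Before merging for real, it may be worth confirming that every sample actually has all four lane files. The following is a sketch of such a pre-flight check, not part of the original answer: the function name `check_lanes` is hypothetical, and it only prints the merge commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: for every *_L001_*.bam in the current
# directory, confirm that lanes 2-4 exist and print the merge command that
# would run. Nothing is merged here.
check_lanes() {
    local l1 lane f missing=0
    for l1 in *_L001_*.bam; do
        [ -e "$l1" ] || continue             # glob matched nothing
        local inputs=("$l1")
        for lane in 2 3 4; do
            f="${l1/_L001_/_L00${lane}_}"    # same substitution as the sed calls
            if [ -e "$f" ]; then
                inputs+=("$f")
            else
                echo "MISSING: $f" >&2
                missing=$((missing + 1))
            fi
        done
        echo "samtools merge merged/${l1/_L001_/_merged_} ${inputs[*]}"
    done
    # Exit status 0 only if no lane file was missing
    [ "$missing" -eq 0 ]
}
```

Calling `check_lanes` in the bam directory lists one command per sample on stdout and any missing lane files on stderr, so a non-zero exit status is a signal to look at the input folder before running the real merge.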


It is fantastic! It works perfectly. Thank you very very much!


I am glad to hear that! It was untested so you never know.


Should have mentioned the "untested" part in the post, drkennetz.
