Question

Loop for merging multiple BAM files from multiple lanes with different names into related names/ samstat

3.1 years ago

Farzaneh • 0

Hi all, I'm a beginner in analysis and don't have anyone to check my code or ask for solutions, so I'll post my questions here.

I have ChIP-seq data that I mapped with bowtie2, giving 100 BAM files with different names. I need a loop that uses samtools to merge the files belonging to each sample into one file named after that sample, and then runs samstat on each merged file. If the loop could use GNU parallel, that would be awesome. Each sample has two lanes, and the reads are single-end. To map them to the genome I used this code:

for i in /~/*.fastq.gz
do
bowtie2 -p 16 -k 1 --fast-local --no-unal --no-mixed -t -x hg19 -U "${i}" |samtools view -o "${i%.fastq.gz}".bam
done

The names are written like this:

First pair:

HCT116_Input_S50_L001_R1_001.bam
HCT116_Input_S50_L002_R1_001.bam

Another pair:

MCF7_K9_I_2_S8_L001_R1_001.bam
MCF7_K9_I_2_S8_L002_R1_001.bam

The remaining pairs follow the same pattern with other sample names. After merging, I want the files to be named, for example, HCT116_Input_S50.bam and MCF7_K9_I_2_S8.bam.

I'm currently using these standalone commands:

samtools merge HCT116_input_S50.bam HCT116_Input_S50_L001_R1_001.bam HCT116_Input_S50_L002_R1_001.bam

samstat HCT116_input_S50.bam

Can someone please help me with writing the loops? Thanks a lot.

samtools chipseq samstat • 2.6k views

score 0 · Answer 1 · 2021-10-19

Here is one way to do this:

We start with these files:

$ ls -1 *.bam
HCT116_Input_S50_L001_R1_001.bam
HCT116_Input_S50_L002_R1_001.bam
HCT117_Input_S51_L001_R1_001.bam
HCT117_Input_S51_L002_R1_001.bam

Now write the loop, with echo in front of each command so that nothing is executed yet:

$ for i in *_L001_R1_001.bam; do name=$(basename "${i}" _L001_R1_001.bam); echo samtools merge "${name}.bam" "${name}_L001_R1_001.bam" "${name}_L002_R1_001.bam"; echo samstat "${name}.bam"; done

You should see the commands printed out on your screen.

samtools merge HCT116_Input_S50.bam HCT116_Input_S50_L001_R1_001.bam HCT116_Input_S50_L002_R1_001.bam
samstat HCT116_Input_S50.bam
samtools merge HCT117_Input_S51.bam HCT117_Input_S51_L001_R1_001.bam HCT117_Input_S51_L002_R1_001.bam
samstat HCT117_Input_S51.bam

Once the printed commands look correct, remove the word echo (both occurrences) to actually execute them.
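With 100 files, checking the printed commands by eye is error-prone, so it may help to confirm first that every lane 1 file has a lane 2 partner. The following is a minimal sketch, assuming the fixed suffix _L001_R1_001.bam used above; check_pairs is a hypothetical helper name introduced here, not part of samtools or samstat.

```shell
# Sketch: report any lane 1 BAM that has no lane 2 partner.
# check_pairs is a hypothetical helper; it only inspects file names.
check_pairs() {
    missing=0
    for lane1 in *_L001_R1_001.bam; do
        [ -e "$lane1" ] || continue                  # glob matched nothing
        name=$(basename "$lane1" _L001_R1_001.bam)   # sample name without lane suffix
        if [ ! -e "${name}_L002_R1_001.bam" ]; then
            echo "missing lane 2 for ${name}"
            missing=$((missing + 1))
        fi
    done
    echo "${missing} sample(s) without a lane 2 file"
}

check_pairs
```

If the count is not zero, the listed samples would make samtools merge fail on a missing input, so they are worth fixing before the echo is removed.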

score 0 · Answer 2 · 2021-10-20

3.1 years ago

cpad0112 21k

1.MCF7_K9_I_2_S8_L001_R1_001.bam  2.MCF7_K9_I_2_S8_L001_R1_001.bam

The two files in the second pair appear to have the same name. Do they differ in lane number, as in the first pair?

$ parallel --dry-run  'samtools merge -o {=s/_L001_R1_001.bam//=}.bam {} {=s/_L001_/_L002_/=} && samtools stats {=s/_L001_R1_001.bam//=}.bam' ::: *_L001*.bam

samtools merge -o HCT116_Input_S50.bam HCT116_Input_S50_L001_R1_001.bam HCT116_Input_S50_L002_R1_001.bam && samtools stats HCT116_Input_S50.bam
samtools merge -o MCF7_K9_I_2_S8.bam MCF7_K9_I_2_S8_L001_R1_001.bam MCF7_K9_I_2_S8_L002_R1_001.bam && samtools stats MCF7_K9_I_2_S8.bam
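One detail worth checking in the lane substitution: a pattern that matches only 001 replaces the first 001 in the file name, which is the lane field only if nothing earlier in the name contains 001. The sketch below uses sed rather than the Perl expression inside parallel, but a substitution without the g flag behaves the same way in both. The sample name is hypothetical, chosen so that 001 occurs before the lane field.

```shell
# Sketch: compare a loose and an anchored lane substitution.
# The file name is hypothetical; it contains "001" before the lane field.
f="Sample001_S1_L001_R1_001.bam"

loose=$(printf '%s\n' "$f" | sed 's/001/002/')           # first "001" anywhere
anchored=$(printf '%s\n' "$f" | sed 's/_L001_/_L002_/')  # only the lane field

echo "loose:    $loose"      # Sample002_S1_L001_R1_001.bam
echo "anchored: $anchored"   # Sample001_S1_L002_R1_001.bam
```

Anchoring the pattern on _L001_ keeps the lane 2 name correct regardless of what the sample name contains.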

Thank you so much for your help. This works with a bit of editing: I removed the -o and, because I'm using samstat rather than samtools stats, I swapped that in. If we write it like this:

parallel --eta 'samtools merge {=s/_L001_R1_001.bam//=}.bam {} {=s/001/002/=} && samstat {=s/_L001_R1_001.bam//=}.bam' ::: *_L001*.bam

it also shows the estimated time remaining.

ADD REPLY • link 3.1 years ago by Farzaneh • 0

It stopped working for my other files :( GNU parallel reports an error: samstat seems to start before the merge has finished. Basically, it says the merged file doesn't exist, so samstat can't run on it.

The only part common to all my file names is _R1_001.bam. Here is another pair of my files:

HCT116_K9_R_1_S35_L001_R1_001.bam
HCT116_K9_R_1_S35_L002_R1_001.bam

This is the loop I wrote, and it works, but I would prefer it to run in parallel, name the output as I described, and include samstat in the loop. I liked your code, but it doesn't work for my other files...

mkdir -p merged
for L1 in *_L001_*.bam
do
    echo "$L1"
    # Derive the lane 2 file name and the merged output name from the lane 1 name
    L2=$(echo "$L1" | sed 's/_L001_/_L002_/')
    merged=$(echo "$L1" | sed 's/_L001_/_merged_/')
    samtools merge "./merged/${merged}" "${L1}" "${L2}"
done
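One possible way to meet all three wishes is to put the work for a single sample into a function, so that samstat for that sample can only start after its merge has succeeded. This is a sketch under assumptions, not a tested pipeline: merge_one and the DRY_RUN switch are hypothetical names introduced here, the sample name is taken to be everything before _L001_, and the samtools merge call uses the form without -o reported to work above.

```shell
# Sketch: merge one sample's two lanes into merged/<sample>.bam, then run samstat.
# merge_one and DRY_RUN are hypothetical names introduced for illustration.
merge_one() {
    local lane1 sample rest lane2 out
    lane1=$1
    sample=${lane1%%_L001_*}       # everything before the first _L001_
    rest=${lane1#*_L001_}          # everything after it, e.g. R1_001.bam
    lane2="${sample}_L002_${rest}"
    out="merged/${sample}.bam"

    if [ ! -e "$lane2" ]; then
        echo "no lane 2 file for ${lane1}" >&2
        return 1
    fi

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print what would be run
        echo "samtools merge $out $lane1 $lane2 && samstat $out"
    else
        # samstat runs only if the merge exits successfully
        mkdir -p merged &&
            samtools merge "$out" "$lane1" "$lane2" &&
            samstat "$out"
    fi
}

for f in *_L001_*.bam; do
    [ -e "$f" ] || continue        # glob matched nothing
    merge_one "$f"
done
```

Setting DRY_RUN=0 executes the commands instead of printing them. In bash, the loop could probably be swapped for GNU parallel by exporting the function and the variable first (export -f merge_one; export DRY_RUN=0) and then running parallel --eta merge_one ::: *_L001_*.bam.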

ADD REPLY • link 3.1 years ago by Farzaneh • 0