Loop for merging multiple BAM files from multiple lanes with different names into related names/ samstat
2
0
Entering edit mode
3.1 years ago
Farzaneh • 0

Hi all, I'm a beginner in analysis and don't have any help to check my codes or ask for solutions so I'll post my questions here.

I have a chip seq data and mapped them with bowtie2. I have 100 bam files with different names. I need to write a loop for samtools to merge them into related file name and then do samstat on each merged file. If I can use GNU parallel to write the loop, it would be awesome. I have two lanes for each sample and they are single end reads. For mapping them to genome I used this code:

for i in /~/*.fastq.gz
do
bowtie2 -p 16 -k 1 --fast-local --no-unal --no-mixed -t -x hg19 -U "${i}" |samtools view -o "${i%.fastq.gz}".bam
done

The names are written like this:

First pair:

  1. HCT116_Input_S50_L001_R1_001.bam
  2. HCT116_Input_S50_L002_R1_001.bam

Another pair: 1.MCF7_K9_I_2_S8_L001_R1_001.bam 2.MCF7_K9_I_2_S8_L002_R1_001.bam

And continues till other pairs and names. For example, I want them to be HCT116_Input_S50.bam and MCF7_K9_I_2_S8.bam at the end of merging.

I'm using these standalone code:

samtools merge HCT116_input_S50.bam HCT116_Input_S50_L001_R1_001.bam HCT116_Input_S50_L002_R1_001.bam

samstat HCT116_input_S50.bam

Can someone please help me with writing the loops? Thanks a lot.

samtools chipseq samstat • 2.6k views
ADD COMMENT
0
Entering edit mode
3.1 years ago
GenoMax 147k

Here is one way to do this:

We are starting with these files

$ ls -1 *.bam
HCT116_Input_S50_L001_R1_001.bam
HCT116_Input_S50_L002_R1_001.bam
HCT117_Input_S51_L001_R1_001.bam
HCT117_Input_S51_L002_R1_001.bam

Now to write the loop.

$ for i in `ls -1 *L001*.bam`; do name=$(basename ${i} _L001_R1_001.bam); echo samtools merge ${name}.bam ${name}_L001_R1_001.bam ${name}_L002_R1_001.bam; echo samstat ${name}.bam;done

You should see the commands printed out on your screen.

samtools merge HCT116_Input_S50.bam HCT116_Input_S50_L001_R1_001.bam HCT116_Input_S50_L002_R1_001.bam
samstat HCT116_Input_S50.bam
samtools merge HCT117_Input_S51.bam HCT117_Input_S51_L001_R1_001.bam HCT117_Input_S51_L002_R1_001.bam
samstat HCT117_Input_S51.bam

Remove the word echo to actually execute the commands when they look correct.

ADD COMMENT
0
Entering edit mode

Thank you so much GenoMax for your help. I find your answer very exciting to work with but it gives me an error of basename. basename: invalid option -- 'r'

Besides, I think ${i}_L001_R1_001.bam will be: HCT116_Input_S50_L001_R1_001.bam_L001_R1_001.bam

It doesn't return HCT116_Input_S50.bam. Does it make sense?

ADD REPLY
0
Entering edit mode
3.1 years ago
1.MCF7_K9_I_2_S8_L001_R1_001.bam  2.MCF7_K9_I_2_S8_L001_R1_001.bam

Another pair has same names? Do they differ in Lane numbers as for the first pair?

$ parallel --dry-run  'samtools merge -o {=s/_L001_R1_001.bam//=}.bam {} {=s/001/002/=} && samtools stats {=s/_L001_R1_001.bam//=}.bam' ::: *_L001*.bam

samtools merge -o HCT116_Input_S50.bam HCT116_Input_S50_L001_R1_001.bam HCT116_Input_S50_L002_R1_001.bam && samtools stats HCT116_Input_S50.bam
samtools merge -o MCF7_K9_I_2_S8.bam MCF7_K9_I_2_S8_L001_R1_001.bam MCF7_K9_I_2_S8_L002_R1_001.bam && samtools stats MCF7_K9_I_2_S8.bam
ADD COMMENT
0
Entering edit mode

Thank you so much for your help. This does work but with a bit of editing. Removing the -o and also I'm using samstat not samtools stat. So, I exchanged that. If we write it like this:

parallel --eta 'samtools merge {=s/_L001_R1_001.bam//=}.bam {} {=s/001/002/=} && samstat {=s/_L001_R1_001.bam//=}.bam' ::: *_L001*.bam

It also shows the time remained.

ADD REPLY
0
Entering edit mode

It stopped working for other files :( It gives an error of the GNU parallel. It starts samstat before finishing merging files. Basically, it says that the merged files doesn't exist so I can't perform samstat on that.

The only common part between my files are _R1_001.bam Here is another pair of my files:

  1. HCT116_K9_R_1_S35_L001_R1_001.bam
  2. HCT116_K9_R_1_S35_L002_R1_001.bam

This is the loop I wrote and works, but I prefer it to be in parallel, have the name as I mentioned and include samstat in the loop. I liked your code but it doesn't work for others...

mkdir merged
for L1 in *_L001_*.bam
do
echo $L1
L2=`echo $L1 | sed 's/_L001_/_L002_/'`
merged=`echo $L1 | sed 's/_L001_/_merged_/'`
samtools merge ./merged/${merged} ${L1} ${L2}
done
ADD REPLY

Login before adding your answer.

Traffic: 1777 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6