Hi all, I'm a beginner in analysis and have no one to check my code or ask for solutions, so I'll post my questions here.
I have ChIP-seq data that I mapped with bowtie2. I now have 100 BAM files with different names. I need to write a loop that uses samtools to merge the files belonging to each sample into one file named after that sample, and then runs samstat on each merged file. If the loop could use GNU parallel, that would be awesome. Each sample has two lanes, and the reads are single-end. To map them to the genome I used this code:
for i in /~/*.fastq.gz
do
bowtie2 -p 16 -k 1 --fast-local --no-unal --no-mixed -t -x hg19 -U "${i}" |samtools view -o "${i%.fastq.gz}".bam
done
The names are written like this:
First pair:
- HCT116_Input_S50_L001_R1_001.bam
- HCT116_Input_S50_L002_R1_001.bam
Another pair:
- MCF7_K9_I_2_S8_L001_R1_001.bam
- MCF7_K9_I_2_S8_L002_R1_001.bam
The remaining pairs follow the same pattern with other sample names. After merging, I want the files above to be named HCT116_Input_S50.bam and MCF7_K9_I_2_S8.bam, and likewise for the rest.
At the moment I'm running these standalone commands for each sample:
samtools merge HCT116_Input_S50.bam HCT116_Input_S50_L001_R1_001.bam HCT116_Input_S50_L002_R1_001.bam
samstat HCT116_Input_S50.bam
Can someone please help me with writing the loops? Thanks a lot.
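For reference, the two standalone commands above could be wrapped in a loop roughly like the sketch below. It assumes every sample has exactly two lane files in one directory, ending in _L001_R1_001.bam and _L002_R1_001.bam as in the examples; merge_lanes is a made-up helper name, not part of samtools or samstat.

```shell
# Sketch only: assumes each sample has exactly two lane files named
# <sample>_L001_R1_001.bam and <sample>_L002_R1_001.bam in one directory.
# merge_lanes is a hypothetical helper name chosen for this example.
merge_lanes() (
    dir=${1:-.}
    for l1 in "$dir"/*_L001_R1_001.bam; do
        [ -e "$l1" ] || continue            # the glob matched nothing
        sample=${l1%_L001_R1_001.bam}       # strip the lane-1 suffix
        l2=${sample}_L002_R1_001.bam
        if [ ! -e "$l2" ]; then
            echo "skipping $sample: $l2 not found" >&2
            continue
        fi
        # Merge both lanes, then run samstat only if the merge succeeded.
        samtools merge "${sample}.bam" "$l1" "$l2" && samstat "${sample}.bam"
    done
)

# Usage, from the directory holding the BAM files:
#   merge_lanes .
#
# A possible GNU parallel variant (same naming assumption, untested here):
#   printf '%s\n' *_L001_R1_001.bam | sed 's/_L001_R1_001\.bam$//' |
#       parallel 'samtools merge {}.bam {}_L001_R1_001.bam {}_L002_R1_001.bam && samstat {}.bam'
```

The loop iterates over the lane-1 files only, so each sample is merged once. Note that samtools merge refuses to overwrite an existing output file unless -f is given.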
Thank you so much, GenoMax, for your help. I find your answer very exciting to work with, but it fails with a basename error:
basename: invalid option -- 'r'
Besides, I think that if ${i} already holds the full file name, ${i}_L001_R1_001.bam will expand to HCT116_Input_S50_L001_R1_001.bam_L001_R1_001.bam.
So it doesn't give me HCT116_Input_S50.bam. Does that make sense?
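To illustrate the doubled suffix, here is a small sketch that uses plain shell parameter expansion instead of basename options. It assumes the lane suffix always has the form _L00<digit>_R1_001.bam; sample_name is a hypothetical helper name used only for this example.

```shell
# Sketch only: assumes lane files end in _L00<digit>_R1_001.bam.
# sample_name is a hypothetical helper, not an existing tool.
sample_name() {
    f=${1##*/}                                  # drop any leading directory
    printf '%s\n' "${f%_L00[0-9]_R1_001.bam}"   # drop the lane suffix
}

i=HCT116_Input_S50_L001_R1_001.bam

# Appending the suffix to the full file name doubles it:
echo "${i}_L001_R1_001.bam"

# Stripping the suffix first gives the sample prefix to build names from:
sample=$(sample_name "$i")
echo "${sample}.bam"
echo "${sample}_L002_R1_001.bam"
```

The portable form of basename takes the suffix as a second argument (basename NAME SUFFIX), so basename "$i" _L001_R1_001.bam should give the same prefix for a lane-1 file without relying on any option flags.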