I downloaded sequencing results (.bam files) for 27 samples from EGA. There are around 400 bam files in total, which have to be merged. I have a list with the names of the bam files in one column and the corresponding sample ID in the other column. I would like to merge these bam files so that I end up with 27 bam files (one bam file per sample).
For example, from the table, there are 5 bam files which correspond to 2 samples.
Untested, and building on what Devon already said:

    info="description.tsv"  # tab-separated file with the correspondence filename/sample_ID
    # get the unique sample IDs
    sample_names=$(cut -f2 "$info" | sort -u | tr "\n" " ")
    for sample in $sample_names; do
        # exact match on column 2, so that AD_1 does not also pick up AD_10;
        # drop the ".bam" in the awk print if the suffix is already present
        bams=$(awk -F'\t' -v s="$sample" '$2 == s {print $1".bam"}' "$info" | tr "\n" " ")
        echo "alignments for $sample are $bams"
        # the first argument to samtools merge is the output file
        echo samtools merge "$sample.bam" $bams
    done
    # remove the echo in front of samtools merge once testing is successful.
You could also use wildcards, most likely. Of course, it's likely that there's a bunch of information that you've not mentioned about why the two simple commands that I wrote won't suffice...
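To illustrate the wildcard idea, here is a minimal dry-run sketch. The helper name `merge_by_glob` is made up for this example, and it assumes the bam files sit in the current directory and that a pattern matches exactly the files of one sample (which may not hold: in the example from the question, AD_1 only contains SC_R2 and SC_R3, so `SC_R*.bam` would be wrong if an SC_R1.bam also exists). It only prints the `samtools merge` command instead of running it.

```shell
# Hypothetical helper: print the samtools merge command for one sample,
# taking the input files from a glob pattern (dry run, nothing is merged).
merge_by_glob() {
    target=$1
    pattern=$2
    # Unquoted on purpose so that the shell expands the glob.
    set -- $pattern
    # If nothing matched, the pattern is left unexpanded and no such file exists.
    if [ ! -e "$1" ]; then
        echo "no files match $pattern" >&2
        return 1
    fi
    # The first argument to samtools merge is the output file.
    echo samtools merge "$target" "$@"
}

# Example call (quote the pattern so it is expanded inside the function):
#   merge_by_glob AD_2.bam 'SC_C*.bam'
```

Once the printed command looks right, the `echo` inside the function can be dropped to run the merge for real.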
So what is the question? Please frame your question properly and describe what you have tried so far, so that you get good answers.
Sorry for not being clear.
For example, from the table, there are 5 bam files which correspond to 2 samples:
AD_1 > SC_R2 + SC_R3
AD_2 > SC_C1 + SC_C2 + SC_C3
Hope I conveyed the message.
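For the five files in this example, the grouping could be done in a single pass over the list. This is only a sketch under assumptions not stated in the thread: the list is tab-separated with the bam name (without the ".bam" suffix) in column 1 and the sample ID in column 2, and the function name `print_merge_commands` is invented for the illustration. It prints one `samtools merge` command per sample rather than running anything.

```shell
# Hypothetical helper: print one "samtools merge" command per sample ID (dry run).
# Assumed input: tab-separated, column 1 = bam name without ".bam", column 2 = sample ID.
print_merge_commands() {
    awk -F '\t' '
        NF >= 2 {
            # remember each sample the first time it is seen, to keep input order
            if (!($2 in files)) order[++n] = $2
            # append this bam file to the list of its sample
            files[$2] = files[$2] " " $1 ".bam"
        }
        END {
            # output file first, then all input files of that sample
            for (i = 1; i <= n; i++)
                printf "samtools merge %s.bam%s\n", order[i], files[order[i]]
        }
    ' "$1"
}

# Example call:
#   print_merge_commands description.tsv
```

With the AD_1/AD_2 example as input, this should print two commands, one per sample. After checking them by eye, the output can be piped to `sh` to actually run the merges.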