I downloaded sequencing results (.bam files) for 27 samples from EGA. There are around 400 BAM files in total, which have to be merged. I have a list with the names of the BAM files in one column and the corresponding sample IDs in the other column. I would like to merge these BAM files so that I end up with 27 BAM files (one per sample).
For example, in the table there are 5 BAM files that correspond to 2 samples.
Untested, and building on what Devon already said:
info="description.tsv" # tab-separated file mapping BAM filename (column 1) to sample ID (column 2)
# get the unique sample IDs
sample_names=$(cut -f2 "$info" | sort -u)
for sample in $sample_names
do
    # match column 2 exactly, so that AD_1 does not also pick up AD_10, AD_11, ...
    bams=$(awk -F'\t' -v s="$sample" '$2 == s {print $1".bam"}' "$info" | tr "\n" " ") # print $1 alone if the suffix is already present
    echo "alignments for $sample are $bams"
    echo samtools merge "$sample.bam" $bams # the first argument to samtools merge is the output file
done
# remove the echo in front of samtools merge once testing is successful.
You could also use wildcards, most likely. Of course, it's likely that there's a bunch of information that you've not mentioned about why the two simple commands that I wrote won't suffice...
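As a rough illustration of the wildcard idea: if all files for one sample happened to share a prefix, the shell could expand the list itself. The SC_R prefix and the AD_1.bam output name are assumptions based on the example names in this thread, the placeholder files are empty, and the command is only echoed, not run.

```shell
# Work in a scratch directory with empty placeholder files (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch SC_R2.bam SC_R3.bam SC_C1.bam

# The glob expands to every matching file, sorted by name, before echo runs.
merge_cmd=$(echo samtools merge AD_1.bam SC_R*.bam)
echo "$merge_cmd"
```

This only works when the naming is that regular, and the output name must not match the pattern itself, otherwise a second run would feed the merged file back in as an input.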
So what is the question? Please frame your question properly and describe what you have tried so far, so that you can get good answers.
Sorry for not being clear.
I downloaded sequencing results (.bam files) for 27 samples from EGA. There are around 400 BAM files in total, which have to be merged. I have a list with the names of the BAM files in one column and the corresponding sample IDs in the other column. I would like to merge these BAM files so that I end up with 27 BAM files (one per sample).
For example, in the table below there are 5 BAM files that correspond to 2 samples:
BAM file    Sample ID
SC_R2       AD_1
SC_R3       AD_1
SC_C1       AD_2
SC_C2       AD_2
SC_C3       AD_2
Hope I conveyed the message.
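For the five files listed above, a dry run could be sketched as follows. It assumes a tab-separated file with the BAM name (without the .bam suffix) in column 1 and the sample ID in column 2. The function name print_merge_commands is made up for illustration. It groups the files in a single awk pass and only prints the samtools merge commands rather than running them.

```shell
# Hypothetical helper: print one "samtools merge" command per sample ID.
# Assumes a tab-separated file: BAM name (no suffix) in column 1, sample ID in column 2.
print_merge_commands() {
    awk -F'\t' '
        NF >= 2 {
            if (!($2 in files)) order[++n] = $2      # remember first-seen order of samples
            files[$2] = files[$2] " " $1 ".bam"      # append this BAM to its sample
        }
        END {
            for (i = 1; i <= n; i++)
                print "samtools merge " order[i] ".bam" files[order[i]]
        }
    ' "$1"
}

# Example mapping taken from the table in this thread.
tmp=$(mktemp)
printf 'SC_R2\tAD_1\nSC_R3\tAD_1\nSC_C1\tAD_2\nSC_C2\tAD_2\nSC_C3\tAD_2\n' > "$tmp"
print_merge_commands "$tmp"
rm -f "$tmp"
```

With that mapping it prints:
samtools merge AD_1.bam SC_R2.bam SC_R3.bam
samtools merge AD_2.bam SC_C1.bam SC_C2.bam SC_C3.bam
Once the printed commands look right, they can be piped to sh to actually run the merges.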