I have the following script and I need to add a command to remove PCR duplicates from my samples. I think I need to add the command samtools markdup -r ...
after the sorting command and before the indexing command, but I don't really know how I should write the input and output in the command. It's a little confusing to me. I would really appreciate it if someone could help.
Script:
for sample in *.sam
do
echo "$sample"
# Strip the trailing .sam extension (escaped dot, anchored at the end of the name)
var=$(echo "${sample}" | sed 's/\.sam$//')
echo "$var"
# Convert file from SAM to BAM format
samtools view -Sb "$sample" > "${var}.uns.bam"
# Sort BAM file
samtools sort -T "/tmp/${var}.sorted" -o "${var}.bam" "${var}.uns.bam"
# index the bam file
samtools index "${var}.bam"
# Remove intermediate files
rm "${var}.uns.bam"
done
Note also that to use markdup: "The input file must be coordinate sorted and must have gone through fixmates with the mate scoring option on." You don't mention having run fixmate (that is, samtools fixmate -m) beforehand, so make sure that is done.
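For what it's worth, here is a rough sketch of how the whole loop could look with fixmate and markdup added. It assumes paired-end data and a reasonably recent samtools. Since fixmate expects name-sorted (or name-collated) input, the sketch adds a name sort before it and then coordinate-sorts again for markdup. The function name dedup_sample and the intermediate suffixes (.nsorted.bam, .fixmate.bam, .sorted.bam) are just names I made up for illustration, so adjust them to your setup.

```shell
#!/bin/sh
# Sketch only: dedup_sample and the intermediate suffixes are illustrative
# assumptions, not part of the original script.

dedup_sample() {
    sample=$1
    var=${sample%.sam}    # strip the trailing .sam extension

    # Convert file from SAM to BAM format
    samtools view -b -o "${var}.uns.bam" "$sample" || return 1
    # Name-sort so that mates sit next to each other, as fixmate expects
    samtools sort -n -o "${var}.nsorted.bam" "${var}.uns.bam" || return 1
    # Fill in mate information; -m adds the mate score tag that markdup uses
    samtools fixmate -m "${var}.nsorted.bam" "${var}.fixmate.bam" || return 1
    # Coordinate-sort, which markdup requires
    samtools sort -o "${var}.sorted.bam" "${var}.fixmate.bam" || return 1
    # Mark duplicates; -r removes them instead of only flagging them
    samtools markdup -r "${var}.sorted.bam" "${var}.bam" || return 1
    # Index the final, deduplicated BAM file
    samtools index "${var}.bam" || return 1
    # Remove intermediate files
    rm -f "${var}.uns.bam" "${var}.nsorted.bam" \
          "${var}.fixmate.bam" "${var}.sorted.bam"
}

for sample in *.sam
do
    [ -e "$sample" ] || continue    # skip if the glob matched nothing
    echo "$sample"
    dedup_sample "$sample"
done
```

So for markdup itself, the input is the coordinate-sorted BAM and the output is whatever you want the deduplicated file to be called; the index step then runs on that output.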
Thank you very much, that was really helpful. You are absolutely right, I totally forgot to run fixmate.