2.0 years ago
Eliveri
I have a process written with GATK that I would like to convert partially to samtools. However, I am unsure whether there is a samtools equivalent of the CleanSam step.
bwa mem -t 12 -M -R "..." $ref ${paired_reads} > ${pair_id}.sam
gatk --java-options "-Xmx16g -Xms16g" SamFormatConverter -R $ref -I ${pair_id}.sam -O ${pair_id}.bam
gatk --java-options "-Xmx16g -Xms16g" CleanSam -R $ref -I ${pair_id}.bam -O ${pair_id}.clean.bam
gatk --java-options "-Xmx16g -Xms16g" SortSam -R $ref -I ${pair_id}.clean.bam -O ${pair_id}.sorted.bam -SO coordinate --CREATE_INDEX true TMP_DIR=`pwd`/tmp
gatk --java-options "-Xmx16g -Xms16g" MarkDuplicates -R $ref -I ${pair_id}.sorted.bam -O ${pair_id}.sorted.dup.bam -M ${pair_id}_dup_metrics.txt -ASO coordinate
Tentatively, I have the following, which skips the .sam file and goes straight to .bam:
bwa mem -t 12 -M -k 25 \
-R "@RG\\tID:${pair_id}\\tLB:${pair_id}\\tPL:illumina\\tSM:${pair_id}\\tPU:${pair_id}" \
$ref ${paired_reads} | samtools sort -@ 12 -o "${pair_id}.bam" \
&& samtools index "${pair_id}.bam"
However, I could not find a "clean" samtools function to run before sorting and indexing.
The question is: why do you need CleanSam?
I am porting a collaborator's shell script to Nextflow, so I am trying to match it as closely as possible.
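For reference, the Picard documentation describes CleanSam as doing two things: soft-clipping alignments that extend past the end of a reference sequence, and setting MAPQ to 0 for unmapped reads. I am not aware of a single samtools subcommand that does both. If only the MAPQ part matters, a small filter on the SAM stream may be enough. The sketch below is an assumption-laden illustration, not an equivalent of CleanSam: `zero_unmapped_mapq` is a hypothetical helper name, it does no soft-clipping at all, and it expects uncompressed SAM text on stdin.

```shell
# Hypothetical helper (not part of samtools or Picard).
# Sets MAPQ (column 5) to 0 for records whose FLAG has the 0x4 (unmapped) bit set.
# Reads SAM text on stdin and writes SAM text on stdout.
# It does NOT soft-clip alignments that run past the end of the reference.
zero_unmapped_mapq() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { print; next }             # header lines pass through unchanged
       int($2 / 4) % 2 == 1 { $5 = 0 }  # FLAG bit 0x4 set: read is unmapped
       { print }'
}

# Possible placement in the pipe from the question (not run here):
# bwa mem -t 12 -M $ref ${paired_reads} | zero_unmapped_mapq | samtools sort -@ 12 -o "${pair_id}.bam"
```

If the goal is to reproduce the collaborator's output exactly, keeping the CleanSam step itself is the safer choice, since this filter covers only one of its two fixes.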
Quick note: It's picard, not piccard.