4.4 years ago
evelyn
Hello,
I have aligned multiple FASTQ files using bwa mem and got .sorted.bam files:
INPUT_DIR=/path/trimmed
OUTPUT_DIR=/path/result
# Pick the RUN-th R1 file (RUN is set outside this snippet) and derive the sample name
INPUT_FILE_ONE=$(ls -1 $INPUT_DIR/*_R1_paired.fastq.gz | sed -n ${RUN}p)
SAMPLE=$(basename "$INPUT_FILE_ONE" _R1_paired.fastq.gz)
# Align the read pairs, then convert the SAM output to BAM
bwa mem genome.fasta ${INPUT_DIR}/${SAMPLE}_R1_paired.fastq.gz ${INPUT_DIR}/${SAMPLE}_R2_paired.fastq.gz > ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.sam
samtools view -S -b ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.sam > ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.bam
# The unsorted BAM was written to OUTPUT_DIR, so sort it from there, then index
samtools sort ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.bam -o ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.sorted.bam
samtools index ${OUTPUT_DIR}/${SAMPLE}_paired_bwa.sorted.bam
Then I ran Picard MarkDuplicates on the .sorted.bam files to remove duplicates:
INPUT_DIR=/path/result
OUTPUT_DIR=/path/Duplicate_marking_picard
INPUT_FILE_ONE=$(ls -1 $INPUT_DIR/*_paired_bwa.sorted.bam | sed -n ${RUN}p)
SAMPLE=$(basename "$INPUT_FILE_ONE" _paired_bwa.sorted.bam)
echo "RUN #${RUN} with sample ${SAMPLE}"
# Mark duplicates; TMP_DIR points Picard's temporary files at ./tmp
java -Xms1g -Xmx3g -jar picard.jar MarkDuplicates \
I=${INPUT_DIR}/${SAMPLE}_paired_bwa.sorted.bam \
O=${OUTPUT_DIR}/${SAMPLE}_picard.sorted.bam \
M=${OUTPUT_DIR}/${SAMPLE}_metrics.txt \
TMP_DIR=`pwd`/tmp
However, after producing both the _picard.sorted.bam and _metrics.txt files for the first input files, the jobs started producing only a _picard.sorted.bam file and no metrics.txt file. When I checked the log files for those runs, they contain a long message including this error:
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: Write error; BinaryCodec in writemode; streamed file (filename not available)
Caused by: java.io.IOException: Disk quota exceeded
I tried again, but I still got the same error after some files had finished. Thank you for the help!
This has nothing to do with Picard: you ran out of the disk space assigned to you by the system administrators. Either ask for more space or delete unneeded files.
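Since the exception reports "Disk quota exceeded", one quick check is how much space is free on each filesystem the jobs write to, including the Picard TMP_DIR. A minimal sketch, assuming POSIX `df -P` output; the helper name `free_kb` is hypothetical, and the paths are the ones from the question. Note that `df` reports filesystem free space, which is not the same as your quota (`quota -s` reports that).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the free space (in KiB) of the filesystem
# that holds a directory. This is filesystem free space as reported by
# df, NOT your user/group quota.
free_kb() {
    # -P forces one-line POSIX output per filesystem; -k uses 1024-byte blocks.
    # Row 2, field 4 is the "Available" column.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Report free space and current usage for each directory the jobs write to.
for dir in /path/result /path/Duplicate_marking_picard "$(pwd)/tmp"; do
    if [ -d "$dir" ]; then
        echo "$dir: $(free_kb "$dir") KiB free on filesystem, $(du -sk "$dir" | cut -f1) KiB used by directory"
    fi
done
```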
You could save a lot of space by streaming the bwa mapping directly into samtools instead of writing intermediate SAM and unsorted BAM files.
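A minimal sketch of such a pipe, written as a hypothetical function `align_sample` that reuses the file-naming scheme from the question. It assumes `bwa` and `samtools` are on PATH and that `genome.fasta` is already indexed. The trailing `-` tells `samtools sort` to read from standard input, so neither the SAM nor the unsorted BAM is ever written to disk.

```bash
#!/usr/bin/env bash
# Sketch: pipe bwa mem straight into samtools sort, then index the result.
# align_sample is a hypothetical name; arguments are input dir, output dir, sample.
align_sample() {
    local in_dir=$1 out_dir=$2 sample=$3
    local out_bam="${out_dir}/${sample}_paired_bwa.sorted.bam"

    # "-" makes samtools sort read the SAM stream from stdin.
    # Without `set -o pipefail`, this checks only samtools' exit status.
    bwa mem genome.fasta \
        "${in_dir}/${sample}_R1_paired.fastq.gz" \
        "${in_dir}/${sample}_R2_paired.fastq.gz" \
      | samtools sort -o "$out_bam" - \
      || return 1

    samtools index "$out_bam"
}
```

For example, `align_sample /path/trimmed /path/result "$SAMPLE"`. If the SAM files are still needed, adding `| tee <sample>_paired_bwa.sam |` between the two commands keeps the SAM while still skipping the unsorted intermediate BAM.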
Thank you! I have checked our group's space and we still have enough left. We did not get any space-related notification, which we usually receive once we use 90% of our assigned space. I needed the .sam files, which is why I did not skip that step. I am wondering if it has to do with the TMP_DIR in the Picard command. I just checked and TMP_DIR is empty, and I am not sure if I have used it correctly in the Picard command line.

TMP_DIR may indeed be empty when no job is running. You can restart the job and watch that directory. You may also have a separate quota on the directory where TMP_DIR is located.

Thank you! It was empty while the job was running. That's why I am wondering whether my code uses TMP_DIR correctly. How can I assign a separate quota to this directory?

If that does not work, you could try -Djava.io.tmpdir=/directory_path. BTW, did you create a directory called tmp in your working directory when using `pwd`/tmp?

Thank you! Yes, I made a directory called tmp in my working directory. I will try your suggestion.

This command did not work. I get the same error after some samples.
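The tmp-directory and -Djava.io.tmpdir suggestions can be combined so that the JVM and Picard both write temporary files to one explicitly created directory. A sketch under assumptions: picard.jar and the file names are those from the question, while the function name `run_markdup` and the per-sample `tmp_${sample}` scratch directory are hypothetical choices (per-sample so that parallel jobs do not share temporary files).

```bash
#!/usr/bin/env bash
# Sketch: run MarkDuplicates with an explicit per-sample scratch directory
# used both by the JVM (java.io.tmpdir) and by Picard (TMP_DIR).
run_markdup() {
    local in_dir=$1 out_dir=$2 sample=$3
    local scratch="${out_dir}/tmp_${sample}"   # hypothetical layout
    mkdir -p "$scratch" || return 1

    java -Xms1g -Xmx3g -Djava.io.tmpdir="$scratch" -jar picard.jar MarkDuplicates \
        I="${in_dir}/${sample}_paired_bwa.sorted.bam" \
        O="${out_dir}/${sample}_picard.sorted.bam" \
        M="${out_dir}/${sample}_metrics.txt" \
        TMP_DIR="$scratch"
    local status=$?

    # Remove scratch files only after a successful run; keep them for debugging otherwise.
    [ "$status" -eq 0 ] && rm -rf "$scratch"
    return "$status"
}
```

While a job runs, `watch -n 30 du -sh /path/Duplicate_marking_picard/tmp_<sample>` in another terminal shows whether temporary files actually land in that directory, which is the check suggested in the thread.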
Can you post the output of the quota -s command?

Our group quota is 11118938420 kbytes used out of 13958643712 kbytes available. I resubmitted the job with more memory, but I am not sure it will work.
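The posted figures can be turned into a percentage: 11118938420 / 13958643712 ≈ 0.797, so the group is at about 79.7% of its quota with 2839705292 kbytes left, below the 90% notification threshold mentioned in the thread. A small sketch of that calculation; `usage_pct` is a hypothetical helper name and the numbers are the ones posted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: percentage of a quota in use, given "used" and
# "limit" in the same unit (kbytes here).
usage_pct() {
    awk -v used="$1" -v limit="$2" 'BEGIN { printf "%.2f\n", 100 * used / limit }'
}

used_kb=11118938420
limit_kb=13958643712
echo "used: $(usage_pct "$used_kb" "$limit_kb")%"   # prints: used: 79.66%
echo "free: $((limit_kb - used_kb)) kbytes"         # prints: free: 2839705292 kbytes
```

Group usage below the threshold does not by itself rule out the error: depending on the system, `quota` also reports per-user limits and file-count (inode) limits separately from block usage, and a TMP_DIR on another filesystem is subject to that filesystem's limits.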