I have the following inputs:
# Define input directory containing FASTQ files
Input_directory="/path/to/fastq/folder"
# Define output directory for STAR output files
Output_directory="/path/to/output/directory"
# Define paths to reference files
Annotation_GTF="/path/to/Zebra/fish/GRCz11.110.chr.gtf"
Genome_FASTA="/path/to/soft/masked/Zebra/fish/primary_assembly.fa"
Reference="/path/to/soft/masked/STAR/created/reference/only/for/use/with/STAR"
# Define the number of threads to use
num_threads=4
These feed into the following script:
# Loop through each pair of paired FASTQ files in the input directory
for forward_file in "${Input_directory}"/*_R1.fq; do
    # Extract the file name without the _R1.fq extension
    file_name=$(basename "${forward_file}" _R1.fq)
    # Sample name (the extension is already stripped, so this matches file_name)
    sample_name="${file_name/_R1/}"
    # Path to the corresponding reverse FASTQ file
    reverse_file="${forward_file/_R1/_R2}"
    echo "Forward File: ${forward_file}"
    echo "Reverse File: ${reverse_file}"
    echo "Output Directory: ${Output_directory}"
    # Unique temporary directory for this sample (STAR creates it itself)
    TMPDIR="${Output_directory}/${file_name}___STAR_temporary_directory"
    echo "The temporary directory is: ${TMPDIR}"
    # Change working directory to the output directory
    cd "${Output_directory}"
    # Run STAR alignment
    STAR \
        --genomeDir "${Reference}" \
        --readFilesIn "${forward_file}" "${reverse_file}" \
        --outFileNamePrefix "${Output_directory}/${sample_name}_Soft___" \
        --runThreadN "${num_threads}" \
        --genomeLoad NoSharedMemory \
        --outSAMtype BAM SortedByCoordinate \
        --outTmpDir "${TMPDIR}" \
        --outStd Log \
        --outSAMunmapped Within \
        --outSAMattributes Standard \
        --outSAMstrandField intronMotif \
        --sjdbGTFfile "${Annotation_GTF}" \
        --genomeFastaFiles "${Genome_FASTA}"
done
But I get the error:
line 65: 37898 Segmentation fault (core dumped) STAR --genomeDir "${Reference}" --readFilesIn "${forward_file}" "${reverse_file}" --outFileNamePrefix "${Output_directory}/${sample_name}_Soft" --runThreadN "${num_threads}" --genomeLoad NoSharedMemory --outSAMtype BAM SortedByCoordinate --outTmpDir "${TMPDIR}" --outStd Log --outSAMunmapped Within --outSAMattributes Standard --outSAMstrandField intronMotif --sjdbGTFfile "${Annotation_GTF}" --genomeFastaFiles "${Genome_FASTA}"
Line 65 is the start of the for loop, but I cannot find any error there.
This happens with both STAR 2.7.11a and STAR 2.7.10b. Why is this occurring?
When STAR crashes it is usually due to its excessive memory demands. First, remove
--outSAMtype BAM SortedByCoordinate
and change it to output an unsorted file. This will massively decrease memory use. If you need a sorted BAM, use samtools sort afterwards. How much memory is available?

I give it 90 GB of vmem. It should be enough for fewer than 15 BAMs.
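For concreteness, ATPoint's suggestion amounts to something like this (a sketch reusing the variable names from the script above; the output file name follows STAR's Aligned.out.bam convention, and samtools is assumed to be available):

# Output an unsorted BAM from STAR (far lower memory than coordinate sorting)
STAR \
    --genomeDir "${Reference}" \
    --readFilesIn "${forward_file}" "${reverse_file}" \
    --outFileNamePrefix "${Output_directory}/${sample_name}_Soft___" \
    --runThreadN "${num_threads}" \
    --outSAMtype BAM Unsorted
# Sort afterwards with samtools; -m caps memory per sorting thread
samtools sort -@ "${num_threads}" -m 2G \
    -o "${Output_directory}/${sample_name}_Soft___sorted.bam" \
    "${Output_directory}/${sample_name}_Soft___Aligned.out.bam"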
If you need a sorted BAM (which in 99% of cases you will not), always use
--limitBAMsortRAM
and set it to less than your vmem (say, 75G in your case). But like ATPoint says, it's best to output an unsorted BAM and then use samtools sort followed by whatever. Even if you need wiggle/bedgraph files, use alignReads mode with unsorted BAM, sort+index the BAM, then use inputAlignmentsFromBAM mode with the sorted+indexed BAM file.
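Roughly, those two options look like this (a sketch; --limitBAMsortRAM takes a value in bytes, --runMode inputAlignmentsFromBAM, --inputBAMfile, and --outWigType are documented STAR parameters but check the manual for your version, and the sorted BAM name is carried over from the sketch above):

# Option A: keep coordinate sorting, but cap the sort buffer below the vmem limit
STAR \
    --genomeDir "${Reference}" \
    --readFilesIn "${forward_file}" "${reverse_file}" \
    --outSAMtype BAM SortedByCoordinate \
    --limitBAMsortRAM 75000000000

# Option B: align to unsorted BAM, sort+index it, then make signal files
#           from the sorted, indexed BAM in a separate STAR pass
samtools index "${Output_directory}/${sample_name}_Soft___sorted.bam"
STAR \
    --runMode inputAlignmentsFromBAM \
    --inputBAMfile "${Output_directory}/${sample_name}_Soft___sorted.bam" \
    --outWigType bedGraph \
    --outFileNamePrefix "${Output_directory}/${sample_name}_signal_"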
I did what Ram suggested and added
--limitBAMsortRAM 75000000000
but then I get:
EXITING because of fatal ERROR: could not make temporary directory: /path/to/temporary/directory/
SOLUTION: (i) please check the path and writing permissions
Are you starting these jobs in parallel with that simple for loop? Looks like there are 15 samples? Each of those jobs is going to try to use all the memory you have, so that is the reason you are running out of RAM. Sorting the files afterwards would be an efficient operation, as has been suggested.

I don't believe that they are in parallel. It's just a for loop.
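To illustrate the distinction: a plain for loop runs its commands one at a time; the jobs would only run in parallel (and multiply the memory footprint) if each command were sent to the background:

# Sequential: each command must finish before the next starts
for f in "${Input_directory}"/*_R1.fq; do
    echo "processing ${f}"     # stand-in for the STAR call
done

# Parallel: the trailing & backgrounds each job, so they all run at once
# (each holding its own memory); wait blocks until all of them finish
for f in "${Input_directory}"/*_R1.fq; do
    echo "processing ${f}" &   # stand-in for the STAR call
done
wait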
Is that really the path or did you camouflage it?

I work on an HPC and I cannot share paths; this is why it is not written there.

Then say this upfront... Anyway, the error is clear: make sure you can write to this location and have mkdir permissions.
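A quick way to check that from a shell on the HPC (the test directory name here is just an example):

# Verify that directories can be created under the output location
mkdir -p "${Output_directory}/permission_test" \
    && rmdir "${Output_directory}/permission_test" \
    && echo "mkdir/write OK" \
    || echo "cannot create directories under ${Output_directory}"

Note too that STAR creates the --outTmpDir itself, and that creation can fail if a directory of the same name is left over from a crashed run, so removing leftovers before retrying may also help.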
For the run with the
EXITING because of fatal ERROR: could not make temporary directory
error I had used 2.7.11a, and I know some versions have difficulty making the temporary directory due to issues in the STAR software, so I am doing a run with STAR 2.7.10b to see. I still get the error, and I am not sure why this is occurring.
You're running out of memory. I've never done this, but try using a better --genomeLoad option. See this post: STAR genomeLoad issue. You will need to figure out how to implement the load-genome-before-looping-over-samples step described by Devon.
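A sketch of that load-once pattern using STAR's documented --genomeLoad values (note that, as far as I know, on-the-fly --sjdbGTFfile insertion is not compatible with a shared-memory genome, so the annotation would need to be built into the index at the genomeGenerate step):

# Load the genome into shared memory once, before the sample loop
STAR --genomeDir "${Reference}" --genomeLoad LoadAndExit

for forward_file in "${Input_directory}"/*_R1.fq; do
    # Per-sample options as before, but reusing the shared-memory genome
    STAR --genomeDir "${Reference}" --genomeLoad LoadAndKeep \
        --readFilesIn "${forward_file}" "${forward_file/_R1/_R2}" \
        --runThreadN "${num_threads}" \
        --outSAMtype BAM Unsorted
done

# Release the shared-memory copy when all samples are done
STAR --genomeDir "${Reference}" --genomeLoad Remove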
I will try and figure it out on my own given what you all have mentioned. Thank you for your time.