Hi all,
I submitted the following job (star_align.sh) using SLURM to align a fastq file to a reference genome that I generated using GENCODE v30:
STAR --runThreadN 16 --readFilesCommand zcat --quantMode GeneCounts --genomeDir ~/directory/to/genome/ --readFilesIn ~/directory/to/file.fastq.gz
Here is the slurm submission script that I've used to submit the job:
#SBATCH --job-name=star_align.sh
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=8G
#
srun star_align.sh
srun sleep 60
And this is the output that I see:
Jul 08 13:07:23 ..... Started STAR run
Jul 08 13:07:23 ..... Loading genome
Jul 08 13:08:11 ..... Started mapping
star_align.sh: line 2: 11554 Segmentation fault (core dumped)
Does anyone have an idea about what might be going wrong here? Thanks in advance
Difficult to tell. Please make a subset of the fastq file (maybe 1000 reads) and then align it with your script and the identical command. That will help to see whether it is a general problem or rather a memory issue.
Does GENCODE make a proper STAR index? What files do you have there besides the genome itself? And are you sure you don't want to be including a GTF file in there?
Sorry, I meant to say that I used GENCODE files (fasta, GTF) to generate the genome index, which I did using STAR.
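For reference, an index built from GENCODE inputs is typically generated with STAR's genomeGenerate mode. A minimal sketch, assuming GENCODE v30 file names and --sjdbOverhang 99 (i.e. 100 bp reads); this is not the poster's actual command:
STAR --runMode genomeGenerate --runThreadN 16 --genomeDir ~/directory/to/genome/ --genomeFastaFiles GRCh38.primary_assembly.genome.fa --sjdbGTFfile gencode.v30.annotation.gtf --sjdbOverhang 99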
Trying the alignment again with a subset of the first 1000 reads, I get the following error message:
Okay, so there's something wrong with the fastq. What do the first 10 lines look like? If the first 10 lines look fine, maybe the fastq is garbled further down.
Here are the first 10 lines:
Those look okay, but you can confirm by making a baby fastq with just those two sequences and seeing if that runs.
How did you make the subset? Do something like
zcat your.fastq.gz | head -n 4000 > subset.fq
and be sure that you only use multiples of 4, as a fastq record consists of four lines.

OK, that was definitely the issue with the subset of 1000 that I generated. Trying again with a proper subset, I got the same error as I did with the original fastq file. This seems to indicate that the issue isn't with memory usage.
I would recommend against using the --mem-per-cpu option. STAR needs at least 30G of RAM for human-sized genomes, so allocate more RAM to the entire job using the #SBATCH --mem=40g option.
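For example, the submission header could be changed along these lines (a sketch only: the directives other than --mem are copied from the script above, and 40g is just the value suggested here):
#SBATCH --job-name=star_align.sh
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=40g
This requests 40G for the whole job instead of 8G per allocated CPU.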
Thanks for the recommendation, I will be sure to increase the RAM to 40G for future job submissions. However, I'm still getting the same error for this alignment, even after setting it to 40G of RAM.
Did you make these indexes with the version of STAR currently installed on your cluster? There was no error during that process? Have they been tested and are known to work?

Yes, I generated the indexes with the same version of STAR, and there was no error during the process. However, this alignment is my first attempt to test them, so they are not known to work. Is there any standard test that I can do to make sure there's no problem with my indexes?

Can you post the command used to generate the STAR index?
Compare the files you have in your STAR index with the following listing to make sure you have most of these files in your index and that they are of similar size (use du -shc * to determine file sizes).

Command used to generate the STAR index:

Output of du -shc *:
Those files look to be close enough in size, but I don't see a Log.out file. Did you save it somewhere else? It should say something like this at the end. Are you using a pre-compiled version of STAR, or did the admins of this cluster compile/install it from source?

Yes, I saved it somewhere else, and it has a very similar ending to yours. I'm using a pre-compiled version of STAR (version 2.7.1a).

Yes, I agree @genomax, because I faced the same issue when I was trying to submit my batch script on the server. In my case I used 31 GB for the human genome.
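As a side note on the memory question, one way to check how much memory a finished job actually used is SLURM's sacct; a minimal sketch, with a placeholder job ID:
sacct -j 1234567 --format=JobID,JobName,Elapsed,MaxRSS,State
The MaxRSS column should give a feel for how much RAM STAR really needed for this genome.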