STAR alignment - segmentation fault error
5.4 years ago
mgmohsen • 0

Hi all,

I submitted the following job (star_align.sh) using slurm to align reads from a fastq file to a reference genome that I generated using GenCode v30:

STAR --runThreadN 16 --readFilesCommand zcat --quantMode GeneCounts --genomeDir ~/directory/to/genome/ --readFilesIn ~/directory/to/file.fastq.gz

Here is the slurm submission script that I've used to submit the job:

#SBATCH --job-name=star_align.sh
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=8G
#
srun star_align.sh
srun sleep 60

And this is the output that I see:

Jul 08 13:07:23 ..... Started STAR run
Jul 08 13:07:23 ..... Loading genome
Jul 08 13:08:11 ..... Started mapping
star_align.sh: line 2: 11554 Segmentation fault      (core dumped)

Does anyone have an idea about what might be going wrong here? Thanks in advance

RNA-Seq alignment • 7.0k views

Difficult to tell. Please make a subset of the fastq files (maybe 1000 reads) and then align it with your script and the identical command. That will help to see if it is a general problem or rather a memory issue.


Does GenCode make a proper STAR index? What files do you have there besides the genome itself? And are you sure you don't want to be including a gtf file in there?


Sorry, I meant to say that I used GenCode files (fasta, gtf) to generate the genome, which I did using STAR.


Trying the alignment again with a subset of the first 1000 reads, I get the following error message:

ReadAlignChunk_processChunks.cpp:115:processChunks EXITING because of FATAL ERROR in input reads: unknown file format: the read ID should start with @ or > 

Jul 08 15:09:01 ...... FATAL ERROR, exiting

Okay, so something wrong with the fastq. What do the first 10 lines look like? If the first 10 lines look fine, maybe the fastq is garbled further down.


Here are the first 10 lines:

@COOPER:276:H2HTMBBXY:7:1101:10003:1209 1:N:0:NTTGTACT
NTGATGAGTGAGTGTCTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTACTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAACAA
+
#AAFFJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ<<<----
--
@COOPER:276:H2HTMBBXY:7:1101:10003:1349 1:N:0:NTTGTACT
NACATGAGTATTAGGCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCTTGTACTATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAA
+
#AAAFJJJFJJJJJJJJJFJJJJJJJJFJFJJJJJJJJJJJJJJJJJJFAJJJJJJJJJJFFJJJJJJJJJJFJJJJJJJFJJJJJJJJJJJJJ<-<----
--

Those look okay, but you can confirm by making a baby fastq with just those two sequences, see if that runs.


How did you make the subset? Do something like zcat your.fastq.gz | head -n 4000 > subset.fq and be sure that the line count is a multiple of 4, as a fastq read consists of four lines.
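As a rough sketch of the subsetting advice above: the two helper functions below (subset_fastq and check_fastq, both hypothetical names, not part of STAR) take the first N reads from a gzipped fastq and then check that the result has four lines per record, with an @ header and a + separator in the right places. It only assumes gzip, head and awk are available.

```shell
# Hypothetical helper: write the first N reads (4 lines each) of a gzipped fastq.
subset_fastq() {
    local in_gz="$1" n_reads="$2" out_fq="$3"
    # head stops reading early, so gzip may be cut off by a closed pipe; that is expected.
    gzip -dc "$in_gz" 2>/dev/null | head -n "$(( n_reads * 4 ))" > "$out_fq"
}

# Hypothetical helper: succeed only if the file looks like 4-line fastq records.
check_fastq() {
    awk 'NR % 4 == 1 && !/^@/  { bad = 1 }   # header lines must start with @
         NR % 4 == 3 && !/^\+/ { bad = 1 }   # separator lines must start with +
         END { if (bad || NR == 0 || NR % 4 != 0) exit 1 }' "$1"
}

# Example usage (file names are placeholders):
#   subset_fastq your.fastq.gz 1000 subset.fq && check_fastq subset.fq
```

A subset that contains stray lines between records (for example separator lines left behind by another tool) fails this check, which matches the "read ID should start with @ or >" error STAR reports.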


OK, that was definitely the issue with the subset of 1000 that I generated. Trying again, I got the same error as I did with the original fastq read. This seems to indicate that the issue isn't with memory usage.


#SBATCH --mem-per-cpu=8G

I would recommend against using that option. STAR needs at least 30G of RAM for human-sized genomes, so allocate more RAM to the entire job using the #SBATCH --mem=40g option.
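For illustration, the submission header from the question might look like this with the suggested change. This is only a sketch: the 40G figure comes from the recommendation above, the other directives are copied from the original script, and the shebang line is an assumption about how the script starts. Note that Slurm treats --mem and --mem-per-cpu as mutually exclusive, so the per-CPU line is dropped rather than kept alongside.

```shell
#!/bin/bash
#SBATCH --job-name=star_align.sh
#SBATCH --time=24:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=40G
# --mem sets memory for the whole job on the node and replaces --mem-per-cpu=8G.
# The rest of the script (srun star_align.sh) stays as in the question.
```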


Thanks for the recommendation, I will be sure to increase RAM to 40G for future job submissions. However, I'm still getting the same error for this alignment, even after setting it to 40G of RAM.


Did you make these indexes with the version of STAR currently installed on your cluster? There was no error during that process? Have they been tested and are known to work?


Yes, I generated the indexes with the same version of STAR and there was no error during the process. However, this alignment is my first attempt to test them, so they are not known to work. Is there any standard test that I can do to make sure there's no problem with my indexes?


Can you post the command used to generate STAR index?

Compare the files you have in your STAR index with the following listing to make sure you have most of these files in your index and that they are of similar size (use du -shc * to determine file sizes).

34K     chrLength.txt
66K     chrNameLength.txt
34K     chrName.txt
34K     chrStart.txt
49M     exonGeTrInfo.tab
20M     exonInfo.tab
2.1M    geneInfo.tab
3.9G    Genome
34K     genomeParameters.txt
354K    Log.out
30G     SA
1.8G    SAindex
12M     sjdbInfo.txt
13M     sjdbList.fromGTF.out.tab
11M     sjdbList.out.tab
16M     transcriptInfo.tab
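The comparison against the listing above can be partly automated. The sketch below uses a hypothetical helper, check_star_index, that only tests whether a handful of the core files named in the listing exist and are non-empty. It does not validate their contents, and the choice of which files to treat as required is an assumption.

```shell
# Hypothetical helper: report core STAR index files that are missing or empty.
check_star_index() {
    local index_dir="$1" missing=0 f
    for f in Genome SA SAindex chrName.txt chrLength.txt chrStart.txt \
             chrNameLength.txt genomeParameters.txt; do
        if [ ! -s "$index_dir/$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    # Return 0 when every file was found, 1 otherwise.
    return "$missing"
}

# Example usage (directory name is a placeholder):
#   check_star_index ./star_index && du -shc ./star_index/*
```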

Command used to generate STAR index:

STAR --runThreadN 16 --runMode genomeGenerate --genomeDir ./star_index --genomeFastaFiles ./GRCh38.p12.genome.fa --sjdbGTFfile ./gencode.v30.annotation.gtf

Output of du -shc *

26K     chrLength.txt
50K     chrNameLength.txt
26K     chrName.txt
26K     chrStart.txt
66M     exonGeTrInfo.tab
27M     exonInfo.tab
1.7M    geneInfo.tab
4.7G    Genome
26K     genomeParameters.txt
37G     SA
2.2G    SAindex
16M     sjdbInfo.txt
14M     sjdbList.fromGTF.out.tab
14M     sjdbList.out.tab
19M     transcriptInfo.tab
44G     total

Those files look close enough in size, but I don't see a Log.out file. Did you save it somewhere else? It should say something like this at the end:

Mar 04 17:05:04 ... writing SAindex to disk
Writing 8 bytes into ./SAindex ; empty space on disk = 1925055853363200 bytes ... done
Writing 120 bytes into ./SAindex ; empty space on disk = 1925055853363200 bytes ... done
Writing 1565873491 bytes into ./SAindex ; empty space on disk = 1925055853363200 bytes ... done
Mar 04 17:05:11 ..... finished successfully
DONE: Genome generation, EXITING

Are you using a pre-compiled version of STAR or did the admins of this cluster compile/install from source?
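The log check described above could be scripted roughly as follows. index_log_ok is a hypothetical helper name, and the match is case-insensitive because the capitalisation of "finished successfully" differs between the logs quoted in this thread.

```shell
# Hypothetical helper: succeed if the end of a genome-generation log reports success.
index_log_ok() {
    # Only the last few lines are inspected, where the final status is written.
    tail -n 5 "$1" | grep -qi 'finished successfully'
}

# Example usage (path is a placeholder):
#   index_log_ok /path/to/Log.out && echo "index build completed"
```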


Yes, I saved it somewhere else, and it has a very similar ending to yours. I'm using a pre-compiled version of STAR (version 2.7.1a).

Jul 03 12:10:33 ... writing SAindex to disk
Writing 8 bytes into ./star_index/SAindex ; empty space on disk = 172171319050240 bytes ... done
Writing 120 bytes into ./star_index/SAindex ; empty space on disk = 172171319050240 bytes ... done
Writing 1565873491 bytes into ./star_index/SAindex ; empty space on disk = 172171319050240 bytes ... done
Jul 03 12:10:40 ..... Finished successfully
DONE: Genome generation, EXITING

Yes, I agree with @genomax; I faced the same issue when I was trying to submit my batch script on a server. In my case I used 31 GB for the human genome.

5.4 years ago
GenoMax 147k

Looks like you have paired-end data. Is that correct? If so, both files should be provided to STAR together, as --readFilesIn /path_to/R1_file.gz /path_to/R2_file.gz. Can you explicitly use a pair of R1/R2 files when you submit the job?
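A minimal sketch of the paired-end invocation, reusing the options from the command in the question. The helper names (count_lines, star_paired_cmd) are hypothetical, and the function only prints the STAR command after checking that both files exist and have the same number of lines. It does not run STAR itself.

```shell
# Hypothetical helper: count the lines in a gzipped fastq.
count_lines() { gzip -dc "$1" | awk 'END { print NR }'; }

# Hypothetical helper: print (not run) a paired-end STAR command after basic checks.
star_paired_cmd() {
    local genome_dir="$1" r1="$2" r2="$3"
    [ -s "$r1" ] && [ -s "$r2" ] || { echo "missing R1 or R2" >&2; return 1; }
    # Mates must come in equal numbers; a mismatch suggests a truncated or wrong file.
    [ "$(count_lines "$r1")" -eq "$(count_lines "$r2")" ] \
        || { echo "R1 and R2 have different read counts" >&2; return 1; }
    echo STAR --runThreadN 16 --readFilesCommand zcat --quantMode GeneCounts \
         --genomeDir "$genome_dir" --readFilesIn "$r1" "$r2"
}

# Example usage (paths are placeholders):
#   star_paired_cmd ~/directory/to/genome/ /path_to/R1_file.gz /path_to/R2_file.gz
```

Printing the command first makes it easy to confirm that both mates are passed to --readFilesIn before pasting the line into star_align.sh.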


Yes! This was the issue, both R1/R2 files need to be supplied at once. Thank you for all your help.
