I am trying to generate a genome index for the pepper (Capsicum annuum) genome using STAR. It's a really large genome of 3.5 GB, and the genome FASTA contains 12 pseudomolecule assemblies and over 30,000 scaffolds. I call STAR with the following command:
$STAR --runMode genomeGenerate –-genomeDir /data_raid1_ssd/databases/genomes/pepper/star --genomeFastaFiles /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa --runThreadN 8 --genomeChrBinNbits 16 --sjdbGTFfile /data_raid1_ssd/databases/genomes/pepper/Annuum.v.2.0.chromosome.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 99
The process exits with the following message:
genomeGenerate.cpp:209:genomeGenerate: exiting because of *OUTPUT FILE* error: could not create output file ./GenomeDir//chrName.txt
Solution: check that the path exists and you have write permission for this file
I am using a Linux machine with 64 GB RAM, which should be enough (10 x genome size = 35 GB), and basically the same command line worked fine for generating the Arabidopsis genome index. My guess is that the huge number of temporary files STAR generates, due to the huge number of scaffolds in the genome FASTA file, causes the problem, but I might be wrong. I already increased my allowed number of open files to 16384 using the
ulimit -n 16384
command, but it didn't help. Is there anything I can do to tweak STAR to better deal with this large number of scaffolds, or is there any other solution to the problem?
Thanks R
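For reference, the STAR manual recommends scaling --genomeChrBinNbits down for assemblies with many references, to roughly min(18, log2(GenomeLength/NumberOfReferences)). A quick way to compute that from the FASTA (a sketch using the path from the question and standard Unix tools):

    # Recommended --genomeChrBinNbits per the STAR manual:
    # min(18, log2(GenomeLength / NumberOfReferences))
    FASTA=/data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa
    NSEQ=$(grep -c '^>' "$FASTA")
    GLEN=$(grep -v '^>' "$FASTA" | tr -d '\n' | wc -c)
    awk -v n="$NSEQ" -v l="$GLEN" 'BEGIN { b = int(log(l/n)/log(2)); print (b < 18 ? b : 18) }'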
Huh? I'm reading the error as the output folder not being writable, hence a permission error. Why are we talking about memory? Maybe you're missing --outFileNamePrefix? According to this post, try tweaking the thread number too.
You can also try increasing your number of open files beyond 16384.
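For instance, with standard bash builtins (raising the hard limit itself requires root or a limits.conf change):

    # Show the current soft and hard limits for open file descriptors:
    ulimit -Sn
    ulimit -Hn
    # Raise the soft limit for this shell (non-root users can only go up to the hard limit):
    ulimit -n 65536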
I've increased it and now used all 16 available threads. Didn't help.
I used top to check %MEM and it says only 10.8%.
Your problem is not related to memory. STAR can't reach a file while running (see --outFileNamePrefix). Try running your command without threads to see if the issue still stands.
Which step of the process failed? Also, could we have the complete log file?
I am not quite sure how to use --outFileNamePrefix in the context of my command line, but here is a link to the Log.out file that was generated:
Log.out
Not sure which step, but it happens pretty quickly (ca. 2 min after starting the job).
Do you have full rights on /data_raid1_ssd/databases/genomes/pepper/star? Does that path exist? Maybe rename it to /data_raid1_ssd/databases/genomes/pepper/star_index/.
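For example, a quick check from the shell (using the paths from the question):

    # Does the directory exist, and who owns it?
    ls -ld /data_raid1_ssd/databases/genomes/pepper/star
    # Create it if it is missing:
    mkdir -p /data_raid1_ssd/databases/genomes/pepper/star
    # Quick write test:
    touch /data_raid1_ssd/databases/genomes/pepper/star/.write_test \
        && rm /data_raid1_ssd/databases/genomes/pepper/star/.write_test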
The option outFileNamePrefix is used to write output files to a directory other than the current one; try setting it to a directory you own.
Try removing some options to see whether it affects the result. Just keep it simple:
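For example, a minimal run could look like this (a sketch; the paths are the ones from the question):

    $STAR --runMode genomeGenerate \
        --genomeDir /data_raid1_ssd/databases/genomes/pepper/star \
        --genomeFastaFiles /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa \
        --runThreadN 1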
Somehow your --genomeDir isn't registered. If you look closely in your Log.out, as suggested by Bastien, you'll notice the genomeDir is set back to the default value, ./GenomeDir/.
A long while back, I had something similar when using nohup. While I haven't seen any such permission error in recent years, I continue to use --outFileNamePrefix for historical reasons. Below is an example: outFileNamePrefix is simply whatever I have in genomeDir plus a trailing /.
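A sketch of that pattern, substituting the paths from the question:

    GDIR=/data_raid1_ssd/databases/genomes/pepper/star
    $STAR --runMode genomeGenerate \
        --genomeDir "$GDIR" \
        --outFileNamePrefix "$GDIR"/ \
        --genomeFastaFiles /data_raid1_ssd/databases/genomes/pepper/Annuum.v1.6.Total.fa \
        --runThreadN 8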
Hope it helps. Good luck.
Btw, I see you're from CSHL. I hope you're running away from mosquitoes this time of year! :)