I am trying to build a genome index for wheat using STAR. The genome is around 17 Gb. However, no matter which parameters I change, I get the following error:
EXITING because of FATAL ERROR: could not open genome file /home/ashishk/RAWFILES/genome/STAR_genome_index//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsission
After searching on Google for quite a while, I think it is a RAM issue. I am working on a server where I am allocated only 200 GB of RAM. I tried drastically changing the parameters that determine RAM usage, following suggestions on various forums, but I still get the same error. This is the command I am using:
STAR --runMode genomeGenerate --runThreadN 15 --genomeDir /home/ashishk/RAWFILES/genome/STAR_genome_index/ --genomeSAsparseD 10 --genomeFastaFiles /home/ashishk/RAWFILES/genome/Triticum_aestivum.TGACv1.dna.toplevel.fa --sjdbGTFfile /home/ashishk/RAWFILES/gtf/Triticum_aestivum.TGACv1.36.gtf --sjdbOverhang 99 --genomeChrBinNbits 4 --genomeSAindexNbases 4
Even if it works after a few more changes, I am afraid I might end up with bad results, since the parameters are so far from their default values. Please suggest a way to run STAR with the available resources, or suggest a better tool that can handle large genomes.
Thank you
Does
/home/ashishk/RAWFILES/genome/STAR_genome_index/
exist?
Yes, it exists and has the required permissions as well.
Then, check if the file
/home/ashishk/RAWFILES/genome/STAR_genome_index//genomeParameters.txt
exists. Maybe the double slash causes problems for some downstream piece of code (it shouldn't be a problem for the shell).
The folder exists, but the tool is not writing any files to it. STAR starts indexing and then stops after 5 minutes without writing any files to /home/ashishk/RAWFILES/genome/STAR_genome_index. I've tried providing another path, but the same thing happens, and the fatal error prints double slashes in both cases.
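To rule out the path and permissions side before blaming RAM, something like the following minimal sketch might help. The function name `check_index_dir` and its status words are made up for illustration and are not part of STAR; it only tests whether the directory exists, is writable by you, and already contains a non-empty genomeParameters.txt.

```shell
# Hypothetical helper (not part of STAR): print one status word for an index directory.
check_index_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"          # directory does not exist
    elif [ ! -w "$dir" ]; then
        echo "not-writable"     # STAR could not write index files here
    elif [ ! -s "$dir/genomeParameters.txt" ]; then
        echo "no-index"         # directory is fine, but no index has been written
    else
        echo "ok"
    fi
}

# Path taken from the question; on another machine this simply prints "missing".
check_index_dir /home/ashishk/RAWFILES/genome/STAR_genome_index
```

If this prints `no-index`, the directory itself is fine and the generate step presumably died before it wrote anything, which would fit the behaviour described above.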
Have you looked at memory usage (in
top
) to see if the 200G being allocated is getting exhausted? Have you generated genome indexes with this install of STAR before? If not, test with a small FASTA to ensure that everything is working.
Yes, you're right. It stays around 98% and then eventually reaches 100%.
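If you want a number rather than eyeballing top, here is a small sketch that reads `MemAvailable` from `/proc/meminfo`. It assumes a reasonably recent Linux kernel, and the function name `mem_available_gib` is my own. Note that on a shared server this reports the whole machine, which may not match a per-user allocation such as the 200 GB mentioned here.

```shell
# Hypothetical helper: print MemAvailable in GiB (rounded down) from a
# meminfo-style file. Defaults to /proc/meminfo when no file is given.
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Only query the live system when /proc/meminfo is actually readable.
if [ -r /proc/meminfo ]; then
    mem_available_gib
fi
```

Running it every few seconds in a second terminal while STAR is indexing (for example in a `while` loop with `sleep 5`) should show the value dropping towards zero if memory really is the limit.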
Assuming you are able to successfully generate indexes from other, smaller FASTA files, you may simply be running out of memory.
If you have no additional options to try with STAR, I am going to suggest that you give
bbmap.sh
from the BBMap suite a try. You can generate indexes by simply doing
bbmap.sh -Xmx200g ref=your_fasta.fa
. I find BBMap to be an excellent splice-aware aligner that is plenty fast and efficient.
I was able to generate indexes for a bacterial genome successfully. Then I tried an earlier version of STAR, whose error clearly stated that there was not enough RAM. Now I am using BBMap and I am finding it much better. Thanks.
Have you tried changing
/home/ashishk/RAWFILES/genome/STAR_genome_index/
to
/home/ashishk/RAWFILES/genome/STAR_genome_index
, without the final slash? It may not change anything, but I would try.
Yes, I tried that but received the same error again.
Then I would follow genomax's suggestion!
Check my earlier post on memory optimization in STAR: "Cannot Generate Genome from RNA Transcript in STAR".
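On the worry about parameters being far from their defaults: as far as I recall, the STAR manual gives scaling rules for two of the options used in the command above, namely `--genomeSAindexNbases` as min(14, log2(GenomeLength)/2 - 1) and `--genomeChrBinNbits` as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). The sketch below just evaluates those two rules. The function names are mine, and the scaffold count of 700000 is a placeholder rather than the real figure for this assembly (count yours with `grep -c '^>'` on the FASTA).

```shell
# Hypothetical helper: min(14, log2(GenomeLength)/2 - 1), rounded down.
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        v = int(log(len) / log(2) / 2 - 1)
        if (v > 14) v = 14
        print v
    }'
}

# Hypothetical helper: min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))),
# rounded down. Arguments: genome length, number of sequences, read length.
chr_bin_nbits() {
    awk -v len="$1" -v nref="$2" -v rl="$3" 'BEGIN {
        per = len / nref
        if (rl > per) per = rl
        v = int(log(per) / log(2))
        if (v > 18) v = 18
        print v
    }'
}

# ~17 Gb genome from the question; 700000 sequences is a placeholder value.
sa_index_nbases 17000000000
chr_bin_nbits 17000000000 700000 100
```

With these inputs both functions print 14, so by this rule a value of 4 for either option is much lower than a genome of this size calls for. Of the options in the original command, `--genomeSAsparseD` is the one the manual describes as trading RAM for mapping speed.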