Hi,
I am running STAR genome indexing on my company's cluster:
STAR --runThreadN 1 --runMode genomeGenerate --genomeDir /home/id/ \
  --genomeFastaFiles /home/id/GRCm38.primary_assembly.genome.fa \
  --sjdbGTFfile /home/id/gencode.vM24.primary_assembly.annotation.gtf \
  --sjdbOverhang 100
But I keep running into the same error:
Apr 01 08:44:03 ..... started STAR run
Apr 01 08:44:03 ... starting to generate Genome files
Apr 01 08:45:09 ... starting to sort Suffix Array. This may take a long time...
Apr 01 08:45:25 ... sorting Suffix Array chunks and saving them to disk...
Apr 01 09:10:06 ... loading chunks from disk, packing SA...
EXITING because of FATAL problem while generating the suffix array
The number of indices read from chunks = 2269570266 is not equal to expected nSA=5305567000
SOLUTION: try to re-run suffix array generation, if it still does not work, report this problem to the author
Apr 01 09:10:49 ...... FATAL ERROR, exiting
After googling the problem, I thought the low number of threads might be the cause, so I increased it to 4. I also requested 36 GB of memory for the STAR job on the cluster. But I still get the same error message. Could you help me figure out how to resolve this? Thank you so much.
The command looks OK, and 36 GB should be enough. But try increasing the memory anyway, just for the sake of it. Was there a problem downloading the genome? Did you unzip it after downloading?
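A minimal sketch for sanity-checking the two input files before re-running STAR. It assumes both files have already been gunzip-ed and that bash, od, awk and grep are available; the function name check_inputs is hypothetical. It only catches obvious problems (empty file, still compressed, not FASTA, malformed GTF lines), not every kind of corruption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: basic checks on the genome FASTA and GTF before STAR.
# Assumption: both files are already uncompressed.

check_inputs() {
  local fasta="$1" gtf="$2" f

  for f in "$fasta" "$gtf"; do
    # The file must exist and be non-empty.
    if [[ ! -s "$f" ]]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
    # A file that is still gzip-compressed starts with the bytes 1f 8b.
    if [[ "$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')" == "1f8b" ]]; then
      echo "still compressed: $f" >&2
      return 1
    fi
  done

  # An uncompressed FASTA must start with a '>' header line.
  if [[ "$(head -c 1 "$fasta")" != ">" ]]; then
    echo "does not look like FASTA: $fasta" >&2
    return 1
  fi

  # Every non-comment, non-blank GTF line should have 9 tab-separated columns.
  if ! awk -F'\t' '!/^#/ && NF && NF != 9 { bad = 1; exit } END { exit bad }' "$gtf"; then
    echo "GTF line without 9 tab-separated columns: $gtf" >&2
    return 1
  fi

  # Report how many sequences the FASTA contains.
  echo "sequences: $(grep -c '^>' "$fasta")"
}
```

Usage with the paths from the question: check_inputs /home/id/GRCm38.primary_assembly.genome.fa /home/id/gencode.vM24.primary_assembly.annotation.gtf. If the original .gz files are still around, gzip -t on them will also flag a truncated download.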
I did unzip both files, and I confirmed that I can see the .gtf and .fa files in /home/id/. Now I am running the code with 8 threads. How many threads would you recommend? (Update: I just got the same error message with 8 threads.)
Sources of the two files:
Genome sequence, primary assembly (GRCm38) ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M24/GRCm38.primary_assembly.genome.fa.gz
Comprehensive gene annotation ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M24/gencode.vM24.primary_assembly.annotation.gtf.gz
I don't think the number of threads is the issue; more threads just make the program run faster. And I was talking about memory, not threads.
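On the memory side: requesting more memory from the cluster does not by itself raise STAR's own cap for genome generation, which is set with --limitGenomeGenerateRAM (in bytes; 31000000000 by default per the STAR manual). A small sketch for deriving that value from the job allocation; the function name ram_limit_bytes, the GiB input unit and the 10% headroom are illustrative assumptions, not STAR recommendations.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a job allocation in GiB to a byte count for
# --limitGenomeGenerateRAM, leaving some headroom for the rest of the job.
# Assumption: 10% headroom by default is an arbitrary illustrative choice.

ram_limit_bytes() {
  local alloc_gib="$1" headroom_pct="${2:-10}"
  # GiB -> bytes, then keep (100 - headroom_pct)% of it (integer arithmetic).
  echo $(( alloc_gib * 1024 * 1024 * 1024 * (100 - headroom_pct) / 100 ))
}
```

For the 36 GB job above, ram_limit_bytes 36 gives 34789235097, which could be passed as --limitGenomeGenerateRAM "$(ram_limit_bytes 36)".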
It could be that you don't have enough free disk space. Someone else got the same error, probably because they had less than 100 GB of free space. See: https://github.com/alexdobin/STAR/issues/534
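The log above shows STAR saving suffix-array chunks to disk and then reading them back, so running out of space while writing those chunks is one plausible way to end up with fewer indices than expected. A sketch for checking free space before starting; the function names are hypothetical, and the 100 GB figure is taken from the linked issue rather than from STAR documentation. As far as I know, the chunks go to a _STARtmp directory under the working directory unless --outTmpDir is set, so that is the location to check.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: check free disk space before running genomeGenerate.
# Assumption: ~100 GB free is a safe margin (figure from the linked GitHub issue).

# Print the free space (whole GiB) of the filesystem holding directory $1.
free_gib() {
  # -P gives POSIX one-line-per-filesystem output; -k reports 1024-byte blocks.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / (1024 * 1024) }'
}

# Succeed only if the filesystem holding directory $1 has at least $2 GiB free.
has_free_space() {
  local dir="$1" min_gib="$2" avail
  avail="$(free_gib "$dir")"
  echo "$dir: ${avail} GiB free (need ${min_gib})" >&2
  (( avail >= min_gib ))
}
```

For example, has_free_space "$PWD" 100 || echo "not enough space for the SA chunks" before launching STAR from that directory.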
Thanks. I already read that page a few hours ago. I am still running my code with different memory sizes. Fingers crossed.
In my experience, I have successfully generated the index on a desktop machine with 40 GB of RAM. Are you running it on the cloud? Make sure you have enough free space on the drive; that was the only suggestion Alex (STAR's creator) himself made, and it did the trick.
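Following that advice, a sketch of re-running genomeGenerate with both the index and STAR's temporary files on a larger drive. The base path /scratch/id, the index directory name and the function name run_genome_generate are hypothetical placeholders for whatever large filesystem your cluster provides; the STAR flags themselves are standard options. Note that --outTmpDir must point to a directory that does not exist yet, since STAR creates it. Setting DRY_RUN=1 prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run STAR genomeGenerate with the index and the
# temporary directory on a large drive. /scratch/id is an assumed path.

run_genome_generate() {
  local base="${1:-/scratch/id}" threads="${2:-4}"
  local index_dir="$base/star_index_GRCm38_M24"
  # --outTmpDir must not exist beforehand; STAR creates it itself.
  local tmp_dir="$base/star_tmp_$$"

  local cmd=(
    STAR --runThreadN "$threads" --runMode genomeGenerate
    --genomeDir "$index_dir"
    --genomeFastaFiles /home/id/GRCm38.primary_assembly.genome.fa
    --sjdbGTFfile /home/id/gencode.vM24.primary_assembly.annotation.gtf
    --sjdbOverhang 100
    --outTmpDir "$tmp_dir"
  )

  if [[ -n "${DRY_RUN:-}" ]]; then
    # Print the command instead of running it.
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  # Create the index directory, then run STAR.
  mkdir -p "$index_dir" && "${cmd[@]}"
}
```

Keeping --genomeDir separate from /home/id/ also avoids mixing the index files with the input FASTA and GTF.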