I ran STAR in a shared memory environment and tried --genomeLoad LoadAndKeep, LoadAndRemove, and LoadAndExit, hoping that a one-time reference load could be used by all the samples. However, each sample still loads its own reference, memory accumulates in cache, and the job is eventually killed due to insufficient RAM. Can anyone share some idea of what is going on here? Really appreciated!
By the way, what is the difference between LoadAndRemove and LoadAndExit?
LoadAndExit is convenient if you want to load the genome and then use it in separate STAR runs. It's generally the method I take, since I prefer to loop over samples and not need to keep track of which one is the first one (i.e., I call LoadAndExit first, then make a for loop over samples, and finally call Remove after the for loop).
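For what it's worth, that workflow might look roughly like the sketch below. The names STAR_BIN and map_samples and the three directory arguments are made up for illustration, and it assumes single-end .fastq files and an index that already contains the annotation (as far as I know, inserting a GTF at mapping time with --sjdbGTFfile cannot be combined with a shared-memory genome). The point to notice is that every mapping call passes --genomeLoad LoadAndKeep; without it STAR uses its default, NoSharedMemory, and each run loads a private copy.

```shell
#!/usr/bin/env bash
# Sketch only: STAR_BIN and map_samples are hypothetical names, not part of STAR.
STAR_BIN="${STAR_BIN:-STAR}"

# map_samples INDEX_DIR FASTQ_DIR OUT_DIR
map_samples() {
    local index="$1" fastq_dir="$2" out_dir="$3" fq sample

    # 1. Load the index into shared memory; no reads are mapped in this call.
    "$STAR_BIN" --genomeLoad LoadAndExit --genomeDir "$index" || return 1

    # 2. Map each sample against the copy that is already in shared memory.
    for fq in "$fastq_dir"/*.fastq; do
        [ -e "$fq" ] || continue          # skip if the glob matched nothing
        sample="$(basename "$fq" .fastq)"
        mkdir -p "$out_dir/$sample"
        "$STAR_BIN" --runThreadN 5 \
            --genomeLoad LoadAndKeep \
            --genomeDir "$index" \
            --readFilesIn "$fq" \
            --outFileNamePrefix "$out_dir/$sample/"
    done

    # 3. Release the shared memory once every sample has been mapped.
    "$STAR_BIN" --genomeLoad Remove --genomeDir "$index"
}
```

It would be called as, for example, map_samples "$STARINDEX" myFastqs results. Note that --genomeDir is still required on every call, since that is how STAR identifies which loaded index to attach to.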
Actually, I was running two samples almost simultaneously. I thought using LoadAndKeep or LoadAndExit (by the way, what is the difference between these two? I thought both of them load the index and keep it in cache) would allow the first pipeline to load the index and keep it in cache, so that the second pipeline could use it without loading it again. But my test says otherwise...
Hi Ryan, is it possible to check whether the genome is loaded in memory or not, and if so, how?
I wrap the STAR command in a function. If it is possible to check whether the genome is in memory, I can choose which --genomeLoad option to use, and the genome can be removed after all the functions have finished.
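One rough way to look, assuming a Linux machine and a STAR build that uses System V shared memory (I believe that is the default, but treat it as an assumption), is to list shared memory segments with ipcs -m and look for one about the size of your index. The helper below is a hypothetical sketch: large_shm_segments is my own name, and it assumes the util-linux column layout in which the fifth column is the size in bytes.

```shell
# Reads `ipcs -m` output on stdin and prints "shmid bytes" for every segment
# of at least MIN_BYTES. Header and separator lines are skipped because their
# fifth field is not a number.
large_shm_segments() {
    local min_bytes="$1"
    awk -v min="$min_bytes" \
        '$5 ~ /^[0-9]+$/ && ($5 + 0) >= (min + 0) { print $2, $5 }'
}

# Usage (lists segments of 1 GB or more):
#   ipcs -m | large_shm_segments 1000000000
```

This only tells you that a large segment exists, not that it belongs to a particular index, so it is a hint rather than a reliable test.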
Thanks so much, it works. After a few tests, I found that if the script is interrupted for some reason, the loaded genome stays in memory. Even when this had happened over several runs, I could clear the memory to fix it (with root).
Anyway, I can run the function in parallel now, thanks again.
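To avoid a leftover genome after an interrupted script, one option is a shell trap that asks STAR to remove the index on the way out. This is a minimal sketch, not something from the STAR manual: STAR_BIN and run_with_cleanup are hypothetical names, and it relies on bash running an EXIT trap that was set inside a subshell.

```shell
#!/usr/bin/env bash
# Sketch only: STAR_BIN and run_with_cleanup are hypothetical names.
STAR_BIN="${STAR_BIN:-STAR}"

# run_with_cleanup INDEX_DIR COMMAND [ARGS...]
# Runs COMMAND in a subshell and always issues "--genomeLoad Remove" afterwards,
# whether COMMAND succeeds, fails, or is interrupted with Ctrl-C / SIGTERM.
run_with_cleanup() {
    local index="$1"
    shift
    (
        trap '"$STAR_BIN" --genomeLoad Remove --genomeDir "$index"' EXIT
        trap 'exit 130' INT    # turn Ctrl-C into a normal exit so the EXIT trap fires
        trap 'exit 143' TERM
        "$@"
        exit $?                # preserve the status of COMMAND
    )
}
```

Here COMMAND would be whatever function wraps your mapping loop. A trap cannot catch kill -9, so in that case the segment would still have to be cleared by hand.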
It seems that either you are loading the genome multiple times or this is a STAR bug. How are you running the multiple STAR runs? Which version of STAR?
LoadAndRemove will automatically remove the index from memory once all STAR jobs using it have finished. LoadAndExit will leave the index in memory until you run STAR with --genomeLoad Remove.
More or less the same. LoadAndExit does just that, with no mapping whatsoever. LoadAndKeep loads the genome, maps reads, and then exits, leaving the index in memory.
If you use LoadAndExit, STAR doesn't need to know; you tell STAR when to remove the index after the loop finishes.
How do you access the loaded index in the looped call to the STAR aligner? It seems that STAR is not using my loaded genome correctly. I have tried several configurations of the following, with and without the --genomeDir flag.
STAR --genomeLoad LoadAndExit --genomeDir $STARINDEX
for file in $(ls myFastqs/); do
    pushd myFastqs
    rm -r $file-processed
    mkdir $file-processed
    pushd $file-processed
    STAR --runThreadN 5 \
        --readFilesIn ../$file \
        --outFilterMismatchNoverLmax 0.05 \
        --alignIntronMax 20000 \
        --genomeDir $STARINDEX \
        --outSAMstrandField intronMotif \
        --quantMode GeneCounts \
        --sjdbGTFfile $STARGTF
    popd
    popd
done
STAR --genomeLoad Remove --genomeDir $STARINDEX
It seems that STAR is not using my loaded genome correctly.
Why do you think so? Are there error messages? Note that it may be worth opening a new question, if the issue has not been solved by the suggestions in this thread.
I think that I am not properly telling STAR to use the loaded index, because when I run it as shown (with --genomeDir $STARINDEX), an index is loaded for every input.fastq in the loop and the system runs out of memory. However, when I omit --genomeDir $STARINDEX, I get an error saying that the index was not found.
How do I properly pass a pre-loaded index to each looped call to STAR?
Hello, if I follow your instructions I get this error:
Jun 22 20:41:50 ...... FATAL ERROR, exiting
./STAR_alignment_paired_2.sh: line 17: --genomeLoad: command not found
The 17th line of my code is inside the for loop which would iterate over the read pairs.
So maybe the --genomeLoad command inside the loop is unnecessary?
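A "command not found" message for --genomeLoad usually means the shell tried to run that line as a command of its own, which typically happens when the previous line is missing its trailing backslash (or has a space after it). That is a guess about this particular script, since the script itself was not posted. The toy example below reproduces the symptom with a stand-in function, fake_star, which is hypothetical and only echoes its arguments.

```shell
# Stand-in for STAR so the example runs anywhere: it just reports its arguments.
fake_star() { echo "fake_star received: $*"; }

# Broken: no backslash after the first line, so the command ends there and
# the shell then tries to execute "--genomeLoad" as a separate command.
broken() {
    fake_star --runThreadN 5
    --genomeLoad LoadAndKeep
}

# Fixed: the trailing backslash joins both lines into one command.
fixed() {
    fake_star --runThreadN 5 \
        --genomeLoad LoadAndKeep
}
```

Calling broken prints only the first option and then a "not found" error for --genomeLoad, while fixed passes both options to the command. So the option is probably still needed inside the loop; it just has to be part of the STAR command line.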
Well, I retried it your way and it worked. Thanks Ryan!
You're advised to just use LoadAndRemove, which will leave a single copy in memory until all concurrent jobs are done.
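For concurrent runs, that suggestion could be sketched as below. STAR_BIN and map_parallel are hypothetical names, single-end .fastq inputs are assumed, and no limit is placed on the number of simultaneous jobs, so in practice you would want to cap it to what the machine can handle.

```shell
#!/usr/bin/env bash
# Sketch only: STAR_BIN and map_parallel are hypothetical names.
STAR_BIN="${STAR_BIN:-STAR}"

# map_parallel INDEX_DIR FASTQ [FASTQ...]
# Starts one background STAR job per FASTQ file. Every job passes
# --genomeLoad LoadAndRemove, so the jobs share one copy of the index and
# STAR removes it once the last job using it has finished.
map_parallel() {
    local index="$1" fq
    shift
    for fq in "$@"; do
        "$STAR_BIN" --genomeLoad LoadAndRemove \
            --genomeDir "$index" \
            --readFilesIn "$fq" \
            --outFileNamePrefix "$(basename "$fq" .fastq)_" &
    done
    wait    # block until every background job has finished
}
```

With this pattern no separate load or remove call is needed, which also means there is nothing to keep track of about which sample runs first.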