Problem building reference genome using STAR
7.4 years ago
ashish ▴ 680

I am trying to build a reference genome index using STAR. The genome is wheat, which is around 17 Gb. However, no matter how many parameters I change, I get the following error:

EXITING because of FATAL ERROR: could not open genome file /home/ashishk/RAWFILES/genome/STAR_genome_index//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsission

After searching on Google for quite a while, I think it is a RAM issue. I am working on a server where I am allocated only 200 GB of RAM. I drastically changed the parameters that determine RAM usage, following suggestions on various forums, but I still get the same error. This is the command I am using:

STAR --runMode genomeGenerate --runThreadN 15 --genomeDir /home/ashishk/RAWFILES/genome/STAR_genome_index/ --genomeSAsparseD 10 --genomeFastaFiles /home/ashishk/RAWFILES/genome/Triticum_aestivum.TGACv1.dna.toplevel.fa --sjdbGTFfile /home/ashishk/RAWFILES/gtf/Triticum_aestivum.TGACv1.36.gtf --sjdbOverhang 99 --genomeChrBinNbits 4 --genomeSAindexNbases 4

Even if it does work after a few more changes, I am afraid I might end up with bad results, since the parameters are so far from their default values. Please suggest a way to run STAR with the available resources, or suggest a better tool that can handle large genomes.

Thank you

STAR transcriptome • 4.9k views

Does /home/ashishk/RAWFILES/genome/STAR_genome_index/ exist?


Yes, it exists and has the required permissions as well.


Then check whether the file /home/ashishk/RAWFILES/genome/STAR_genome_index//genomeParameters.txt exists. Maybe the double slash causes problems for some downstream piece of code (it shouldn't be a problem for the shell).
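The check suggested here can be sketched as a small shell function. This is only an illustration: `check_genome_dir` is a hypothetical helper name, not part of STAR, and it assumes a POSIX shell. It strips one trailing slash so that no double slash appears in the path it tests.

```shell
#!/bin/sh
# Hypothetical helper (not part of STAR): verify that the index directory
# exists, is writable, and contains a readable genomeParameters.txt.
# Example call: check_genome_dir /home/ashishk/RAWFILES/genome/STAR_genome_index/
check_genome_dir() {
    dir=${1%/}   # drop one trailing slash, if present
    [ -d "$dir" ] || { echo "missing directory: $dir"; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir"; return 1; }
    [ -r "$dir/genomeParameters.txt" ] || {
        echo "missing or unreadable: $dir/genomeParameters.txt"
        return 1
    }
    echo "ok: $dir"
}
```

If the function reports that genomeParameters.txt is missing while the directory itself is present and writable, the index build probably never finished, which points away from a path problem.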


The folder exists, but the tool is not writing any files to it. STAR starts indexing and then stops after 5 minutes without writing any files to /home/ashishk/RAWFILES/genome/STAR_genome_index. I've tried providing another path, but the same thing happens, and the fatal error prints double slashes in both cases.


Have you looked at memory usage (in top) to see if the 200G being allocated is getting exhausted? Have you generated genome indexes with this install of STAR before? If not, test with a small fasta to ensure that all is working well.
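A small-FASTA test along these lines might look like the sketch below. The helper names `make_test_fasta` and `run_small_test`, the file names, and the 10000-base length are all hypothetical choices for illustration. The value `--genomeSAindexNbases 5` is an assumption derived from the STAR manual's small-genome rule, min(14, log2(GenomeLength)/2 - 1), rounded down.

```shell
#!/bin/sh
# Hypothetical helper: write a random single-sequence FASTA file.
# $1 = output path, $2 = number of bases (defaults to 10000).
make_test_fasta() {
    awk -v n="${2:-10000}" 'BEGIN {
        srand(1)
        print ">test_chr"
        for (i = 1; i <= n; i++) {
            printf "%s", substr("ACGT", int(rand() * 4) + 1, 1)
            if (i % 60 == 0) printf "\n"    # wrap lines at 60 bases
        }
        if (n % 60) printf "\n"             # terminate a partial last line
    }' > "$1"
}

# Hypothetical helper: build a tiny index, only if STAR is on the PATH.
run_small_test() {
    command -v STAR >/dev/null 2>&1 || { echo "STAR not found"; return 1; }
    make_test_fasta test.fa 10000
    mkdir -p test_index
    STAR --runMode genomeGenerate --genomeDir test_index \
         --genomeFastaFiles test.fa --genomeSAindexNbases 5
}
```

If `run_small_test` completes and test_index contains genomeParameters.txt, the install itself is probably fine and the failure on the large genome is more likely a resource limit.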


Yes, you're right. Memory usage stays around 98% and then eventually reaches 100%.


Assuming you are able to successfully generate indexes from other, smaller fasta files, you may simply be running out of memory.

If you have no additional options to try with STAR, I suggest that you give bbmap.sh from the BBMap suite a try. You can generate indexes by simply running bbmap.sh -Xmx200g ref=your_fasta.fa. I find BBMap to be an excellent splice-aware aligner that is plenty fast and efficient.
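As a rough sketch of the two-step BBMap workflow (index, then map), the function below only prints the commands instead of running them. `bbmap_cmds` is a hypothetical helper, and the file names reads.fq and mapped.sam are placeholders. It assumes that a later bbmap.sh call without `ref=` picks up the index written to the ref/ directory of the current working directory.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the BBMap index and mapping commands.
# $1 = reference FASTA, $2 = reads, $3 = output SAM, $4 = Java heap (default 200g)
bbmap_cmds() {
    ref=$1; reads=$2; out=$3; mem=${4:-200g}
    # Step 1: build the index from the reference FASTA.
    echo "bbmap.sh -Xmx$mem ref=$ref"
    # Step 2: map reads against the index built in step 1.
    echo "bbmap.sh -Xmx$mem in=$reads out=$out"
}

bbmap_cmds your_fasta.fa reads.fq mapped.sam
```

Printing the commands first makes it easy to review the memory setting before committing a long run on a shared server.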


I was able to generate indexes for a bacterial genome successfully. Then I tried an earlier version of STAR, whose error message clearly stated that there was not enough RAM. Now I am using BBMap and finding it much better. Thanks.


Have you tried changing /home/ashishk/RAWFILES/genome/STAR_genome_index/ to /home/ashishk/RAWFILES/genome/STAR_genome_index, without the final slash? Maybe it won't change anything, but I would try.


Yes, I tried that but received the same error again.


Then I would follow genomax's suggestion!


Check my earlier post on memory optimization in STAR: "A: Cannot Generate Genome from RNA Transcript in STAR".
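Independently of that post, the STAR manual gives scaling rules for two of the parameters used in the original command: --genomeSAindexNbases as min(14, log2(GenomeLength)/2 - 1) and --genomeChrBinNbits as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))). The sketch below evaluates both with awk. The function names are hypothetical, rounding down is an assumption, and the contig count (700000) and read length (100) in the example calls are placeholders rather than values taken from this thread.

```shell
#!/bin/sh
# Hypothetical helpers evaluating the STAR manual's scaling rules.

# min(14, log2(GenomeLength)/2 - 1), rounded down. $1 = genome length in bases.
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        v = int(log(len) / log(2) / 2 - 1)
        if (v > 14) v = 14
        print v
    }'
}

# min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))), rounded down.
# $1 = genome length, $2 = number of reference sequences, $3 = read length.
chr_bin_nbits() {
    awk -v len="$1" -v nref="$2" -v rl="$3" 'BEGIN {
        x = len / nref
        if (rl > x) x = rl
        v = int(log(x) / log(2))
        if (v > 18) v = 18
        print v
    }'
}

sa_index_nbases 17000000000             # prints 14
sa_index_nbases 5000000                 # prints 10
chr_bin_nbits 17000000000 700000 100    # prints 14
```

Under these rules a genome of about 17 Gb stays at the default of 14 for --genomeSAindexNbases, so that parameter is meant to be lowered for small genomes rather than large ones. The manual describes --genomeSAsparseD as the setting that reduces RAM at the cost of mapping speed, and --limitGenomeGenerateRAM as the cap on memory available for genome generation.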
