Question

STAR Killing Run When Trying To Build An Annotated Reference Genome

0

8.6 years ago

pjmaguire3 ▴ 80

So I am running into a weird issue. I am trying to build an annotated reference genome with STAR, but I can never get past the "sorting Suffix Array chunks and saving them to disk" step before it displays "KILLED" and exits. At first I thought the issue was that I was doing other things in parallel on the machine, and therefore eating up the RAM, so I closed everything on Windows except my virtual machine and ran it again. But the same vague "KILLED" message appeared after ~1 hour of run time. Any idea what is going on here?

Machine Specs:

Intel i5 4690k
32 GB DDR3 RAM
Windows 7 Ultimate, running Ubuntu in a VirtualBox

Code:

STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /home/peter/Documents/Files/Reference --genomeFastaFiles /home/peter/Documents/Files/Rattus_norvegicus/NCBI/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa --sjdbGTFfile /home/peter/Documents/Files/Rattus_norvegicus/NCBI/Rnor_6.0/Annotation/Archives/archive-current/Genes/genes.gtf --sjdbOverhang 100

Files:

Screenshot:

Screenshot of the desktop

RNA-Seq STAR genome software error • 9.2k views

ADD COMMENT • link 8.5 years ago by pjmaguire3 ▴ 80

1

8.6 years ago

tiago211287 ★ 1.5k

Maybe it is a RAM issue. STAR is way faster than TopHat, but the problem is that it uses a lot of RAM, 64 GB or more. You should be using it on a Linux server, not on a Windows desktop. Maybe you should try using TopHat for alignment.
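
A bare "KILLED" with no STAR error message is often what the shell prints when the Linux kernel's out-of-memory killer terminates a process. Below is a minimal sketch for checking this from inside the VM, assuming a Linux guest with /proc/meminfo and a readable kernel log. The function names are illustrative and not part of STAR.

```shell
# Report the RAM the guest OS actually sees, in whole GiB.
# A VirtualBox guest only gets the memory assigned to it in the VM settings,
# which can be well below the 32 GB installed in the host.
mem_total_gib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Filter a kernel log (read from stdin) for out-of-memory kills.
oom_kills() {
    grep -iE 'out of memory|killed process' || true
}

echo "RAM visible to this system: $(mem_total_gib) GiB"
# dmesg may need root on some systems; the fallback keeps the script going.
(dmesg 2>/dev/null || true) | oom_kills
```

If the second command prints a line naming STAR, the run was killed for lack of memory rather than crashing on its own.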

ADD COMMENT • link 8.6 years ago by tiago211287 ★ 1.5k

0

Thanks! I will give Tophat a try and see if that works better for me. Do you know if I can use Tophat to build an annotated reference genome and then map reads to it using STAR?

Also is a Linux server that much more efficient than using a VM? I thought performance was about equal, if you eliminate resources that need to be dedicated to the host machine.

ADD REPLY • link 8.6 years ago by pjmaguire3 ▴ 80

1

You can't use the TopHat index with STAR. In addition, even if you could download a pre-built STAR index, you probably would not be able to perform an alignment, since STAR uses too much RAM for this. A server machine running Linux would crash anyway if it does not have enough RAM.

ADD REPLY • link 8.6 years ago by tiago211287 ★ 1.5k

0

I thought that STAR could handle alignments with 32 GB of RAM? But it is good to know that they are not compatible, since I was able to get pre-built annotated genomes for Bowtie2, Bowtie, and BWA. Do you have a recommendation for an alternative mapper that could use one of those annotated genomes? I am fine using something other than STAR if accuracy is preserved.

ADD REPLY • link 8.6 years ago by pjmaguire3 ▴ 80

2

I always recommend STAR, but if a machine with more RAM is not an option, I guess TopHat is fine. Just take your annotation and reference and build an index. Just make sure to have bowtie2 in the PATH. I would not recommend any splice-aware aligner besides these.

ADD REPLY • link 8.6 years ago by tiago211287 ★ 1.5k

1

8.6 years ago

jotan ★ 1.3k

You could try increasing the --genomeSAsparseD parameter. (STAR manual)

I had the same problem but managed to eventually build an index by setting this to 2.
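
Applied to the command from the question, that would look roughly like the sketch below. The paths and options are the ones from the original post. The `build_index` function and the `STAR_BIN` variable are only illustrative conveniences, so the command can be previewed before committing to a long run.

```shell
REF=/home/peter/Documents/Files
RNOR="$REF/Rattus_norvegicus/NCBI/Rnor_6.0"

# Same options as the original command, plus a sparser suffix array.
# --genomeSAsparseD defaults to 1; larger values lower the RAM needed
# at the cost of slower mapping. The first argument sets the value.
build_index() {
    "${STAR_BIN:-STAR}" \
        --runThreadN 8 \
        --runMode genomeGenerate \
        --genomeDir "$REF/Reference" \
        --genomeFastaFiles "$RNOR/Sequence/WholeGenomeFasta/genome.fa" \
        --sjdbGTFfile "$RNOR/Annotation/Archives/archive-current/Genes/genes.gtf" \
        --sjdbOverhang 100 \
        --genomeSAsparseD "${1:-2}"
}

# Preview the arguments without running STAR:  STAR_BIN=echo build_index 2
# Run for real:                                build_index 2
```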

ADD COMMENT • link 8.6 years ago by jotan ★ 1.3k

1

That STAR manual is outdated. I recommend the most recent version, since many parameters are different (recent manual).

In addition, as the manual says :

--genomeSAsparseD default: 1 int>0: suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction

This can indeed solve your problem. But I think that if mapping ends up taking nearly the same amount of time as TopHat, it makes little sense to use STAR, except if you want some specific feature besides the alignment. The main advantage of using STAR is its amazingly fast mapping with no cost to accuracy.

ADD REPLY • link 8.6 years ago by tiago211287 ★ 1.5k

1

Sorry, just a random google hit.

That parameter is just for building the index.

I haven't had any memory issues with mapping data so far on my 32 GB VM. Although, I do have to be careful not to run anything else when mapping. Running under these conditions is still much, much faster than TopHat.

ADD REPLY • link 8.6 years ago by jotan ★ 1.3k

0

Good to know. Thanks!

ADD REPLY • link 8.6 years ago by tiago211287 ★ 1.5k

0

it makes little sense to use STAR, except if you want some specific feature besides the alignment. The main advantage of using STAR is its amazingly fast mapping with no cost to accuracy.

There are a lot of advantages to STAR. I personally have seen much more accurate mapping with STAR compared to TopHat when looking at unannotated regions.

ADD REPLY • link 8.6 years ago by igor 13k

0

I have just started a run with "--genomeSAsparseD 2" and will post back with the results. Hopefully this solves my issue. Thanks!

ADD REPLY • link 8.6 years ago by pjmaguire3 ▴ 80

1

Hello pjmaguire,

Did you finally fix your problem? I'm having the same problem as you, and have not yet tried the "--genomeSAsparseD" option.

Many thanks

Ramon

ADD REPLY • link 8.5 years ago by rgescudero ▴ 30

0

Hello Ramon,

Yes, I ultimately got it working and just submitted an answer on it for future people. Check out the submission for more details, but the general problem was with the pre-compiled STAR files - which apparently have some issues on some flavors of Linux. What you need to do is build them yourself from the source, and once you have done that everything should work nicely. I have no idea why the STAR development team decided not to mention this problem in the manual, or why they felt such a useless error message would be beneficial, but such is life. Hope that helps!

ADD REPLY • link 8.5 years ago by pjmaguire3 ▴ 80

1

Ok, many thanks. I hope this helps.

Ramon

ADD REPLY • link 8.5 years ago by rgescudero ▴ 30

score 2 · Accepted Answer · 2016-06-02

2

8.5 years ago

pjmaguire3 ▴ 80

In case anyone else runs into this issue in the future, the problem ended up being with STAR's pre-compiled build. I am not sure exactly what is wrong with it, but apparently there can be issues with it on several different flavors of Linux (I am on Ubuntu). The solution was to go into the source directory, build it with the make command, and then add the directory containing the compiled binary to your PATH (export PATH=$PATH:$PWD for those who don't know how to do that, run from the same directory where you ran make). Once that was done, everything ran smoothly and without any problems. Best of luck!
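
For anyone following along, the build steps can be sketched as below. This assumes the source is fetched from the STAR GitHub repository (alexdobin/STAR) and that git, make, and a C++ compiler are installed. The `add_to_path` and `build_star` names are illustrative helpers, not part of STAR.

```shell
# Append a directory to PATH unless it is already there.
# Note: a shell assignment takes no spaces around '='.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

# Fetch and compile STAR, then make the fresh binary visible to this shell.
build_star() {
    git clone https://github.com/alexdobin/STAR.git &&
    cd STAR/source &&
    make STAR &&
    add_to_path "$PWD"
}

# Usage: build_star && STAR --version
```

The PATH change only lasts for the current shell session. To keep it, the same export line can be added to a startup file such as ~/.bashrc, using the full path of the source directory instead of $PWD.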

ADD COMMENT • link 8.5 years ago by pjmaguire3 ▴ 80

0

Did you try both "Linux_x86_64" and "Linux_x86_64_static"? The pre-compiled version contains both.
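
A quick way to compare the two is sketched below, under the assumption that the release archive was unpacked so that the binaries sit at bin/Linux_x86_64/STAR and bin/Linux_x86_64_static/STAR under some directory. The `check_star_builds` name and its argument are illustrative.

```shell
# Try to start each pre-compiled binary and report whether it runs.
# The first argument is the directory the archive was unpacked into
# (defaults to the current directory).
check_star_builds() {
    base="${1:-.}"
    for build in Linux_x86_64 Linux_x86_64_static; do
        bin="$base/bin/$build/STAR"
        if [ -x "$bin" ] && "$bin" --version >/dev/null 2>&1; then
            echo "$build: runs"
        else
            echo "$build: missing or fails to start"
        fi
    done
}

# Usage: check_star_builds /path/to/unpacked/STAR
```

This only shows whether each binary starts at all, for example whether a shared library is missing. It does not reproduce a failure that appears an hour into genomeGenerate, so a binary that passes here still has to be tried on the real index build.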

ADD REPLY • link 8.5 years ago by igor 13k

0

Hello pjmaguire3, I am having the same problem. I cannot generate the index because STAR stops at "... sorting Suffix Array chunks and saving them to disk ..." without any error. I am running STAR using Cygwin on Windows and I have 64 GB of RAM, so I think your comment could be the solution, rather than a RAM limitation. I am not an expert in informatics, so I don't fully understand how I have to compile the STAR executable. What I did was set the working directory with cd STAR/source and run STAR from there. I also added the path to the STAR executable to the PATH environment variable in the Windows system settings. Is that what you did? Could that be the problem?

I have been stuck on this step for several days and I don't know what to do. Any help is welcome. I have an Intel Xeon CPU at 3.5 GHz with 4 cores and 8 logical processors. I downloaded the mouse genome and .gtf files from the iGenomes website, and I am using the WholeGenome.fa file from Ensembl. Is this genome too big? Should I generate my index chromosome by chromosome?

./STAR --runMode genomeGenerate --genomeDir /cygdrive/c/Ana_Gómez_Secuenciación/CM1_FACS/20160818_Carpeta_de_trabajo_H3YJLBGXY/index --genomeFastaFiles /cygdrive/c/Ana_Gómez_Secuenciación/Genome/reference/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa --runThreadN 6 --sjdbGTFfile /cygdrive/c/Ana_Gómez_Secuenciación/Genome/GTF_files/referenceGTF/genes.gtf --sjdbOverhang 75 --genomeSAsparseD parameter 1

ADD REPLY • link 8.1 years ago by anagd00 • 0