STAR index fails with GTF, works with GFF3
6.8 years ago

I am trying to build an index with STAR version STAR-2.5.2b. With a GTF file I get an error at the "processing annotations GTF" step, but with the associated GFF3 file it works. The question is: why? I know I could simply use the GFF3 file, but I don't want to introduce another file into my RNA-seq workflow.

Here is what you need:

Reference genome :


GFF3 :

STAR : STAR-2.5.2b

I subsampled the reference genome to keep only the chromosomes annotated in the GTF file; this way I avoid reads that could match outside the annotation (see: Looking for a thorough annotation for non-primary assembly units in GRCm38). I named the result GRCm38.p5.genome_subsampled.fa.
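For readers who want to reproduce the subsampling step, here is a minimal sketch of one way to do it with awk. It is not necessarily how the file above was produced: `keep_annotated_seqs` and the file names in the usage line are hypothetical, and it assumes the sequence name is the first whitespace-delimited word of each FASTA header and matches column 1 of the GTF exactly.

```shell
# Hypothetical helper: print only the FASTA records whose sequence name
# appears in column 1 of the annotation file (assumes a non-empty annotation).
keep_annotated_seqs() {
    gtf=$1
    fasta=$2
    awk -F'\t' '
        # First file (annotation): remember the sequence name of every non-comment line.
        NR == FNR { if ($0 !~ /^#/) keep[$1] = 1; next }
        # Second file (FASTA): at each header, decide whether to print this record.
        /^>/ {
            name = substr($1, 2)      # drop the leading ">"
            sub(/ .*/, "", name)      # drop any description after the first space
            printing = (name in keep)
        }
        printing
    ' "$gtf" "$fasta"
}

# Usage (placeholder file names):
#   keep_annotated_seqs annotation.gtf genome.fa > genome_subsampled.fa
```

It is worth checking afterwards that the number of headers in the output matches the number of distinct sequence names in the annotation.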

I run the job on a cluster. For both strategies (GTF and GFF3) I set h_vmem (specifies the maximum amount of memory required) to 64G and mem (specifies the maximum amount of memory required) to 16G, which is enough. I also use 8 threads.

Here are my commands :

GTF strategy

$star --runThreadN 8 --runMode genomeGenerate --genomeDir /home/hbastien/work/MGRS/star_index --genomeFastaFiles /home/hbastien/save/MGRS/GRCm38.p5.genome_subsampled.fa --sjdbGTFfile /home/hbastien/save/MGRS/gencode.vM16.chr_patch_hapl_scaff.annotation.gtf --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 75;

GFF3 strategy

$star --runThreadN 8 --runMode genomeGenerate --genomeDir /home/hbastien/work/MGRS/star_index --genomeFastaFiles /home/hbastien/save/MGRS/GRCm38.p5.genome_subsampled.fa --sjdbGTFfile /home/hbastien/save/MGRS/gencode.vM16.chr_patch_hapl_scaff.annotation.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 75;

With the GTF file, after around 30 minutes, I got this in my error output file:

terminate called after throwing an instance of 'std::out_of_range'

what(): vector::_M_range_check

/var/spool/sge/node002/job_scripts/7117238: line 17: 57352 Abandon

$star --runThreadN 8 --runMode genomeGenerate --genomeDir /home/hbastien/work/MGRS/star_index --genomeFastaFiles /home/hbastien/save/MGRS/GRCm38.p5.genome_subsampled.fa --sjdbGTFfile /home/hbastien/save/MGRS/gencode.vM16.chr_patch_hapl_scaff.annotation.gtf --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 75

Your job has been killed.

This may happen if one of the followings hold :

  • you exceeded one of the queue/job limits (run time, memory, etc)

  • you (or admin) killed the job using qdel

  • something bad happened.

Now, just in case something bad happened, here are the debug information about your job : total 0

And in my standard output file :

Feb 21 13:45:20 ..... started STAR run

Feb 21 13:45:20 ... starting to generate Genome files

Feb 21 13:46:34 ... starting to sort Suffix Array. This may take a long time...

Feb 21 13:46:51 ... sorting Suffix Array chunks and saving them to disk...

Feb 21 14:09:59 ... loading chunks from disk, packing SA...

Feb 21 14:11:18 ... finished generating suffix array

Feb 21 14:11:18 ... generating Suffix Array index

Feb 21 14:14:43 ... completed Suffix Array index

Feb 21 14:14:43 ..... processing annotations GTF

Whereas with the GFF3 file, the run finished in around 45 minutes.

My error output file is empty.

And in my standard output file:

Feb 21 15:34:20 ..... started STAR run

Feb 21 15:34:20 ... starting to generate Genome files

Feb 21 15:35:39 ... starting to sort Suffix Array. This may take a long time...

Feb 21 15:36:04 ... sorting Suffix Array chunks and saving them to disk...

Feb 21 16:05:14 ... loading chunks from disk, packing SA...

Feb 21 16:06:33 ... finished generating suffix array

Feb 21 16:06:33 ... generating Suffix Array index

Feb 21 16:11:03 ... completed Suffix Array index

Feb 21 16:11:03 ..... processing annotations GTF

Feb 21 16:11:27 ..... inserting junctions into the genome indices

Feb 21 16:14:54 ... writing Genome to disk ...

Feb 21 16:14:56 ... writing Suffix Array to disk ...

Feb 21 16:15:13 ... writing SAindex to disk

Feb 21 16:15:14 ..... finished successfully

Epilog : job finished at mer. févr. 21 16:15:14 CET 2018

I tried increasing the memory in response to the 'std::out_of_range' error, but that didn't do the trick...

The two Log.out files are too large to display here, but I can share them if you need them.

Any hints would be welcome!

Thanks a lot

GFF3 GTF STAR • 7.1k views
6.8 years ago
h.mon 35k

I believe you don't need --sjdbGTFtagExonParentTranscript Parent to index GTF files - maybe this is the source of error?
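To check this on your own files before launching a long indexing job, here is a small sketch that looks at the first exon line of an annotation file and reports which attribute links exons to transcripts. `parent_tag_for` is a hypothetical helper, not part of STAR. It relies on two things I am fairly sure of: GTF files write attributes as `transcript_id "..."`, while GFF3 files write `Parent=...`, and `transcript_id` is STAR's default value for `--sjdbGTFtagExonParentTranscript`.

```shell
# Hypothetical helper: report which attribute on exon lines points to the
# parent transcript ("transcript_id" for GTF-style, "Parent" for GFF3-style).
parent_tag_for() {
    awk -F'\t' '
        $0 !~ /^#/ && $3 == "exon" {
            if ($9 ~ /(^|;) *transcript_id /) print "transcript_id"
            else if ($9 ~ /(^|;) *Parent=/)   print "Parent"
            else                              print "unknown"
            found = 1
            exit                      # the first exon line is enough
        }
        END { if (!found) print "unknown" }
    ' "$1"
}

# Usage: only pass the option when the file is GFF3-style, for example
#   tag=$(parent_tag_for annotation.gtf)
#   [ "$tag" = "Parent" ] && extra="--sjdbGTFtagExonParentTranscript Parent" || extra=""
# and then append $extra to the STAR --runMode genomeGenerate command.
```

If this prints `transcript_id` for your GTF, dropping `--sjdbGTFtagExonParentTranscript Parent` from the GTF command and keeping it for the GFF3 command should be the right combination.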

Well played, that works better now. I didn't think this option could interfere... If you want to add this as an answer, I'll mark it as accepted. Thank you.

