Generating genome indexes with STAR
8.4 years ago
snp87 ▴ 80

Hi, I am trying to generate mouse genome indexes with STAR to align my RNA-seq data. I downloaded the mm10 genome in tar.gz format (Mus_musculus_UCSC_mm10.tar.gz). Do I have to gunzip this before indexing the genome? Thank you!

RNA-Seq STAR • 39k views
8.4 years ago

Yes, you'll need to untar it first.


Thanks so much! After untarring the archive I tried indexing, but I don't think it worked.

The command used:

STAR --runThreadN 4 --runMode genomeGenerate --genomeDir Genome_data/star \
--genomeFastaFiles Genome_data/Mus_musculus_UCSC_mm10.tar.gz

Result:

Jun 23 14:20:03 ..... Started STAR run
Jun 23 14:20:03 ... Starting to generate Genome files
Killed: 9

I don't think it was indexed. Do you know what the problem might be?


Your command still passes the tar archive to --genomeFastaFiles. Use the FASTA files extracted from the tar archive instead.
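As a minimal sketch of that step: the archive path below is the one from the command above, but the thread does not say where the FASTA sits inside the archive, so the snippet searches for it instead of assuming a path. `find_fasta` is a hypothetical helper name, not part of STAR or tar.

```shell
# find_fasta (hypothetical helper): list FASTA files under a directory.
find_fasta() {
    find "$1" -type f \( -name '*.fa' -o -name '*.fasta' \) | sort
}

# Archive path as used in the thread; adjust to your own layout.
archive=Genome_data/Mus_musculus_UCSC_mm10.tar.gz

# Extract next to the archive, then show which FASTA files came out of it.
if [ -f "$archive" ]; then
    tar -xzf "$archive" -C Genome_data
    find_fasta Genome_data
fi

# Then pass one of the printed paths (not the .tar.gz) to STAR, e.g.:
# mkdir -p Genome_data/star
# STAR --runThreadN 4 --runMode genomeGenerate --genomeDir Genome_data/star \
#      --genomeFastaFiles <path printed by find_fasta>
```

An archive may contain more than one FASTA file, so check the listing and pick the whole-genome file before indexing.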


STAR is known to need a lot of RAM (30+ GB) for genome generation. How much memory do you have access to?
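To answer that question quickly, something like the following may help. It is only a sketch: `total_ram_gb` is a hypothetical helper name, and it assumes either a Linux `/proc/meminfo` or a macOS/BSD `sysctl` with the `hw.memsize` key.

```shell
# total_ram_gb (hypothetical helper): print installed RAM in whole GB.
total_ram_gb() {
    if [ -r /proc/meminfo ]; then
        # Linux: MemTotal is reported in kB.
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
    else
        # macOS/BSD: hw.memsize is reported in bytes.
        sysctl -n hw.memsize | awk '{ printf "%d\n", $1 / 1024 / 1024 / 1024 }'
    fi
}

echo "Total RAM: $(total_ram_gb) GB"
```

If the number is well below 30, the STAR manual describes options such as --genomeSAsparseD and --limitGenomeGenerateRAM for trading speed against memory; check the manual for your STAR version before relying on them.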

16 months ago
DareDevil ★ 4.3k

You can try the following:

STAR --runThreadN 4 \
--runMode genomeGenerate \
--genomeDir /path/to/genomeDir \
--genomeFastaFiles /path/to/genome/ref.fasta \
--sjdbGTFfile /path/to/annotations.gtf \
--sjdbOverhang 149

--sjdbOverhang should be ReadLength-1 (149 here, i.e. for 150-base reads).
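If you are unsure of your read length, it can be measured from the FASTQ itself. The sketch below assumes standard four-line FASTQ records; `max_read_length` is a hypothetical helper, and the inline two-read FASTQ is made-up demo data. It uses the longest read, which is what the STAR manual suggests when read lengths vary.

```shell
# max_read_length (hypothetical helper): longest sequence line in a
# four-line-per-record FASTQ stream read from stdin.
max_read_length() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { print max + 0 }'
}

# Demo on a tiny inline FASTQ with reads of 8 and 5 bases.
overhang=$(( $(printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGTA\n+\nIIIII\n' | max_read_length) - 1 ))
echo "--sjdbOverhang $overhang"

# On real data (reads.fastq.gz is a placeholder name), sampling the first
# 100,000 reads is usually enough:
# gzip -dc reads.fastq.gz | head -n 400000 | max_read_length
```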

8.0 years ago
aleferna ▴ 10

I had the same problem. For me it was because I was indexing a database of all human viruses rather than a single genome, and it had too many contigs. I read in a Google Groups post to set --genomeChrBinNbits to 14, and that fixed it.
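Rather than guessing a value, the STAR manual suggests scaling this parameter as min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))) for references with many contigs. A rough sketch of that calculation follows; `chr_bin_nbits` is a hypothetical helper, and rounding the logarithm down is my own choice, since the manual does not specify rounding.

```shell
# chr_bin_nbits (hypothetical helper): suggest a --genomeChrBinNbits value.
# usage: chr_bin_nbits FASTA READ_LENGTH
chr_bin_nbits() {
    awk -v readlen="$2" '
        /^>/ { nref++; next }          # count reference sequences
        { total += length($0) }        # sum sequence lengths
        END {
            if (nref == 0) { print "no sequences found" > "/dev/stderr"; exit 1 }
            x = total / nref           # average reference length
            if (x < readlen) x = readlen
            # floor(log2(x)) by repeated doubling, capped at 18
            bits = 0
            while (bits < 18 && 2 ^ (bits + 1) <= x) bits++
            print bits
        }' "$1"
}

# Example (viruses.fasta is a placeholder name):
# STAR --runMode genomeGenerate --genomeDir /path/to/genomeDir \
#      --genomeFastaFiles viruses.fasta \
#      --genomeChrBinNbits "$(chr_bin_nbits viruses.fasta 150)"
```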
