Question

Pre made STAR Index?

2

8.5 years ago

atcggcta ▴ 150

Hello!

I'm sorry if this question comes off as naive or ignorant; I'm very new to bioinformatics. I'm trying to do an alignment with STAR and was wondering if I could access a pre-made STAR index for the mm10 genome. I was told I could get this from UCSC but have had no luck finding it there.

So my question is: are there pre-made STAR index files for the mm10 genome that I could download? If so, where and how?

Thanks in advance for any help, and I'm sorry to ask such a trivial question! Let me know if there's any more detail I can give!

STAR • 29k views

ADD COMMENT • link updated 8.5 years ago by dvanic ▴ 260 • written 8.5 years ago by atcggcta ▴ 150

score 11 · Accepted Answer · 2016-11-13

8.5 years ago

dvanic ▴ 260

I'd suggest generating your own index using the mm10 genome as per the instructions below, and using the latest GENCODE mouse genes. To keep things consistent (major problem in bioinformatics!!!) I'd download BOTH the genome and the annotation GTF from here: http://www.gencodegenes.org/mouse_releases/current.html

You want the Comprehensive gene annotation - PRI gtf and the Genome sequence, primary assembly (GRCm38) - PRI fasta sequence (this is your genome).
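One way to sanity-check that the genome and annotation really are consistent is to confirm that every sequence name used in the GTF also appears in the FASTA. The following is only a minimal sketch: `missing_contigs` is a hypothetical helper name, and both files are assumed to be uncompressed.

```shell
# Hypothetical helper: print sequence names that appear in the GTF
# (column 1) but not among the FASTA headers. Empty output means every
# annotated contig exists in the genome.
# Usage: missing_contigs genome.fa annotation.gtf
missing_contigs() {
    awk '
        # First file (FASTA): remember the first word of each header line.
        NR == FNR { if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 } next }
        # Second file (GTF): skip comment lines, report unknown names once.
        !/^#/ && !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$1" "$2"
}

# Example call with the file names used in this thread:
# missing_contigs GRCm38.primary_assembly.genome.fa \
#                 gencode.vM11.primary_assembly.annotation.gtf
```

If this prints anything, the two files probably come from different sources or releases, which is exactly the inconsistency to avoid.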

ADD COMMENT • link 8.5 years ago by dvanic ▴ 260

0

Thank you so much for the response, this is very helpful!

ADD REPLY • link 8.5 years ago by atcggcta ▴ 150

1

I'd very strongly suggest you build your own index! To do this:

Download the two files I suggested:

ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M11/gencode.vM11.primary_assembly.annotation.gtf.gz
ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M11/GRCm38.primary_assembly.genome.fa.gz

Then, in the directory where you saved those files, run:

gunzip gencode.vM11.primary_assembly.annotation.gtf.gz
gunzip GRCm38.primary_assembly.genome.fa.gz

and then:

STAR --runThreadN 4 --runMode genomeGenerate --genomeDir WhereYouWantIndex --genomeFastaFiles GRCm38.primary_assembly.genome.fa --sjdbGTFfile gencode.vM11.primary_assembly.annotation.gtf --sjdbOverhang 100

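This will use 4 cores to generate a genome index that includes the splice junction annotation (which you want!!!). The --sjdbOverhang value is the length of genomic sequence on each side of an annotated junction that goes into the splice junction database. The STAR manual recommends read length minus 1, so 100 suits 101 bp reads, and the default of 100 works well in most cases. If your reads are longer (150 ?), use 149. Then map against this index.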
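As a rough sketch of how the index build and the later mapping step fit together, the snippet below derives --sjdbOverhang from the read length (the STAR manual suggests read length minus 1) and prints both commands for review instead of running them. The `sjdb_overhang` helper, the 150 bp read length, and the `sample_R1.fastq.gz` / `sample_R2.fastq.gz` file names are assumptions for illustration only.

```shell
# Hypothetical helper: STAR's manual suggests --sjdbOverhang = read length - 1.
sjdb_overhang() {
    echo $(( $1 - 1 ))
}

READ_LENGTH=150                  # assumption: 150 bp reads
GENOME_DIR=WhereYouWantIndex     # placeholder directory name from this thread
OVERHANG=$(sjdb_overhang "$READ_LENGTH")

# Step 1: build the index (printed only; remove the echo to actually run it).
echo "STAR --runThreadN 4 --runMode genomeGenerate" \
     "--genomeDir $GENOME_DIR" \
     "--genomeFastaFiles GRCm38.primary_assembly.genome.fa" \
     "--sjdbGTFfile gencode.vM11.primary_assembly.annotation.gtf" \
     "--sjdbOverhang $OVERHANG"

# Step 2: map reads against that index (FASTQ names are hypothetical).
echo "STAR --runThreadN 4 --genomeDir $GENOME_DIR" \
     "--readFilesIn sample_R1.fastq.gz sample_R2.fastq.gz" \
     "--readFilesCommand zcat" \
     "--outSAMtype BAM SortedByCoordinate" \
     "--outFileNamePrefix sample_"
```

Printing first makes it easy to check paths before committing to an index build, which for a mouse genome takes a while and needs a lot of memory.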

NB: If you plan to do differential expression, use featureCounts or HTSeq to count reads against that same GENCODE GTF.

ADD REPLY • link 8.5 years ago by dvanic ▴ 260

0

OK thank you very much! I will definitely try this out.

One more question: the server I'm using has 40 cores. Does this change how many I should use to build the index?

Thanks again!

ADD REPLY • link 8.5 years ago by atcggcta ▴ 150

1

It will just be faster with more cores; the thread count does not influence the index itself. Essentially, the files will be the same regardless of the number of cores used.
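Since the thread count only affects speed, it can simply be set from whatever the machine offers. A small sketch, assuming you may want to leave some cores free on a shared server; the `pick_threads` helper and the cap of 16 are hypothetical choices, not STAR recommendations.

```shell
# Hypothetical helper: use the available cores, but no more than a chosen cap.
# Usage: pick_threads AVAILABLE CAP
pick_threads() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Detect the core count (nproc is GNU coreutils; getconf is the fallback).
CORES=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
THREADS=$(pick_threads "$CORES" 16)

echo "Would run STAR with --runThreadN $THREADS"
```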

ADD REPLY • link 8.5 years ago by WouterDeCoster 48k

0

OK Awesome, Thanks again

ADD REPLY • link 8.5 years ago by atcggcta ▴ 150

0

Just for clarity: the GRCm38 primary assembly is not the same genome as mm10 from UCSC, correct? So, per the ENCODE data standards, you would download mm10 as the genome (which is based on GRCm38) and then use the GENCODE comprehensive GTF for annotation?

ADD REPLY • link 6.4 years ago by pvd2107 • 0

score 3 · Accepted Answer · 2016-11-13

8.5 years ago

GenoMax 151k

You can generate indexes yourself easily enough. Follow the directions here: generating genome indexes with STAR. The mm10 genome from UCSC is here.

@Alex has some pre-made indexes available at the STAR Genomes site. There does not appear to be a UCSC version of mouse, but there is a GENCODE mouse index which you can use.

ADD COMMENT • link 8.5 years ago by GenoMax 151k

0

Thanks for the answer and advice!

I have another newbie question though: when I follow your Gencode Mouse link I find a bunch of files available. Would you be able to tell me which one I should use as the index when I'm doing the alignment?

Thanks again and sorry if this is a silly question!

ADD REPLY • link 8.5 years ago by atcggcta ▴ 150

1

A STAR index consists of these files:

chrLength.txt  chrName.txt  chrStart.txt  Genome  genomeParameters.txt  SA  SAindex

I suggest that you get the entire set of files in that folder.
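After downloading, a quick check that the folder is complete can save a confusing failure at alignment time. This is only a sketch: `check_star_index` is a hypothetical helper, and it tests just for the seven files listed above (an index built with annotations may contain additional files).

```shell
# Hypothetical helper: verify that a directory holds the core STAR index files.
# Returns 0 if all are present and non-empty, 1 otherwise.
# Usage: check_star_index /path/to/index
check_star_index() {
    status=0
    for f in chrLength.txt chrName.txt chrStart.txt Genome \
             genomeParameters.txt SA SAindex; do
        if [ ! -s "$1/$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return $status
}

# Example call (directory name is a placeholder):
# check_star_index WhereYouWantIndex && echo "index looks complete"
```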

ADD REPLY • link 8.5 years ago by GenoMax 151k

0

OK awesome thank you very much!

ADD REPLY • link 8.5 years ago by atcggcta ▴ 150