STAR index generation for bacterial genome
3.0 years ago

Hi,

I'm trying to analyze RNA-Seq data for a bacterium, Mycobacterium tuberculosis. I used the FASTA and GTF files from NCBI to create the index, and set --genomeSAindexNbases to 8 based on this previous post. The bash script I used is:

# load modules
module load gcc/6.2.0 star/2.7.0a

# launch star
STAR --runThreadN 8 \
--runMode genomeGenerate \
--genomeDir /home/xyz/scratch/sanraffaele/indices/star/ \
--genomeFastaFiles ~/reference_data/NC000962_3.fasta \
--sjdbGTFfile ~/reference_data/NC000962_3.gtf \
--genomeSAindexNbases 8

Index generation takes ~15 seconds, and on reviewing the files in the index folder it appears that the index contains only 70 or so transcripts. Given the short run time (the genome is about 4 Mbp) and the small number of transcripts, I suspect something is wrong. Any suggestions about what I should do differently?
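For reference, the STAR manual suggests scaling --genomeSAindexNbases for small genomes to min(14, log2(GenomeLength)/2 - 1). Below is a minimal sketch of that calculation. The function names `fasta_length` and `suggest_sa_index_nbases` are hypothetical helpers written for illustration, not part of STAR, and the 4.4 Mbp figure is an approximate genome size used as an example.

```shell
# Hypothetical helper: total number of bases in a FASTA file (header lines skipped).
fasta_length() {
    awk '!/^>/ { total += length($0) } END { print total + 0 }' "$1"
}

# Hypothetical helper: apply the STAR manual's rule
# min(14, log2(GenomeLength)/2 - 1), rounded down to an integer.
suggest_sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        n = int(log(len) / log(2) / 2 - 1)
        if (n > 14) n = 14
        print n
    }'
}

# Example: a genome of roughly 4.4 Mbp
suggest_sa_index_nbases 4400000   # prints 10
```

With a real file, the two helpers can be chained as `suggest_sa_index_nbases "$(fasta_length genome.fasta)"`, where `genome.fasta` is a placeholder path. By this rule a value of 8 is lower than what the formula yields for a genome of this size.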

STAR bacteria index • 2.0k views

Since you don't need to worry about splicing there is no specific advantage to using STAR. You could use any aligner.

it appears that the index has only 70 or so transcripts

Not sure what you mean by that. It is not unusual to have the index finish quickly. You have a small genome. You can try doing an alignment and see what you get.


Thank you - I will try that.


Update: I realized that generating the index needs only the FASTA file. The GTF file is necessary only if one is interested in generating a read count matrix. For bacterial GTF files, Alex Dobin recommends changing column 3 to "exon" for all entries, as discussed in this post.
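The column 3 change mentioned above could be sketched with awk roughly as follows. This is an illustration only: the function name `gtf_features_to_exon` and the file names in the usage comment are hypothetical, and it assumes a standard tab-separated GTF with nine columns and `#` comment lines.

```shell
# Hypothetical helper: rewrite column 3 (feature type) of every GTF record
# to "exon", leaving comment lines (starting with '#') untouched.
gtf_features_to_exon() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/    { print; next }      # pass header/comment lines through
         NF >= 9 { $3 = "exon" }      # only touch well-formed 9-column records
                 { print }' "$1"
}

# Usage (placeholder file names):
#   gtf_features_to_exon annotation.gtf > annotation.exon.gtf
```

The output goes to a new file so the original annotation stays intact. STAR also has a --sjdbGTFfeatureExon option that selects which feature type is treated as an exon; whether that is a suitable alternative for a given annotation is worth checking against the manual.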
