4.2 years ago
bioinformatics.queries
Hello everyone
I am trying to generate the index files for STAR alignment using the hg19 genome. I used the following command:
STAR --runThreadN 30 --runMode genomeGenerate --genomeDir /data/shilpia2/STAR.index/ --genomeFastaFiles /data/shilpia2/STAR.index/GRCh37.primary_assembly.genome.fa --sjdbGTFfile /data/shilpia2/gff/gencode.v24.basic.annotation.gtf --sjdbOverhang 100 --limitGenomeGenerateRAM 30000000000 --outFileNamePrefix /data/shilpia2/STAR.index/hg19
However, the program stops after a while without reporting any error and without generating the index files. Could anyone suggest what the reason might be, or whether there is a problem with my command?
Thanks
I would drop the --limitGenomeGenerateRAM and --outFileNamePrefix flags. You could reduce --runThreadN to, say, 8 (it might be a resource issue with your cluster). Also make sure that the --genomeDir exists. Let me know how you get on.
How much memory do you have? You need at least 30G+ RAM for the index generation.
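For illustration, the suggestions above could be wrapped in a small shell sketch like the one below. The helper names `prepare_genome_dir` and `run_star_index` are hypothetical (they are not part of STAR), and the default of 8 threads is just the value suggested above; the STAR flags are the ones from the original command, minus the two dropped flags.

```shell
# Hypothetical helper: create the index directory if needed and check it is writable.
prepare_genome_dir() {
    mkdir -p "$1" || return 1
    if [ ! -w "$1" ]; then
        echo "not writable: $1" >&2
        return 1
    fi
    echo "ok: $1"
}

# Hypothetical wrapper: run the index build with the reduced flag set.
# Arguments: genome_dir fasta gtf [threads, default 8]
run_star_index() {
    prepare_genome_dir "$1" || return 1
    STAR --runThreadN "${4:-8}" \
         --runMode genomeGenerate \
         --genomeDir "$1" \
         --genomeFastaFiles "$2" \
         --sjdbGTFfile "$3" \
         --sjdbOverhang 100
}

# Example call using the paths from the question (uncomment to run):
# run_star_index /data/shilpia2/STAR.index \
#     /data/shilpia2/STAR.index/GRCh37.primary_assembly.genome.fa \
#     /data/shilpia2/gff/gencode.v24.basic.annotation.gtf
```

Running the build from a scratch working directory keeps Log.out in one place, which makes it easier to see where the run stopped.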
Thank you so much for your response. I used 30 GB of RAM and let the program run for 3 days, but it still did not generate the index files. Do you think I should run it for longer?
Did you have 30 cores available for the job? Did you get anything in log/error log?
Alex has pre-made hg19/GRCh37 indexes available at this link, if you can't make them.
I do have 30 cores available. The log file does not show any error. STAR terminates after reading the GTF file. I also tried to use the index files from the link you provided, but STAR reports an error in the genome file.
This is what appears in the log file.
https://www.gencodegenes.org/human/release_24.html has a file named 'gencode.v24.basic.annotation.gtf'.
The files there are all for hg38 (GRCh38), not hg19 (GRCh37), so your annotation does not match your GRCh37 genome FASTA.
The GRCh37/hg19 versions are here: https://www.gencodegenes.org/human/release_24lift37.html
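A quick way to sanity-check which assembly a GTF belongs to, before spending hours on an index build, might look like the sketch below. It assumes the GTF still carries its `##` header comments and that the assembly name (for example GRCh38) is mentioned there, which is an assumption about the file rather than a guarantee. The function names `gtf_assembly` and `gtf_only_seqnames` are hypothetical helpers, not part of STAR or GENCODE tooling.

```shell
# Print the first assembly name (e.g. GRCh37 or GRCh38) found in the GTF header comments.
# Prints nothing if the header does not mention one.
gtf_assembly() {
    grep '^#' "$1" | grep -o 'GRCh[0-9][0-9]*' | head -n 1
}

# List sequence names used in the GTF that are absent from the FASTA.
# Arguments: fasta gtf
gtf_only_seqnames() {
    fasta_names=$(mktemp)
    gtf_names=$(mktemp)
    # FASTA names: text after '>' up to the first whitespace.
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u > "$fasta_names"
    # GTF names: first tab-separated column of non-comment lines.
    grep -v '^#' "$2" | cut -f 1 | sort -u > "$gtf_names"
    # Keep only names unique to the GTF list.
    comm -13 "$fasta_names" "$gtf_names"
    rm -f "$fasta_names" "$gtf_names"
}

# Example (uncomment to run on the files from the question):
# gtf_assembly /data/shilpia2/gff/gencode.v24.basic.annotation.gtf
# gtf_only_seqnames /data/shilpia2/STAR.index/GRCh37.primary_assembly.genome.fa \
#     /data/shilpia2/gff/gencode.v24.basic.annotation.gtf
```

Note that matching sequence names alone do not prove the assemblies agree, since GRCh37 and GRCh38 GENCODE files can both use names like chr1; the header check is the more telling of the two.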
From the link you provided, should I download the "Comprehensive gene annotation" GTF file and the "Genome sequence, primary assembly (GRCh37)" file?
It's up to you and depends on the goals of your study. I primarily use annotations from ENSEMBL and am thus not familiar with the basic vs comprehensive gene annotations. I think you should probably be fine undertaking standard differential gene expression analysis with the basic set but some features could be missing.
I have one more question: which aligner is better? I know people recommend STAR, but what if I use Bowtie? I was trying to compare the two tools to see how much they differ, and I would appreciate your suggestion. I only need to do a simple differential gene expression analysis, so is it OK if I use Bowtie?
No. Bowtie is used for genomic alignments (i.e. DNA). For transcriptomic alignments (RNA), most would recommend a splice-aware aligner like STAR, because RNA-seq reads can span exon-exon junctions. You could also use TopHat2, which uses Bowtie under the hood.
Ok. Thank you so much for your response.