STAR index file for GRCh37
4.2 years ago

Hello everyone

I am trying to generate a STAR genome index for the hg19 (GRCh37) genome. I used the following command:

STAR --runThreadN 30 --runMode genomeGenerate \
     --genomeDir /data/shilpia2/STAR.index/ \
     --genomeFastaFiles /data/shilpia2/STAR.index/GRCh37.primary_assembly.genome.fa \
     --sjdbGTFfile /data/shilpia2/gff/gencode.v24.basic.annotation.gtf \
     --sjdbOverhang 100 \
     --limitGenomeGenerateRAM 30000000000 \
     --outFileNamePrefix /data/shilpia2/STAR.index/hg19

However, the program stops after a while without reporting an error and without generating the index files. Could anyone suggest what the reason might be, or whether there is a problem with my command?

Thanks

I would drop the --limitGenomeGenerateRAM and --outFileNamePrefix flags. You could also reduce --runThreadN to, say, 8 (it might be a resource issue on your cluster). Also make sure that the --genomeDir directory exists. Let me know how you get on.

How much memory do you have? You need at least 30 GB of RAM to generate the index.

Thank you so much for your response. I allocated 30 GB of RAM and ran the job for 3 days, but it still did not generate the index. Do you think I should run it for longer?

Did you have 30 cores available for the job? Did you get anything in the log or error log?

Alex has pre-made hg19/GRCh37 indexes available at this link, if you can't make them.

I do have 30 cores available, and the log file does not show any error. STAR terminates after reading the GTF file. I also tried the index from the link you provided, but it reports an error with the genome file.

This is what appears in the log file:

 ..... processing annotations GTF
!!!!! WARNING: while processing sjdbGTFfile=/data/shilpia2/gff/gencode.v24.basic.annotation.gtf, line:
chr3    HAVANA  exon    198024658   198024788   .   +   .   gene_id "ENSG00000185621.11"; transcript_id "ENST00000482695.5"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "LMLN"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "LMLN-002"; exon_number 15; exon_id "ENSE00003689636.1"; level 2; protein_id "ENSP00000418324.1"; tag "basic"; transcript_support_level "1"; tag "appris_alternative_2"; havana_gene "OTTHUMG00000155375.2"; havana_transcript "OTTHUMT00000339702.1";
 exon end = 198024788 is larger than the chromosome chr3 length = 198022430 , will skip this exon
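A warning like this means an annotation record ends past the end of its chromosome as that chromosome exists in the FASTA. The chr3 length STAR reports (198,022,430) is the GRCh37 length, so a GTF with GRCh38 coordinates will trigger it. A minimal Python sketch (not part of STAR) that lists every such record before running genomeGenerate might look like this; the script name `check_gtf_vs_fasta.py` and all function names are hypothetical, and it assumes plain or gzipped files whose FASTA sequence names match the GTF's first column.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed (.gz) file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fasta_lengths(lines):
    """Return {sequence name: length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # The sequence name is the first word after '>'.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def features_past_chrom_end(gtf_lines, lengths):
    """Yield (seqname, feature, end, chrom_length) for GTF records whose
    end coordinate exceeds the length of their chromosome in the FASTA."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a 9-column GTF data line
        seqname, feature, end = fields[0], fields[2], int(fields[4])
        chrom_len = lengths.get(seqname)
        if chrom_len is not None and end > chrom_len:
            yield seqname, feature, end, chrom_len


def main(fasta_path, gtf_path):
    with open_text(fasta_path) as fh:
        lengths = fasta_lengths(fh)
    with open_text(gtf_path) as fh:
        hits = list(features_past_chrom_end(fh, lengths))
    # Show the first few offenders and a total count.
    for seqname, feature, end, chrom_len in hits[:10]:
        print(f"{seqname}\t{feature}\tend={end}\tlength={chrom_len}")
    print(f"{len(hits)} record(s) extend past their chromosome end")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be something like `python check_gtf_vs_fasta.py GRCh37.primary_assembly.genome.fa gencode.v24.basic.annotation.gtf`; a large count is a strong hint that the FASTA and GTF come from different builds.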

The file 'gencode.v24.basic.annotation.gtf' is listed at https://www.gencodegenes.org/human/release_24.html, but the files there are all for hg38 (GRCh38), not GRCh37/hg19.

The GRCh37/hg19 versions are here: https://www.gencodegenes.org/human/release_24lift37.html
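One quick way to tell which build a GENCODE GTF targets is to read its `##` header lines. The sketch below assumes, as GENCODE GTFs typically do, that the header (usually the `##description:` line) names the assembly as GRCh37 or GRCh38; check this with `head` on your own file before relying on it. The function names are hypothetical.

```python
import gzip
import re
import sys


def open_text(path):
    """Open a plain or gzip-compressed (.gz) file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def assemblies_in_header(lines):
    """Return the GRCh37/GRCh38 names mentioned in the leading '#' header
    lines of a GTF, in order of first appearance."""
    found = []
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first data line
        for name in re.findall(r"GRCh3[78]", line):
            if name not in found:
                found.append(name)
    return found


def main(gtf_path):
    with open_text(gtf_path) as fh:
        names = assemblies_in_header(fh)
    if names:
        print("Assembly named in GTF header:", ", ".join(names))
    else:
        print("No GRCh37/GRCh38 mention found in the GTF header")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

If this prints GRCh38 while the FASTA is the GRCh37 primary assembly, the two files do not belong together.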

From the link you provided, should I download the 'Comprehensive gene annotation' GTF file and the 'Genome sequence, primary assembly (GRCh37)' file?

It's up to you and depends on the goals of your study. I primarily use annotations from Ensembl, so I am not familiar with the differences between the basic and comprehensive gene annotations. You should probably be fine running a standard differential gene expression analysis with the basic set, but some features could be missing.

I have another question: which is better for alignment? I know people recommend STAR, but what if I use Bowtie? I was trying to compare both tools to see how much they differ, and I would appreciate your suggestion. I only need to do a simple differential gene expression analysis, so is it OK to use Bowtie?

No. Bowtie is used for genomic (DNA) alignments; for transcriptomic (RNA-seq) alignments most would recommend a splice-aware aligner like STAR, but you could also use TopHat2 (which uses Bowtie under the hood).

Ok. Thank you so much for your response.

4.2 years ago
GenoMax

Are you mixing and matching sequences and annotations by any chance? Are they all for the same genome build?
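Besides mismatched builds, another common mix-up is sequence naming: GENCODE/UCSC files use names like `chr1`, while Ensembl files use `1`. A minimal Python sketch that compares the sequence names in a FASTA with the seqnames (first column) of a GTF might look like this; the function names are hypothetical and it assumes plain or gzipped text files.

```python
import gzip
import sys


def open_text(path):
    """Open a plain or gzip-compressed (.gz) file for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fasta_names(lines):
    """Return the set of sequence names (first word after '>') in FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names


def gtf_seqnames(lines):
    """Return the set of seqnames (column 1) used by GTF data lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def main(fasta_path, gtf_path):
    with open_text(fasta_path) as fh:
        fa = fasta_names(fh)
    with open_text(gtf_path) as fh:
        gtf = gtf_seqnames(fh)
    missing = sorted(gtf - fa)
    print(f"{len(gtf & fa)} seqname(s) shared between FASTA and GTF")
    if missing:
        print("GTF seqnames absent from the FASTA:", ", ".join(missing[:20]))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Note that matching names do not prove matching builds (GRCh37 and GRCh38 GENCODE files both use `chr1`), so this check complements a look at chromosome lengths rather than replacing it.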

ADD COMMENT
0
Entering edit mode

Hi

I had mixed up the annotation, which caused the problem. Using the correct GTF file, the index was generated successfully. Thank you so much for your response.
