6.4 years ago
salamandra
I generated a genome index in STAR without any problem, but when aligning reads to the genome I got this error:
EXITING because of FATAL ERROR: could not open genome file /Volumes/PereiraLab/Tania/genomes/ensembl/Homo_sapiens_GRCh38_dna_primary_assembly/with_indices_Homo_sapiens_GRCh38_dna_primary_assembly/genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions
I checked the genome index directory, and the file genomeParameters.txt does not exist there or anywhere else on the computer.
Then I saw this post. It seems there was not enough RAM to generate the genome index files, although no error was recorded. But how can I increase the RAM available when generating the index?
I'm new to bioinformatics, so I have no idea how to do this. The commands for indexing the genome were:
GENOME=/Volumes/PereiraLab/Tania/genomes/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly/Homo_sapiens.GRCh38.dna.primary_assembly.fa
ANOTATION=/Volumes/PereiraLab/Tania/annotations/ensembl/Homo_sapiens.GRCh38.92/Homo_sapiens.GRCh38.92.gtf
GENOME_Index=/Volumes/PereiraLab/Tania/genomes/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly/with_indices_Homo_sapiens.GRCh38.dna.primary_assembly/
STAR --runThreadN 4 --runMode genomeGenerate --genomeFastaFiles $GENOME --sjdbGTFfile $ANOTATION --sjdbOverhang 99 --genomeDir $GENOME_Index
and for aligning:
READS=/Volumes/PereiraLab/Tania/Reprogram_HSC/Bioinformatics/RNA_seq_analysis_Newmethods/Results/BJs/WithoutAdaptersFastq/1A_ATCACG_withoutadapters.fastq
ALIGNMENTS=/Volumes/PereiraLab/Tania/Reprogram_HSC/Bioinformatics/RNA_seq_analysis_Newmethods/Results/BJs/STAR_alignments/
INDEX=/Volumes/PereiraLab/Tania/genomes/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly/with_indices_Homo_sapiens.GRCh38.dna.primary_assembly
STAR --runThreadN 12 --genomeDir $INDEX --readFilesIn $READS --outFileNamePrefix $ALIGNMENTS --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts
The genome index folder has these files only:
$ cd /Volumes/PereiraLab/Tania/genomes/ensembl/Homo_sapiens.GRCh38.dna.primary_assembly/with_indices_Homo_sapiens.RCh38.dna.primary_assembly
$ ls
chrLength.txt exonGeTrInfo.tab sjdbList.fromGTF.out.tab
chrName.txt exonInfo.tab sjdbList.out.tab
chrNameLength.txt geneInfo.tab transcriptInfo.tab
chrStart.txt sjdbInfo.txt
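A quick way to confirm that index generation stopped early is to test for the large binary files that a finished STAR index normally contains next to genomeParameters.txt (Genome, SA and SAindex). The sketch below assumes that file list; `check_star_index` is a hypothetical helper, not part of STAR.

```shell
# Report which core STAR index files are missing from a directory.
# check_star_index is a hypothetical helper name, not a STAR command.
check_star_index() {
    dir=$1
    missing=0
    # Files assumed to be present in a completed index
    for f in genomeParameters.txt Genome SA SAindex; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call, using the variable from the indexing commands above:
# check_star_index "$GENOME_Index" && echo "index looks complete"
```

For the listing above, this would report all four files as missing, which is consistent with genome generation having been interrupted.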
How much RAM do you have? STAR requires 30 GB+ of free RAM to generate indexes and run alignments for the human genome. If you don't have that much RAM available, you will need to find alternative hardware.
Alex Dobin makes pre-made STAR indexes available, if you want to save time. But the memory requirement above still applies.
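To answer the "how much RAM" question from the command line, something like the following may help. It is a minimal sketch: `total_ram_gib` and `has_enough_ram` are hypothetical helper names, the 30 GiB default simply mirrors the figure quoted above, and it reports total rather than free memory, so it is only an upper bound on what STAR can use.

```shell
# Print total physical memory in whole GiB (Linux or macOS).
total_ram_gib() {
    if [ -r /proc/meminfo ]; then
        # Linux: MemTotal is reported in kB
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
    else
        # macOS: hw.memsize is reported in bytes
        sysctl -n hw.memsize | awk '{ printf "%d\n", $1 / 1024 / 1024 / 1024 }'
    fi
}

# Succeed if the given amount (GiB) meets a threshold.
# The default of 30 is taken from the figure quoted in this thread.
has_enough_ram() {
    have=$1
    need=${2:-30}
    [ "$have" -ge "$need" ]
}

if has_enough_ram "$(total_ram_gib)"; then
    echo "enough total RAM for a human STAR index"
else
    echo "probably not enough RAM for a human STAR index"
fi
```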
My PC has 16 GB of RAM in total. If I use the pre-made indexes you suggest, do I still need another computer to do the alignment? And to get those indexes, do I have to download every file at the link you provided into the same folder on my PC, or just one of them?
Yes, you would still need another computer: the memory footprint at alignment remains relatively large. You might need to try an alternative aligner, like hisat.
And yes, you have to get all the files for the build you choose, and they must be placed in the same folder on your local machine.
Thanks. Just to check: if I have a computer with 4 cores and 16 GB of RAM in total (4 GB per core), the number I should give to the '--runThreadN' option is 4 and not 16?
Correct. The 16 GB is shared by all cores; there is no assignment of 4 GB per core.
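Rather than typing the thread count by hand, it can be read from the system. This sketch assumes `getconf _NPROCESSORS_ONLN` is available, which is the case on typical Linux and macOS systems. The variable name `n_cores` is only illustrative.

```shell
# Number of CPU cores currently online
n_cores=$(getconf _NPROCESSORS_ONLN)
echo "cores available: $n_cores"

# The value can then be passed to STAR, for example:
# STAR --runThreadN "$n_cores" --genomeDir "$INDEX" --readFilesIn "$READS"
```

On the 4-core machine described here this would give 4, so the `--runThreadN 12` used in the alignment command above asks for more threads than there are cores.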