When I try to run RSEM calculate-expression, it fails with the following error:
Cannot open /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat.transcripts.fa.grp! It may not exist.
I checked in the rsem reference folder and sure enough there is no rat.transcripts.fa.grp. There is a rat.grp however. The complete list of the contents of the reference folder is:
Genome chrName.txt exonInfo.tab rat.grp rat.ti sjdbList.fromGTF.out.tab
SA chrNameLength.txt geneInfo.tab rat.idx.fa rat.transcripts.fa sjdbList.out.tab
SAindex chrStart.txt genomeParameters.txt rat.n2g.idx.fa ratLog.out transcriptInfo.tab
chrLength.txt exonGeTrInfo.tab rat.chrlist rat.seq sjdbInfo.txt
I double checked the .out file from when I created the genome, and saw no sign of anything going wrong:
rsem-extract-reference-transcripts /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat 0 /sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.95.gtf None 0 /sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa
Parsed 200000 lines
Parsed 400000 lines
Parsed 600000 lines
Parsing gtf File is done!
/sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa is processed!
41078 transcripts are extracted.
Extracting sequences is done!
Group File is generated!
Transcript Information File is generated!
Chromosome List File is generated!
Extracted Sequences File is generated!
rsem-preref /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat.transcripts.fa 1 /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat
Refs.makeRefs finished!
Refs.saveRefs finished!
/sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat.idx.fa is generated!
/sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat.n2g.idx.fa is generated!
/hpc/packages/minerva-common/star/2.7.0a/src/STAR-2.7.0a/bin/STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /sc/orga/projects/MetaDope/genomes/rat/rsemRef --genomeFastaFiles /sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa --sjdbGTFfile /sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.95.gtf --sjdbOverhang 100 --outFileNamePrefix /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat
Feb 13 05:34:28 ..... started STAR run
Feb 13 05:34:28 ... starting to generate Genome files
Feb 13 05:36:23 ... starting to sort Suffix Array. This may take a long time...
Feb 13 05:37:15 ... sorting Suffix Array chunks and saving them to disk...
Feb 13 06:08:42 ... loading chunks from disk, packing SA...
Feb 13 06:10:11 ... finished generating suffix array
Feb 13 06:10:11 ... generating Suffix Array index
Feb 13 06:14:51 ... completed Suffix Array index
Feb 13 06:14:51 ..... processing annotations GTF
Feb 13 06:15:00 ..... inserting junctions into the genome indices
Feb 13 06:19:12 ... writing Genome to disk ...
Feb 13 06:19:15 ... writing Suffix Array to disk ...
Feb 13 06:19:37 ... writing SAindex to disk
Feb 13 06:19:39 ..... finished successfully
In another thread, I saw that lack of transcript annotations in the GTF file can cause problems, but it looks fine to me:
$ head Rattus_norvegicus.Rnor_6.0.95.gtf
#!genome-build Rnor_6.0
#!genome-version Rnor_6.0
#!genome-date 2014-07
#!genome-build-accession NCBI:GCA_000001895.4
#!genebuild-last-updated 2017-01
1 ensembl_havana gene 396700 409750 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript";
1 ensembl transcript 396700 409676 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; transcript_id "ENSRNOT00000044187"; transcript_version "4"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript"; transcript_name "AABR07000046.1-202"; transcript_source "ensembl"; transcript_biotype "processed_transcript";
1 ensembl exon 396700 396905 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; transcript_id "ENSRNOT00000044187"; transcript_version "4"; exon_number "1"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript"; transcript_name "AABR07000046.1-202"; transcript_source "ensembl"; transcript_biotype "processed_transcript"; exon_id "ENSRNOE00000493937"; exon_version "1";
1 ensembl exon 397780 397788 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; transcript_id "ENSRNOT00000044187"; transcript_version "4"; exon_number "2"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript"; transcript_name "AABR07000046.1-202"; transcript_source "ensembl"; transcript_biotype "processed_transcript"; exon_id "ENSRNOE00000544646"; exon_version "1";
1 ensembl exon 399062 399070 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; transcript_id "ENSRNOT00000044187"; transcript_version "4"; exon_number "3"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript"; transcript_name "AABR07000046.1-202"; transcript_source "ensembl"; transcript_biotype "processed_transcript"; exon_id "ENSRNOE00000574883"; exon_version "1";
I have tried re-running prepare-reference, but to no avail. Where else might I look for the source of the problem?
PS Apologies for cross-posting with the RSEM Google group, but after I posted there I noticed that it has been months since anyone's question has been answered there and decided to break etiquette.
could you explain a bit more how you did it? instead of lets say --transcripts xx.fasta , you did what?