RSEM: transcripts.fa.grp not found (should it have been generated?)
1
0
Entering edit mode
5.8 years ago
jhanks1981 ▴ 10

When I try to run RSEM calculate-expression, it fails with the following error:

Cannot open /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat.transcripts.fa.grp! It may not exist.

I checked in the rsem reference folder and sure enough there is no rat.transcripts.fa.grp. There is a rat.grp however. The complete list of the contents of the reference folder is:

Genome         chrName.txt        exonInfo.tab          rat.grp         rat.ti              sjdbList.fromGTF.out.tab
SA             chrNameLength.txt  geneInfo.tab          rat.idx.fa      rat.transcripts.fa  sjdbList.out.tab
SAindex        chrStart.txt       genomeParameters.txt  rat.n2g.idx.fa  ratLog.out          transcriptInfo.tab
chrLength.txt  exonGeTrInfo.tab   rat.chrlist           rat.seq         sjdbInfo.txt

I double checked the .out file from when I created the genome, and saw no sign of anything going wrong:

rsem-extract-reference-transcripts /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat 0 /sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.95.gtf None 0 /sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa
Parsed 200000 lines
Parsed 400000 lines
Parsed 600000 lines
Parsing gtf File is done!
/sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa is processed!
41078 transcripts are extracted.
Extracting sequences is done!
Group File is generated!
Transcript Information File is generated!
Chromosome List File is generated!
Extracted Sequences File is generated!


rsem-preref /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat.transcripts.fa 1 /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat
Refs.makeRefs finished!
Refs.saveRefs finished!
/sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat.idx.fa is generated!
/sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat.n2g.idx.fa is generated!


/hpc/packages/minerva-common/star/2.7.0a/src/STAR-2.7.0a/bin/STAR  --runThreadN 8  --runMode genomeGenerate  --genomeDir /sc/orga/projects/MetaDope/genomes/rat/rsemRef  --genomeFastaFiles /sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa  --sjdbGTFfile /sc/orga/projects/MetaDope/genomes/rat/download/Rattus_norvegicus.Rnor_6.0.95.gtf  --sjdbOverhang 100  --outFileNamePrefix /sc/orga/projects/MetaDope/genomes/rat/rsemRef/rat
Feb 13 05:34:28 ..... started STAR run
Feb 13 05:34:28 ... starting to generate Genome files
Feb 13 05:36:23 ... starting to sort Suffix Array. This may take a long time...
Feb 13 05:37:15 ... sorting Suffix Array chunks and saving them to disk...
Feb 13 06:08:42 ... loading chunks from disk, packing SA...
Feb 13 06:10:11 ... finished generating suffix array
Feb 13 06:10:11 ... generating Suffix Array index
Feb 13 06:14:51 ... completed Suffix Array index
Feb 13 06:14:51 ..... processing annotations GTF
Feb 13 06:15:00 ..... inserting junctions into the genome indices
Feb 13 06:19:12 ... writing Genome to disk ...
Feb 13 06:19:15 ... writing Suffix Array to disk ...
Feb 13 06:19:37 ... writing SAindex to disk
Feb 13 06:19:39 ..... finished successfully

In another thread, I saw that lack of transcript annotations in the GTF file can cause problems, but it looks fine to me:

$ head Rattus_norvegicus.Rnor_6.0.95.gtf 
#!genome-build Rnor_6.0
#!genome-version Rnor_6.0
#!genome-date 2014-07
#!genome-build-accession NCBI:GCA_000001895.4
#!genebuild-last-updated 2017-01
1 ensembl_havana gene 396700 409750 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript";
1 ensembl transcript 396700 409676 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; transcript_id "ENSRNOT00000044187"; transcript_version "4"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript"; transcript_name "AABR07000046.1-202"; transcript_source "ensembl"; transcript_biotype "processed_transcript";
1 ensembl exon 396700 396905 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; transcript_id "ENSRNOT00000044187"; transcript_version "4"; exon_number "1"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript"; transcript_name "AABR07000046.1-202"; transcript_source "ensembl"; transcript_biotype "processed_transcript"; exon_id "ENSRNOE00000493937"; exon_version "1";
1 ensembl exon 397780 397788 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; transcript_id "ENSRNOT00000044187"; transcript_version "4"; exon_number "2"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript"; transcript_name "AABR07000046.1-202"; transcript_source "ensembl"; transcript_biotype "processed_transcript"; exon_id "ENSRNOE00000544646"; exon_version "1";
1 ensembl exon 399062 399070 . + . gene_id "ENSRNOG00000046319"; gene_version "4"; transcript_id "ENSRNOT00000044187"; transcript_version "4"; exon_number "3"; gene_name "AABR07000046.1"; gene_source "ensembl_havana"; gene_biotype "processed_transcript"; transcript_name "AABR07000046.1-202"; transcript_source "ensembl"; transcript_biotype "processed_transcript"; exon_id "ENSRNOE00000574883"; exon_version "1";

I have tried re-running prepare-reference, but to no avail. Where else might I look for the source of the problem?

PS Apologies for cross-posting with the RSEM Google group, but after I posted there I noticed that it has been months since anyone's question has been answered there and decided to break etiquette.

RNA-Seq rsem • 3.2k views
ADD COMMENT
1
Entering edit mode
5.8 years ago
jhanks1981 ▴ 10

I figured it out. I entered a path to the .fa file where I should have entered the path + name prefix. Seems to work now.

ADD COMMENT
1
Entering edit mode

could you explain a bit more how you did it? instead of lets say --transcripts xx.fasta , you did what?

ADD REPLY

Login before adding your answer.

Traffic: 2521 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6