Solution for pop-up problem with rsem-prepare-reference (RSEM can not recognize reference sequence name ATM...)
16 months ago
tsogurlu • 0

Hello everyone,

I was trying to run rsem-calculate-expression on the Aligned.toTranscriptome.out.bam file created by STAR. However, the run failed with the error "RSEM can not recognize reference sequence name ATMG01375.1". I searched online but could not find an explanation. I then manually checked the files created by the rsem-prepare-reference command and found that, in all of them ("reference_name"), the entry for this transcript carried an extra description after its ID. Here is an excerpt:

>ATMG01330.1
GGGAGAGTGGTCAAAAGCGGCAGACTGTAAATCTGTTGAAGTTTTTCTACGTAGGTTCGAATCCTGCCTCTCCCA
>ATMG01340.1
ACCTACTTGACTCAGCGGTTAGAGTATCGCTTTCATACGGCGAGAGTCATTGGTTCAAATCCAATAGTAGGTA
>ATMG01375.1;Name=ATMG01375.1;id2=exon-id-trnH(GTG)-1;parent2=id-trnH(GTG)
GCGGATGTAGCCAAGTGGATCAAGGCAGTGGATTGTGAATCCACCATGCGCGGGTTCAATTCCCGTCGTTCGCC
>ATMG01380.1
AAACCGGGCACTACGGTGAGACGTGAAAACACCCGATCCCATTCCGACCTCGATATGTGGAATCGTCTTGCGCCATATGTACTGAGATTGTTCGGGAGACATGGTCCAAGCCCGGTGA

To solve the problem, I manually deleted the extra description after the transcript ID. I do not know why it appeared. I am sharing this for anyone else who runs into the same problem.
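
For anyone who would rather not edit the headers by hand, here is a minimal Python sketch of the same cleanup. It assumes the unwanted text always starts at the first ";" in a FASTA header line, as in the ATMG01375.1 entry above. The function names are my own, not part of RSEM, and the sketch only covers FASTA-style files, not the other reference files.

```python
def clean_fasta_header(line: str) -> str:
    """Drop everything from the first ';' onward in a FASTA header line.

    Sequence lines (anything not starting with '>') are returned unchanged.
    """
    if line.startswith(">") and ";" in line:
        return line.split(";", 1)[0]
    return line


def clean_fasta_lines(lines):
    """Yield FASTA lines with trailing attributes removed from headers."""
    for line in lines:
        yield clean_fasta_header(line.rstrip("\n"))


# Small example built from the entries shown above (sequences shortened).
example = [
    ">ATMG01375.1;Name=ATMG01375.1;id2=exon-id-trnH(GTG)-1;parent2=id-trnH(GTG)",
    "GCGGATGTAGCC",
    ">ATMG01380.1",
    "AAACCGGGCACT",
]
cleaned = list(clean_fasta_lines(example))
print(cleaned[0])
```

On a real file you would read the lines from the FASTA file and write the cleaned lines to a new file instead of using the in-memory list.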

rsem • 1.2k views

Is this in the transcripts.fa file? Can you show me the output of grep ATMG01375.1 gtf_file.gtf, where gtf_file.gtf is the GTF file you used to prepare the reference?


All of them (.transcripts.fa, .idx.fa, .n2g.idx.fa, .seq) included this extra description, and only for this transcript. Yes, it stems from the GTF file. Here are the matching lines:

ChrM    Araport11       tRNA    124603  124676  .       -       .       transcript_id "ATMG01375.1"; gene_id "ATMG01375";

ChrM    Araport11       exon    124603  124676  .       -       .       transcript_id "ATMG01375.1;Name=ATMG01375.1;id2=exon-id-trnH(GTG)-1;parent2=id-trnH(GTG)"; gene_id "ATMG01375:exon:1";

This GTF file was created this month (Araport11_GTF_genes_transposons.Jul2023.gtf).

EDIT: It seems the 2021 GTF file (Araport11_GFF3_genes_transposons.Mar92021_gffread.gtf) also has the same description: https://github.com/gpertea/gffread/issues/74 When I used Araport11_GTF_genes_transposons.Mar172021.gtf.gz, there was no problem.


This seems improperly formatted. Are there other entries with this sort of weird ID?


No, just this one: ATMG01375.1. At least, when I searched for the word "Name", there were no other results.
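
A slightly broader check than searching for "Name" is sketched below in Python. It flags any transcript_id value that contains ";" or "=", on the assumption that a well-formed ID contains neither. The regular expression and the function name are illustrative choices of mine, not something from RSEM or gffread.

```python
import re

# Captures the quoted value of the transcript_id attribute in a GTF line.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def suspicious_transcript_ids(lines):
    """Return the sorted, unique transcript_id values containing ';' or '='."""
    found = set()
    for line in lines:
        if line.startswith("#"):  # skip GTF comment/header lines
            continue
        match = TRANSCRIPT_ID.search(line)
        if match and re.search(r"[;=]", match.group(1)):
            found.add(match.group(1))
    return sorted(found)


# The two GTF lines quoted earlier in this thread.
gtf_lines = [
    'ChrM\tAraport11\ttRNA\t124603\t124676\t.\t-\t.\t'
    'transcript_id "ATMG01375.1"; gene_id "ATMG01375";',
    'ChrM\tAraport11\texon\t124603\t124676\t.\t-\t.\t'
    'transcript_id "ATMG01375.1;Name=ATMG01375.1;id2=exon-id-trnH(GTG)-1;'
    'parent2=id-trnH(GTG)"; gene_id "ATMG01375:exon:1";',
]
bad_ids = suspicious_transcript_ids(gtf_lines)
print(bad_ids)
```

For a real GTF file, pass the open file handle in place of gtf_lines.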


OK, then just edit that line and recreate the RSEM index.
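
If you prefer to script the edit, the following Python sketch trims a transcript_id value at its first ";" and leaves every other line untouched. It assumes the real ID is always the part before that first ";", which holds for the ATMG01375.1 line shown above but is not guaranteed for other files. The function name is hypothetical.

```python
import re

# Matches a transcript_id value that has extra text after a ';' inside the
# quotes; group 1 is the part before the first ';'.
BAD_TRANSCRIPT_ID = re.compile(r'transcript_id "([^";]*);[^"]*"')


def fix_transcript_id(line: str) -> str:
    """Trim a malformed transcript_id to the text before its first ';'."""
    return BAD_TRANSCRIPT_ID.sub(r'transcript_id "\1"', line)


# The exon line quoted earlier in this thread.
bad_line = (
    'ChrM\tAraport11\texon\t124603\t124676\t.\t-\t.\t'
    'transcript_id "ATMG01375.1;Name=ATMG01375.1;id2=exon-id-trnH(GTG)-1;'
    'parent2=id-trnH(GTG)"; gene_id "ATMG01375:exon:1";'
)
fixed_line = fix_transcript_id(bad_line)
print(fixed_line)
```

The sketch does not touch gene_id. On that exon line it is "ATMG01375:exon:1" while the tRNA line has "ATMG01375", so it may be worth checking that field by hand before rerunning rsem-prepare-reference on the corrected GTF.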
