Hello everyone,
I was trying to run rsem-calculate-expression on the Aligned.toTranscriptome.out.bam file created by STAR. However, it failed with the error "RSEM can not recognize reference sequence name ATMG01375.1". I searched online for a solution but could not find an explanation. I then manually checked the files created by the rsem-prepare-reference command and found that, in all of the "reference_name" files, this transcript had extra text appended to its name. Here is the relevant excerpt:
>ATMG01330.1
GGGAGAGTGGTCAAAAGCGGCAGACTGTAAATCTGTTGAAGTTTTTCTACGTAGGTTCGAATCCTGCCTCTCCCA
>ATMG01340.1
ACCTACTTGACTCAGCGGTTAGAGTATCGCTTTCATACGGCGAGAGTCATTGGTTCAAATCCAATAGTAGGTA
>ATMG01375.1;Name=ATMG01375.1;id2=exon-id-trnH(GTG)-1;parent2=id-trnH(GTG)
GCGGATGTAGCCAAGTGGATCAAGGCAGTGGATTGTGAATCCACCATGCGCGGGTTCAATTCCCGTCGTTCGCC
>ATMG01380.1
AAACCGGGCACTACGGTGAGACGTGAAAACACCCGATCCCATTCCGACCTCGATATGTGGAATCGTCTTGCGCCATATGTACTGAGATTGTTCGGGAGACATGGTCCAAGCCCGGTGA
To solve the problem, I manually deleted the extra text. I do not know why it happened. I wanted to share this experience for anyone else who faces this problem.
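For anyone who would rather check for this programmatically than by eye, here is a minimal Python sketch. It assumes the problem always looks like the header above, that is, a transcript name followed by a semicolon and extra key=value text. The function names find_suspicious_headers and clean_header are made up for illustration and are not part of RSEM.

```python
def find_suspicious_headers(fasta_lines):
    """Return (line_number, header) pairs for FASTA headers that contain ';'.

    Assumption: a clean transcript name never contains a semicolon.
    """
    hits = []
    for number, line in enumerate(fasta_lines, start=1):
        if line.startswith(">") and ";" in line:
            hits.append((number, line[1:].rstrip("\n")))
    return hits


def clean_header(header):
    """Keep only the part of the header before the first ';'."""
    return header.split(";", 1)[0]
```

To use it on a real file, pass an open file handle, for example find_suspicious_headers(open("reference_name.transcripts.fa")), where the file name is a placeholder for your own reference prefix.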
Is this in the transcripts.fa file? Can you show me the output of grep ATMG01375.1 gtf_file.gtf, where gtf_file.gtf is the GTF file you used to prepare the reference?
All of them (.transcripts.fa, .idx.fa, .n2g.idx.fa, .seq) included this extra text, but only for this transcript. Yes, it stems from the GTF file. Here it is:
This GTF file was created this month (Araport11_GTF_genes_transposons.Jul2023.gtf)
EDIT: It seems the 2021 GTF file (Araport11_GFF3_genes_transposons.Mar92021_gffread.gtf) also has the same description: https://github.com/gpertea/gffread/issues/74 When I used Araport11_GTF_genes_transposons.Mar172021.gtf.gz, there was no problem.
This seems improperly formatted. Are there other entries with this sort of weird ID?
No, just this one: ATMG01375.1. At least, when I searched the word "Name", there was no other result.
OK then just edit that line and recreate the RSEM index.
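If editing by hand is inconvenient, the fix could be sketched in Python as below. This is only a sketch under one assumption: that the extra text sits inside the quoted transcript_id value of the GTF line (the actual GTF line is not shown in this thread, so check yours first). The function name fix_gtf_line is hypothetical.

```python
import re

# Matches the quoted transcript_id value, including any ';' inside the quotes.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]*)"')


def fix_gtf_line(line):
    """Trim everything after the first ';' inside the transcript_id value.

    Assumption: a valid transcript ID never contains a semicolon.
    Lines without a transcript_id attribute are returned unchanged.
    """
    def trim(match):
        clean_id = match.group(1).split(";", 1)[0]
        return 'transcript_id "{}"'.format(clean_id)

    return TRANSCRIPT_ID.sub(trim, line)
```

After writing the corrected lines to a new GTF file, rerun rsem-prepare-reference on it so that all index files are regenerated with the clean name.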