Hello everyone, I've been wracking my brain on this issue and I could really use some help. This is my first time working with RSEM to quantify transcript expression.
I have a bunch of FASTQ's that I've aligned with STAR and their resulting BAM's. For both RSEM and STAR, I used the rsem-prepare-reference method to make an index in the hope this would avoid compatibility issues.
rsem-prepare-reference --gtf ${GTF} --star --p 32 ${FASTA}
${GENOME_DIR}/human_ensembl
I'm using Homo_sapiens.GRCh38.105.gtf as my GTF and Homo_sapiens.GRCh38.dna.primary_assembly.fa as my genome FASTA.
Now that I've got the BAMs, I want to quantify expression.
rsem-calculate-expression --alignments --paired-end --no-bam-output
--append-names -p 2 "${STAR_OUT}"/"${line}"/"${line}"_Aligned.sortedByCoord.out.bam
"${GENOME_DIR}"/human_ensembl "${line}"
I'm getting this output for each iteration of my loop
rsem-parse-alignments ...(files) 3 -tag XM
"rsem-parse-alignments...(files) 3 -tag XM" failed! Please check if
you provide correct parameters/options for the pipeline!
Warning: The SAM/BAM file declares less reference sequences (194) than
RSEM knows (244825)! Please make sure that you aligned your reads
against transcript sequences instead of genome. RSEM can not recognize
reference sequence name 1!
I'm really confused by this. Since I used RSEM to generate the reference for STAR, there shouldn't be any conflicts right? I'd really appreciate any help.
edit: Also, my command for star is pointing to --genomedir ${GENOME_DIR} for the index, and the index generated by rsem, human_ensembl, is the only index in that directory.
I think this was it. I'm used to HTSeq and didn't realize RSEM needed that, but it makes perfect sense. I've got my RSEM jobs running now. Thank you so much Shiyang!