Hi,
I am getting an error from rsem-calculate-expression when processing FASTQ files with RSEM's STAR alignment option. The alignment completes successfully, but when rsem-parse-alignments starts, it fails with an error saying that the SAM/BAM file declares more reference sequences than RSEM knows. The command and its output are below:
Input command
rsem-calculate-expression --star --star-path /home/sbhattach2/STAR-2.6.0a/bin/ \
--star-gzipped-read-file -p 8 --paired-end --strandedness reverse \
/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R1_001.fastq.gz /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R2_001.fastq.gz \
/data/Suro/Fasta/Rsem_Human_Ref1 \
/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood
Output
/home/sbhattach2/STAR-2.6.0a/bin//STAR --genomeDir /data/Suro/Fasta --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --runThreadN 8 --genomeLoad NoSharedMemory --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --outSAMheaderHD @hd VN:1.4 SO:unsorted --outFileNamePrefix /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood --readFilesCommand zcat --readFilesIn /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R1_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R1_001.fastq.gz /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S21_L002_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L006_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L007_R2_001.fastq.gz,/data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/fastq/UDN163672_UF_Blood_S45_L008_R2_001.fastq.gz
Nov 14 11:53:41 ..... started STAR run
Nov 14 11:53:41 ..... loading genome
Nov 14 11:55:02 ..... started mapping
Nov 14 12:10:05 ..... finished successfully
rsem-parse-alignments /data/Suro/Fasta/Rsem_Human_Ref1 /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.stat/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam 3 -tag XM
The SAM/BAM file declares more reference sequences (203798) than RSEM knows (196483)!
"rsem-parse-alignments /data/Suro/Fasta/Rsem_Human_Ref1 /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.stat/UDN163672_UF_Blood /data/Suro/UDNRNASeq/F10_UDN236041_Thigpen.Billy/counts/UDN163672_UF_Blood.temp/UDN163672_UF_Blood.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!
I previously ran the same command with STAR 2.5.3a and obtained TPM values without any problem. However, when I re-ran the same scripts with STAR 2.5.3a and RSEM 1.3.0, this error appeared. It persists even after reinstalling STAR and RSEM and rebuilding the reference. The reference was built from the FASTA file Homo_sapiens.GRCh37.dna.primary_assembly.fa and the GTF file gencode.v19.annotation_mod.gtf.
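To help narrow this down, the two numbers in the error message can be checked independently. The sketch below counts the @SQ lines in a SAM header and the sequences in a FASTA file, then compares them. The function names are hypothetical helpers, not part of RSEM or STAR. The commented example assumes that samtools is installed, that the .temp directory was left in place after the failure, and that rsem-prepare-reference wrote Rsem_Human_Ref1.transcripts.fa next to the reference prefix.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of RSEM or STAR.

# Count reference sequences declared in a SAM header read from stdin.
# Each reference sequence contributes exactly one @SQ line.
count_sq() {
    grep -c '^@SQ' || true   # grep -c prints 0 but exits 1 when nothing matches
}

# Count sequences in a FASTA file: one '>' header line per sequence.
count_fasta() {
    grep -c '^>' "$1" || true
}

# Print a one-line verdict for the two counts.
compare_counts() {
    if [ "$1" -eq "$2" ]; then
        echo "match: $1 sequences"
    else
        echo "mismatch: BAM header declares $1, reference has $2"
    fi
}

# Example (assumes samtools is installed and the .temp directory still exists):
# bam_n=$(samtools view -H /path/to/sample.temp/sample.bam | count_sq)
# ref_n=$(count_fasta /data/Suro/Fasta/Rsem_Human_Ref1.transcripts.fa)
# compare_counts "$bam_n" "$ref_n"
```

If the counts differ in the same way as in the error message, the STAR index and the RSEM reference were probably built from different annotations. The log above shows STAR loading its genome from /data/Suro/Fasta, so one possibility (an assumption, not confirmed) is that an older STAR index in that directory is being picked up instead of the one generated with the current GTF.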
Please let me know if you need any other information.
Thanks in advance for your help.
Surajit
I have run into the same problem. Have you solved it? Thanks in advance for your help.
I'm going through the same problem. Did any of you solve it?