Empty genes.bam files in RSEM-STAR workflow
0
0
Entering edit mode
21 months ago
evocanres ▴ 30

This question concerns an RNAseq data aligment and transcript quantification step that generates empty bam files but still generates counts file. I would love to know 1. If the count data is reliable if bam files are empty; and 2. How I can fix the issue. I am sharing my workflow, results and my search for solutions so far. Any suggestions would be appreciated.

I am working on a RNAseq dataset from JGI genome portal These are paired-end sequenced data, as noted from the description, but has 1 fastq.gz files per sample. So I followed the following steps

  1. Download data using Globus to hpc.
  2. Fastqc
  3. Deinterleave using https://telatin.github.io/seqfu2/tools/deinterleave.html
  4. run rsem-calculate-expression with the following script in a slurm job array.
rsem-calculate-expression --star \
                        --star-path /modules/apps/star/2.7.9a/bin/Linux_x86_64 \
                        -p 60 \
                        --paired-end \
                        --estimate-rspd \
                        --append-names \
                        --output-genome-bam \
                        "${prefix}_R1.fq" "${prefix}_R2.fq" \
                        $genomDIR/GRcm39 \
                        $rsemDIR/"${prefix}"

Results

  1. This is the job.err message
Loading star version 2.7.9a
Loading rsem version 1.3.3
rsem-tbam2gbam: BamConverter.h:131: void BamConverter::process(): Assertion `cqname != qname' failed.
slurm-calex_mm10_5589353.err (END)
  1. Checking the bam files and the genome.bam file is empty
10.genome.bam 0
11.genome.bam 0
12.genome.bam 1.0M
13.genome.bam 0
14.genome.bam 1.0M
15.genome.bam 0
8.genome.bam 0
9.genome.bam 1.0M
91.genome.bam 0
92.genome.bam 1.0M
93.genome.bam 1.0M
94.genome.bam 0
17.genome.bam 2.0M
19.genome.bam 0
20.genome.bam 0
21.genome.bam 1.0M
22.genome.bam 0
23.genome.bam 0
24.genome.bam 0
25.genome.bam 5.0M
26.genome.bam 0
27.genome.bam 0
28.genome.bam 1.0M
  1. The transcript bams are okay.
10.transcript.bam 13G
11.transcript.bam 5.7G
12.transcript.bam 11G
13.transcript.bam 7.4G
14.transcript.bam 5.4G
15.transcript.bam 12G
8.transcript.bam 8.2G
9.transcript.bam 9.9G
91.transcript.bam 12G
92.transcript.bam 7.1G
93.transcript.bam 6.6G
94.transcript.bam 11G
17.transcript.bam 6.5G
19.transcript.bam 11G
20.transcript.bam 9.0G
21.transcript.bam 9.4G
22.transcript.bam 9.8G
23.transcript.bam 7.1G
24.transcript.bam 11G
25.transcript.bam 11G
26.transcript.bam 12G
27.transcript.bam 11G
28.transcript.bam 7.9G

Attempts for solutions:

  1. One user in this thread mentions "I previously used the --outSAMunmapped Within option. By removing this option, I was able to get rsem-tbam2gbam to work. " This discussion on --outSAMunmapped Within should not affect current analysis but reanalysis if the bam files are used later.
  2. I tried looking at the BamConverter.h code but can not figure out for myself what the error message meant.
star rsem • 1.1k views
ADD COMMENT
0
Entering edit mode

If you don't need the genome bam file, try skipping --output-genome-bam

This option generate a BAM file, with alignments mapped to genomic coordinates and annotated with their posterior probabilities.

ADD REPLY
0
Entering edit mode

I would like to have the BAM files.

ADD REPLY
0
Entering edit mode

I am just curious, in the 9 months since you posted there was no solution?

The error comes from this line of the source:

https://github.com/deweylab/RSEM/blob/8bc1e2115493c0cdf3c6bee80ef7a21a91b2acce/BamConverter.h#L131

I would first check that the input FASTQ files are properly paired.

If your input data is valid I would say that is probably best reported as a bug as

it seems that even the software authors added the assertion there as a sanity check, checking for an event that should never happen.

ADD REPLY

Login before adding your answer.

Traffic: 1657 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6