Hi,
I have mouse RNA-seq data (single-end, stranded, reverse strand) which I mapped with STAR against mm10 with the gencode.vM12.primary_assembly.annotation GTF, where I ran STAR in a mode that also generates a bam file of the reads mapping to the transcriptome.
For my purpose I'd like to retain only reads that map to transcripts annotated as protein_coding in the GTF, which would be my total, meaning TPMs will be calculated based on that slice of the pie rather than based on all reads.
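To make the "slice of the pie" idea concrete, here is a minimal sketch of TPM computed over a chosen subset of transcripts only. The transcript names, counts, and effective lengths are made-up illustration values, and the `tpm` helper is hypothetical (it is not part of Salmon or MMSEQ); it only shows the arithmetic.

```python
def tpm(counts, eff_lengths, keep=None):
    """Return TPM per transcript, normalised over the transcripts in `keep`.

    counts      -- dict: transcript id -> estimated read count
    eff_lengths -- dict: transcript id -> effective length in bases
    keep        -- optional set of transcript ids; None means use all
    """
    ids = [t for t in counts if keep is None or t in keep]
    # Reads per base for each retained transcript.
    rates = {t: counts[t] / eff_lengths[t] for t in ids}
    total = sum(rates.values())
    # Scale so the retained transcripts sum to one million.
    return {t: 1e6 * r / total for t, r in rates.items()}


# Hypothetical toy data: two protein_coding transcripts and one lincRNA.
counts = {"tx_pc1": 100, "tx_pc2": 300, "tx_linc": 600}
eff_lengths = {"tx_pc1": 1000, "tx_pc2": 1500, "tx_linc": 2000}

tpm_all = tpm(counts, eff_lengths)
tpm_pc = tpm(counts, eff_lengths, keep={"tx_pc1", "tx_pc2"})
```

In this toy case the protein_coding values sum to one million on their own, which is what using only that subset as the total means.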
What I did is samtools sort and index the transcriptomic bam, and then subset that bam with a bed file which only includes the transcripts that are annotated as protein_coding. This reduces the number of mapped reads from 11,653,865 to 3,483,962.
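For illustration, the subsetting step could be sketched roughly as below, working on SAM text (for example the output of `samtools view -h`). This is not the exact procedure I used: the function names and the toy records are hypothetical, and it assumes a GENCODE-style GTF (with a `transcript_type` attribute), single-end reads, and reference names in the bam that equal the GTF `transcript_id` values. The sketch filters the `@SQ` header lines together with the alignment records so the two stay consistent.

```python
import re


def protein_coding_ids(gtf_lines):
    """Collect transcript_id values of transcripts typed protein_coding."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Only look at "transcript" feature rows with an attribute column.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = fields[8]
        if 'transcript_type "protein_coding"' in attrs:
            m = re.search(r'transcript_id "([^"]+)"', attrs)
            if m:
                ids.add(m.group(1))
    return ids


def filter_sam(sam_lines, keep):
    """Keep @SQ lines and alignments whose reference name is in `keep`."""
    out = []
    for line in sam_lines:
        if line.startswith("@SQ"):
            tags = line.rstrip("\n").split("\t")
            name = next(t[3:] for t in tags if t.startswith("SN:"))
            if name in keep:
                out.append(line)
        elif line.startswith("@"):
            out.append(line)  # other header lines pass through unchanged
        elif line.split("\t")[2] in keep:  # column 3 is RNAME
            out.append(line)
    return out


# Hypothetical toy input, not real annotation or alignments.
gtf = [
    'chr1\tHAVANA\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1.1"; transcript_id "TX_PC.1"; transcript_type "protein_coding";\n',
    'chr1\tHAVANA\texon\t100\t900\t.\t+\t.\tgene_id "G1.1"; transcript_id "TX_PC.1"; transcript_type "protein_coding";\n',
    'chr1\tHAVANA\ttranscript\t2000\t2500\t.\t-\t.\tgene_id "G2.1"; transcript_id "TX_LINC.1"; transcript_type "lincRNA";\n',
]
sam = [
    "@HD\tVN:1.4\tSO:coordinate\n",
    "@SQ\tSN:TX_PC.1\tLN:800\n",
    "@SQ\tSN:TX_LINC.1\tLN:500\n",
    "r1\t0\tTX_PC.1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tTX_LINC.1\t1\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
]

keep = protein_coding_ids(gtf)
filtered = filter_sam(sam, keep)
```

Note that a multi-mapping read can lose some of its alignments this way, so any NH-style tags on the remaining records would no longer reflect the true number of hits.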
When I use Salmon to quantify expression of that subsetted bam, Salmon crashes (so does MMSEQ), but it doesn't if I give it the un-subsetted bam.
Does anyone have any idea why it's crashing?
What are the error messages when Salmon and MMSEQ crash?
Think it was a bam header issue. Seems to work now.
It would be helpful for everybody if you described how the problem arose and how you solved it.
I made an error when I edited the bam's header, which produced this problem, so I don't think it's worth posting.