Hi, I am trying to run Salmon on bam files generated by STAR. I am new to using Salmon. I know that I cannot run Salmon on bam files obtained by aligning to the genome, but to my knowledge aligning to the genome is somewhat better than aligning to the transcriptome. Therefore, I generated bam files by aligning to the genome and outputting the alignments in transcriptome coordinates by setting --quantMode TranscriptomeSAM in STAR. I assume that before running Salmon on the bam files I have to randomize them, which I tried to do using samtools collate. I have used collate in the past without any problems on bam files generated by alignment to the genome. But now, when I run collate on the bam files generated with --quantMode TranscriptomeSAM, I get error messages that look like this:
[E::bam_read1] CIGAR and query sequence lengths differ for A00257:310:HYL2GDSXX:1:1312:3115:21449
Error reading input file
So first, can I run Salmon on bam files generated by aligning to the genome and setting --quantMode TranscriptomeSAM in STAR? Or do I have to generate the bam files by aligning to the transcriptome? If the former is true, then any idea about the issue with collate? Thanks, Ina
You can run Salmon directly on the output files from STAR --quantMode TranscriptomeSAM. They are not sorted by position, so there should be no need to randomize them with samtools collate first.
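As a rough sketch of what that looks like in practice, the script below aligns with STAR and then passes the transcriptome-coordinate BAM straight to Salmon in alignment-based mode, with no collate step in between. All file and directory names (star_index, transcripts.fa, the sample1 FASTQ files) and the thread count are hypothetical placeholders. It also assumes the STAR index was built with an annotation GTF, and that transcripts.fa contains the transcripts of that same annotation so the names match the BAM header. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: every path and sample name here is a hypothetical placeholder.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

quantify_sample() {
  sample="$1"; r1="$2"; r2="$3"

  # Align to the genome and also emit alignments in transcriptome coordinates.
  run STAR --runThreadN 8 \
    --genomeDir star_index \
    --readFilesIn "$r1" "$r2" \
    --readFilesCommand zcat \
    --quantMode TranscriptomeSAM \
    --outSAMtype BAM Unsorted \
    --outFileNamePrefix "${sample}."

  # STAR writes <prefix>Aligned.toTranscriptome.out.bam; give it to Salmon as-is.
  # -t is the transcript FASTA (assumed to match the annotation used for STAR),
  # -l A asks Salmon to infer the library type.
  run salmon quant -p 8 -l A \
    -t transcripts.fa \
    -a "${sample}.Aligned.toTranscriptome.out.bam" \
    -o "salmon_${sample}"
}

quantify_sample sample1 sample1_R1.fastq.gz sample1_R2.fastq.gz
```

The important part is that the Aligned.toTranscriptome.out.bam file is used unchanged: it is not coordinate-sorted and not passed through samtools before quantification.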