Question

Salmon warning: Detected suspicious pair

9.1 years ago

bharata1803 ▴ 560

I got this warning while trying to run Salmon on an aligned BAM file. The BAM file is the output of TopHat run on paired-end FASTQ files.

WARNING: Detected suspicious pair ---
    The names are different:
    read1 : SRR2103637.12484815
    read2 : SRR2103637.12492049
    The proper-pair statuses are inconsistent:
    read1 [SRR2103637.12484815] : proper-pair; mapped; matemapped
    read2 : [SRR2103637.12484815] : no proper-pair; mapped; matemapped

My salmon command is:

salmon quant -t ../idx/Ch38.cdna.all.fa -l IU -a sorted.bam -g ../gene_map.txt -o salmon_out

I tried both the unsorted and the sorted BAM, and both resulted in the same warning.

My questions are:

What can I do to fix this, or can I just ignore it?
For the libtype parameter in Salmon, how do I choose the correct value? How can I check whether the library is inward, outward, or matching, and stranded or unstranded? Can I check this from the FASTQ files, or does it have to come from information about the experiment itself? I downloaded the data from NCBI GEO, which says it was sequenced on an Illumina HiSeq 2500 in 100 bp paired-end mode with the TruSeq Rapid PE Cluster kit and TruSeq Rapid SBS kit (Illumina).
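In Salmon's library-type string, the first letter gives the relative orientation of paired-end mates (I = inward, O = outward, M = matching), followed by S for stranded plus F or R for the strand that read 1 comes from, or U for unstranded; a typical paired-end library is therefore IU, ISF or ISR. The experiment description alone often does not settle strandedness, but it can be checked empirically from reads aligned to the transcriptome. The Python sketch below tallies that evidence pair by pair. It is a simplified illustration, not Salmon's own detection logic: the `MateAlignment` record, the position-only orientation test and the 0.9 threshold are hypothetical choices. (More recent Salmon releases can also infer the library type themselves with `-l A`.)

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class MateAlignment:
    """Hypothetical minimal view of one mate aligned to a transcript."""
    is_read1: bool  # True for read 1 (SAM flag 0x40), False for read 2 (0x80)
    reverse: bool   # True if aligned to the transcript's reverse strand (flag 0x10)
    pos: int        # leftmost 1-based alignment position on the transcript


def classify_pair(a: MateAlignment, b: MateAlignment) -> Optional[str]:
    """Return 'ISF', 'ISR', 'OSF' or 'OSR' for one pair on the same transcript.

    Pairs with both mates on the same strand ('matching' libraries) return None.
    Orientation is judged from leftmost positions only, a simplification.
    """
    if a.reverse == b.reverse:
        return None
    fwd, rev = (b, a) if a.reverse else (a, b)
    # Inward: the forward-strand mate starts at or before the reverse-strand mate.
    orientation = "I" if fwd.pos <= rev.pos else "O"
    read1 = a if a.is_read1 else b
    # F: read 1 comes from the transcript's forward strand; R: from the reverse strand.
    strand = "R" if read1.reverse else "F"
    return f"{orientation}S{strand}"


def guess_library_type(
    pairs: Iterable[Tuple[MateAlignment, MateAlignment]],
    stranded_threshold: float = 0.9,  # hypothetical cut-off, not Salmon's own
) -> Optional[str]:
    """Tally per-pair evidence and collapse it to one library-type string."""
    counts = Counter(c for c in (classify_pair(a, b) for a, b in pairs) if c)
    total = sum(counts.values())
    if total == 0:
        return None
    # Majority orientation; the chosen side always has at least one pair.
    orientation = "I" if counts["ISF"] + counts["ISR"] >= total / 2 else "O"
    sf = counts[orientation + "SF"]
    sr = counts[orientation + "SR"]
    if sf / (sf + sr) >= stranded_threshold:
        return orientation + "SF"
    if sr / (sf + sr) >= stranded_threshold:
        return orientation + "SR"
    return orientation + "U"
```

If nearly all pairs agree on one strand, the library is stranded; a roughly even split between SF and SR evidence points to an unstranded library such as IU.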

Thank you for your answer and suggestion.

salmon RNA-Seq • 4.5k views

updated 3.7 years ago by Ram 44k • written 9.1 years ago by bharata1803 ▴ 560

Hi bharata,

One thing that seems strange to me in your description is this:

The bam file is an output of tophat from paired end fastq file

The fact that the alignments are the output of TopHat suggests that the alignment was against the genome. However, Salmon requires alignments against the transcriptome (produced by an aligner such as Bowtie2). It is, of course, possible that you aligned against the genome with TopHat and then converted the alignments into transcriptomic coordinates, but since this is uncommon and you didn't mention it, I assume that isn't the case. Can you clarify whether these alignments are to the genome or to the transcriptome (as Salmon expects)?
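One quick way to answer this question is to look at the @SQ lines of the BAM header (for example the output of `samtools view -H sorted.bam`): transcriptome alignments list one reference per transcript, whereas genome alignments list chromosomes and scaffolds. A minimal Python sketch of that check follows; the prefix list (Ensembl ENST IDs, RefSeq NM_/NR_/XM_/XR_ accessions) and the majority rule are assumptions for illustration and would need adjusting for other annotations.

```python
from typing import List

# Hypothetical heuristic: Ensembl transcript IDs start with "ENST";
# RefSeq transcript accessions start with NM_, NR_, XM_ or XR_.
TRANSCRIPT_PREFIXES = ("ENST", "NM_", "NR_", "XM_", "XR_")


def reference_names(header_text: str) -> List[str]:
    """Collect the SN: values of all @SQ lines in a SAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def guess_reference_kind(header_text: str) -> str:
    """Return 'transcriptome', 'genome', or 'unknown' from the @SQ names."""
    names = reference_names(header_text)
    if not names:
        return "unknown"
    # Count reference names that look like transcript identifiers.
    transcript_like = sum(name.startswith(TRANSCRIPT_PREFIXES) for name in names)
    return "transcriptome" if transcript_like / len(names) > 0.5 else "genome"
```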

updated 5.0 years ago by Ram 44k • written 9.1 years ago by Rob 6.9k

I used TopHat to align the FASTQ files to the transcriptome reference from Ensembl (the human cDNA reference), so I think it is okay to run Salmon directly on the output.

updated 3.7 years ago by Ram 44k • written 9.1 years ago by bharata1803 ▴ 560

That's interesting, as TopHat is generally a split-read aligner used to map RNA-seq reads to the genome. When aligning directly to the transcriptome, one wants to avoid split-read mappings. I would check how things differ on one of the samples if you map with Bowtie2 (or something comparable) instead of TopHat.
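In SAM output, a split-read (spliced) alignment shows up as an N operation in the CIGAR string, so one way to see whether TopHat actually produced split mappings against the transcriptome is to count how many mapped records contain N. A short Python sketch, assuming plain-text SAM lines (for example from `samtools view sorted.bam`); the function names are hypothetical:

```python
import re
from typing import Iterable

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar: str) -> bool:
    """True if the CIGAR string contains an 'N' (skipped region) operation."""
    return any(op == "N" for _length, op in CIGAR_OP.findall(cigar))


def spliced_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of mapped SAM records whose alignment is split by an 'N' operation."""
    mapped = spliced = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped record
        mapped += 1
        spliced += is_spliced(cigar)
    return spliced / mapped if mapped else 0.0
```

A non-trivial fraction on a transcriptome alignment would suggest reads were split across positions within or between transcripts, which is not what Salmon's alignment mode expects.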

updated 5.0 years ago by Ram 44k • written 9.1 years ago by Rob 6.9k

I am having the same problem; did you manage to solve it? I have tried:

all combinations of library type
using fasta file of transcripts (.fa) as reference
indexed transcripts as reference

The major problem I am facing is that I only have BAM files aligned with HISAT; I do not have the source FASTQ files.

updated 3.7 years ago by Ram 44k • written 8.4 years ago by EagleEye 7.6k

Hi EagleEye,

If you are using pre-aligned reads with Salmon, the "reference" you pass to Salmon should be the FASTA file containing your reference transcripts. However, if your reads were aligned with HISAT, the bigger concern is that the alignment was likely done against the genome rather than the transcriptome. A tool like sam-xlate (mentioned here) should be able to convert genomic alignments into transcriptomic ones.

It is also important to note that Salmon (like RSEM) requires that (1) all alignment records for a given read, in the case of multimapping, be consecutive in the input SAM/BAM file, and (2) the records for mates (i.e. left and right reads) be consecutive. The most common alignment tools (e.g. Bowtie2, BWA, STAR) do this, and I believe HISAT does as well (since it uses much of Bowtie2's infrastructure for file parsing and writing), but that is worth verifying.

Finally, in a worst-case scenario, you could try to recover FASTQ files from the BAM (using e.g. bam2fastq) and then perform your quantification with those.
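Both ordering requirements can be checked directly on SAM text. The Python sketch below walks the records in file order and reports read names whose records are not consecutive, adjacent paired records whose names differ, and adjacent mates whose proper-pair bits (flag 0x2) disagree; the last two are the conditions reported in the warning in the original question. It assumes every paired record is immediately followed by its mate and does not special-case supplementary records. It is a hypothetical diagnostic, not Salmon's actual parser.

```python
from typing import Iterable, List, Optional, Tuple


def check_record_order(sam_lines: Iterable[str]) -> List[str]:
    """Report SAM records that break the record-ordering rules Salmon expects."""
    problems: List[str] = []
    finished = set()  # read names whose block of records has already ended
    current: Optional[str] = None
    pending: Optional[Tuple[str, int, int]] = None  # (name, flag, line no.) awaiting its mate
    for lineno, line in enumerate(sam_lines, start=1):
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # Rule 1: all records for one read name must be consecutive.
        if name != current:
            if current is not None:
                finished.add(current)
            if name in finished:
                problems.append(f"line {lineno}: records for {name} are not consecutive")
            current = name
        # Rule 2: paired records (flag 0x1) must come as adjacent mate pairs.
        if not flag & 0x1:
            continue
        if pending is None:
            pending = (name, flag, lineno)
            continue
        mate_name, mate_flag, mate_line = pending
        pending = None
        if mate_name != name:
            problems.append(f"lines {mate_line}-{lineno}: mate names differ ({mate_name} vs {name})")
        elif (mate_flag & 0x2) != (flag & 0x2):
            problems.append(f"lines {mate_line}-{lineno}: proper-pair statuses are inconsistent")
    if pending is not None:
        problems.append(f"line {pending[2]}: record for {pending[0]} has no mate")
    return problems
```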

8.4 years ago by Rob 6.9k

Thanks a lot for your suggestions; I will report back once I have tried them.

8.4 years ago by EagleEye 7.6k

Ram · Answer 1 · 2015-11-06

9.1 years ago

andrew.j.skelton73 6.6k

Shouldn't your Salmon index be a directory, not a fasta file?

updated 3.7 years ago by Ram 44k • written 9.1 years ago by andrew.j.skelton73 6.6k

I am trying to use the alignment-based mode, and according to the documentation this is the command:

./bin/salmon quant -t transcripts.fa -l <LIBTYPE> -a aln.bam -o salmon_quant

So according to the documentation, I think passing the FASTA file is correct.
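The distinction behind this exchange is that `-t` (alignment-based mode, used together with `-a`) takes the transcript FASTA, while `-i` (read-based mode, used with `-1`/`-2` for paired reads) takes the directory created by `salmon index`. A hypothetical Python helper that assembles either command line makes the difference explicit; it only builds the argument list and does not run Salmon.

```python
from typing import List, Optional


def salmon_quant_args(
    libtype: str,
    out_dir: str,
    *,
    transcripts_fasta: Optional[str] = None,
    bam: Optional[str] = None,
    index_dir: Optional[str] = None,
    reads1: Optional[str] = None,
    reads2: Optional[str] = None,
    gene_map: Optional[str] = None,
) -> List[str]:
    """Build a `salmon quant` argument list for either input mode (not executed here)."""
    if bam is not None:
        # Alignment-based mode: -t takes the transcript FASTA, not an index directory.
        if transcripts_fasta is None:
            raise ValueError("alignment mode needs the transcript FASTA (-t)")
        args = ["salmon", "quant", "-t", transcripts_fasta, "-l", libtype, "-a", bam]
    elif index_dir is not None and reads1 is not None and reads2 is not None:
        # Read-based mode: -i takes the directory built by `salmon index`.
        args = ["salmon", "quant", "-i", index_dir, "-l", libtype, "-1", reads1, "-2", reads2]
    else:
        raise ValueError("give either bam + transcripts_fasta, or index_dir + reads1 + reads2")
    if gene_map is not None:
        args += ["-g", gene_map]  # optional transcript-to-gene mapping
    return args + ["-o", out_dir]
```

The resulting list could then be passed to `subprocess.run(args, check=True)`.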

updated 3.7 years ago by Ram 44k • written 9.1 years ago by bharata1803 ▴ 560

What version of Salmon are you using? Rob Patro might see this and comment. As a quick check to make sure Salmon itself isn't the issue, you can convert the BAM into FASTQ and run it with reads instead of alignments.
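Converting a BAM back to FASTQ mainly means undoing what the aligner did to reverse-strand reads: SAM stores their sequence reverse-complemented and their qualities reversed. A minimal Python sketch over SAM text lines, assuming only primary records are kept (secondary and supplementary ones are skipped); the function name is hypothetical, and in practice an established tool such as `samtools fastq` or bam2fastq is the safer choice.

```python
from typing import Iterable, Iterator, Tuple

# Complement table for reverse-complementing reads stored on the reverse strand.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq(sam_lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (mate, FASTQ record) pairs, where mate is 1 or 2 (0 for unpaired reads)."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented; undo that.
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        yield mate, f"@{name}\n{seq}\n+\n{qual}\n"
```

Mate-1 and mate-2 records would be written to separate files, and the input should be name-sorted (e.g. with `samtools sort -n`) so the mates stay in step. Reads that the aligner never wrote to the BAM cannot be recovered this way.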

updated 5.0 years ago by Ram 44k • written 9.1 years ago by andrew.j.skelton73 6.6k

I just updated to the latest version, 0.5.0 I think. I have used Salmon before with single-end FASTQ files, so this is the first time I have tried it with paired-end data. I have the original FASTQ files, so I will try the read-based method.

updated 3.7 years ago by Ram 44k • written 9.1 years ago by bharata1803 ▴ 560