Running salmon in alignment mode?
2.5 years ago
pearl2070 ▴ 10

This is my first time working with RNA data. So far, I've run the dataset through trimmomatic, sortmeRNA, megahit and BWA. I'm trying to run Salmon using the SAM file output from BWA and the .fa resulting from megahit as the transcriptome.

I run this line:

./salmon-1.8.0_linux_x86_64/bin/salmon quant -p 12 -t Sample1_megahit.contigs.fa -l A -a Sample1_megahit.annotation_bwa.sam -o Sample1_salmon 

After a stream of lines that all say some variation of "this transcript not found in reference", I get the error below. I'm not sure which reference it means:

Please provide a reference FASTA file that includes all targets present in the BAM header.

Should I be passing the unassembled transcriptome from before megahit or something? Or the megahit file filtered to keep only transcripts that had successful BWA alignments? I'm not sure how to do that. The data was originally paired-end, if that is relevant.

Thanks in advance!

transcriptomics salmon metatranscriptomics rna rna-seq • 1.8k views

Which fasta did you use for BWA alignment?

For BWA, I ran:
bwa index -a bwtsw microbial_all_cds.fasta

Followed by:
bwa mem -t 32 microbial_all_cds.fasta Sample1_megahit.contigs.fa > Sample1_megahit.annotation_bwa.sam

Salmon is telling you that the sequence names in Sample1_megahit.contigs.fa do not match the target (reference) names listed in the header of Sample1_megahit.annotation_bwa.sam.
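
A quick way to check this kind of mismatch is to compare the target names in the SAM header with the record names in the FASTA. The following is only a sketch: the function names are made up for illustration, and it assumes names are compared up to the first whitespace in the FASTA header line.

```shell
# List target names from the SAM header (the SN: field of each @SQ line).
sam_targets() {
    awk -F '\t' '/^@SQ/ { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' "$1" | sort -u
}

# List FASTA record names: the text after '>' up to the first whitespace.
fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print SAM header targets that have no record of the same name in the FASTA.
# Usage: missing_targets alignments.sam reference.fa
missing_targets() {
    names_file=$(mktemp)
    fasta_names "$2" > "$names_file"
    # -F fixed strings, -x whole-line match, -v invert the match.
    sam_targets "$1" | grep -vxF -f "$names_file" || true
    rm -f "$names_file"
}
```

For example, `missing_targets Sample1_megahit.annotation_bwa.sam Sample1_megahit.contigs.fa` would print every header target that the FASTA passed with `-t` does not contain. An empty result means all targets are present.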

What could be causing this? Could it be because there are many contigs in Sample1_megahit.contigs.fa that didn't get annotated, and therefore aren't present in Sample1_megahit.annotation_bwa.sam?

In your comment to @Shred you say that you aligned to microbial_all_cds.fasta. If that's the case, then you must pass microbial_all_cds.fasta to salmon.

Where? If I run

./salmon-1.8.0_linux_x86_64/bin/salmon quant -p 12 -t microbial_all_cds.fasta -l A -a Sample1_megahit.annotation_bwa.sam -o Sample1_salmon 

Then I encounter errors that read "Transcript appears twice in the transcript FASTA file" and "Transcript appears in the reference but did not appear in the BAM."

This means that you have multiple entries in your FASTA file that have the same name, which isn't allowed.
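
To see which names are repeated, something like the sketch below could help. The function name is hypothetical, and it assumes that a record's name is the text after `>` up to the first whitespace.

```shell
# Print each FASTA record name that occurs more than once, preceded by its count.
# Usage: duplicate_fasta_names reference.fa
duplicate_fasta_names() {
    awk '/^>/ { sub(/^>/, "", $1); count[$1]++ }
         END { for (name in count) if (count[name] > 1) print count[name] "\t" name }' "$1" | sort -k2,2
}
```

Running `duplicate_fasta_names microbial_all_cds.fasta` would list the offending names. No output means every name is unique.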

How do I resolve this? Is there some way to remove duplicates? I don't think I had a specific step to dereplicate sequences in my pipeline, actually. Could that have caused this problem?
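
One possible approach, as a sketch only, is to keep the first record for each name and drop later records with the same name. The function name is hypothetical. This assumes the duplicates are true repeats: if records sharing a name carry different sequences, renaming them may be the better choice.

```shell
# Keep only the first record for each FASTA name (text up to the first
# whitespace of the header line); later records with the same name are
# dropped together with their sequence lines.
# Usage: dedupe_fasta_by_name input.fa > output.fa
dedupe_fasta_by_name() {
    awk '/^>/ { name = $1; keep = !(name in seen); seen[name] = 1 }
         keep { print }' "$1"
}
```

Because the SAM header is written from the BWA index, the index and the alignment would presumably need to be rebuilt from the deduplicated FASTA so that both files agree.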

Are you sure megahit is appropriate to use with RNAseq data? It appears to be a genome assembler.

I have seen it used in a few metatranscriptomics studies, but if it's likely to be the cause of this issue, I can try a different assembler.
