Dear Team,
I have two groups of samples that have gone through the following pipeline:
Raw FASTQ > Trimmed FASTQ > STAR (genome) > HTSeq-count & Cufflinks
I was going through the vignette for IsoformSwitchAnalyzeR and I am unable to work out how to move forward.
Under Isoform/transcript quantification, Option A:
It says to use quantification from Salmon/kallisto. With the STAR aligner I align against the genome sequence, but Salmon and kallisto require a transcriptome FASTA (cDNA) as the reference. So how do I move forward? Is it as discussed for Salmon, where we redo the alignment against the cDNA?
Under Isoform/transcript quantification, Option B:
I felt this is the most suitable method, since it is similar to the pipeline I already used. My confusion is with step 4. After running Cuffmerge to create the merged GTF that includes novel transcripts, do I need to run Cuffdiff? Or do we take the GTF file from Cuffmerge and use Salmon downstream? If so, the same issue of different references (genome FASTA vs. cDNA FASTA) exists here as well. Secondly, which FASTA do I need to use when running Salmon with the new GTF file?
@Rob Thank you for the reply. I have both the transcriptome BAM and the genome BAM. But as you can see in the Salmon documentation, Salmon needs a BAM/SAM aligned to the cDNA reference, not the chromosome (DNA) reference. That is my concern.
The transcriptome BAM file contains the genomic alignments projected to the transcriptome annotation. You should be able to see this if you look at the header of the AlignedToTranscriptome.out.bam file. So long as those records (sequence names and lengths) match the transcriptome file you pass to salmon, everything should be in working order. In fact, this very pipeline (STAR -- projected to transcriptome --> salmon) is the default quantification pipeline in nf-core/rnaseq.
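The header check described above (sequence names and lengths agreeing with the transcriptome file) could be sketched roughly as follows. This is only an illustration, not how salmon itself validates its input: the function names are made up for this example, and it assumes you have the header as text (for instance from `samtools view -H`) and a FASTA small enough to read into memory.

```python
def sam_header_lengths(header_text):
    """Return {sequence name: length} from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @PG, @CO and other header records
        # @SQ fields are tab-separated TAG:VALUE pairs, e.g. SN:tx1 and LN:1200
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def fasta_lengths(fasta_text):
    """Return {sequence name: length}; the name is the first word after '>'."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def header_mismatches(header_text, fasta_text):
    """List the differences between the BAM header and the transcript FASTA."""
    bam = sam_header_lengths(header_text)
    fasta = fasta_lengths(fasta_text)
    problems = []
    for name in sorted(set(bam) | set(fasta)):
        if name not in fasta:
            problems.append(f"{name}: in BAM header only")
        elif name not in bam:
            problems.append(f"{name}: in FASTA only")
        elif bam[name] != fasta[name]:
            problems.append(
                f"{name}: length {bam[name]} in BAM vs {fasta[name]} in FASTA"
            )
    return problems
```

An empty list from `header_mismatches` would suggest the BAM and the FASTA describe the same set of transcripts; anything else is worth looking at before quantification.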
In nf-core, salmon builds its index from both the genome and transcriptome FASTA before running salmon quant, whereas STAR in the same pipeline uses only genome.fasta. My issue is that I don't have the transcript_fasta; I used the genome FASTA instead.
But thank you, I will follow what you have suggested from nf-core. Still, I am concerned [:-)]
That is the indexing rule, but if you look at the quantification rule, you'll see that the way salmon is invoked differs depending on whether the user is using "alignment mode" or providing the raw FASTQ files directly to salmon for its built-in selective alignment. For example, see here. Not to attempt to make an argument from authority here ;P, but I am an author and maintainer of salmon, and we use it ourselves frequently with the AlignedToTranscriptome.out.bam from STAR. It works well (you can read more in this paper we published a few years ago, "Alignment and mapping methodology influence transcript abundance estimation").
Perhaps your concern is that you don't have the transcriptome FASTA file itself to pass to salmon? That should be fairly easy to obtain from the genome FASTA and the GTF file you provided to STAR, and can be done using a tool like gffread or the rsem-prepare-reference tool that you can see used in nf-core here.
Why the double back-slashes?
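To make the two steps from the reply above more concrete (deriving a transcript FASTA with gffread, then running salmon in alignment mode rather than on raw FASTQ), here is a small sketch that only builds the command lines and does not run anything. The helper names and all file names are placeholders, and the library type "A" (automatic detection) is an assumption you may want to replace with your known library type.

```python
import shlex


def gffread_cmd(genome_fasta, gtf, out_fasta):
    """gffread: -g is the genome FASTA, -w writes the spliced transcript sequences."""
    return ["gffread", "-w", out_fasta, "-g", genome_fasta, gtf]


def salmon_alignment_cmd(transcript_fasta, transcriptome_bam, out_dir,
                         libtype="A", threads=4):
    """Alignment mode: -t gives the transcript FASTA, -a the BAM; no index is used."""
    return ["salmon", "quant", "-t", transcript_fasta, "-l", libtype,
            "-a", transcriptome_bam, "-p", str(threads), "-o", out_dir]


def salmon_mapping_cmd(index_dir, reads_1, reads_2, out_dir,
                       libtype="A", threads=4):
    """Mapping mode, for contrast: -i gives a salmon index, -1/-2 the raw FASTQ files."""
    return ["salmon", "quant", "-i", index_dir, "-l", libtype,
            "-1", reads_1, "-2", reads_2, "-p", str(threads), "-o", out_dir]


# Placeholder file names, for illustration only.
steps = [
    gffread_cmd("genome.fa", "annotation.gtf", "transcripts.fa"),
    salmon_alignment_cmd("transcripts.fa", "sample.transcriptome.bam",
                         "salmon_sample"),
]
for step in steps:
    print(shlex.join(step))
```

The point of the contrast is that only the mapping-mode call needs a salmon index; the alignment-mode call takes the transcript FASTA and the transcriptome BAM directly, and the GTF passed to gffread should be the same one that was given to STAR.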