Hi all,
I would like to examine the complete transcript sequences of specific genes in RNA-seq data. As input, I can use the RNA-seq FASTQ file. For example, I would like to see the full sequence of every BRAF transcript (all cDNA combinations) present in the FASTQ file (which may or may not be paired-end).
What would be the recommended way to do this? It would be great if, at some point in the pipeline, I could avoid processing the whole sequencing data and focus only on the requested gene.
Many thanks, Eila
Hi Carlo, thank you for the quick answer. As far as I understand, pseudo-alignment is used to calculate gene / transcript expression levels and does not “need” to match reads against the full transcripts to do so. In my experience, tools of this kind, such as Kallisto, do not report the full mRNA transcript sequence.
Are you familiar with an alignment tool that reports the full mRNA / cDNA sequence of each aligned transcript? I have added an image of what I am looking for.
Thanks, Eila
You could try aligning your reads to the genome (using HISAT2 or STAR) and then using a genome-guided assembler (for instance StringTie2) to retrieve the mRNA sequences. I would suggest aligning against the complete genome in the first step. After that, and before the assembly, you could filter the reads by position using samtools, both to improve speed and to keep only lightweight BAM files for future use. If there is no genome available, I recommend assembling the whole transcriptome using Trinity, as suggested above.
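In case it helps, here is a rough sketch of how those steps could be chained, written as a small Python helper that only builds the command lines (it does not run anything unless you pass them to `subprocess.run`). The index name, file names, output prefix and thread count are all placeholders I made up, and the region is only an approximate window around BRAF on GRCh38, so please check it against your own annotation and chromosome naming ("chr7" vs "7") before using it.

```python
import shlex
from typing import List, Optional


def build_pipeline(
    index: str,
    reads1: str,
    reads2: Optional[str],
    region: str,
    annotation: Optional[str] = None,
    prefix: str = "sample",
    threads: int = 4,
) -> List[List[str]]:
    """Build commands for: align -> sort/index -> filter by region -> assemble.

    All file names derived from `prefix` are hypothetical placeholders.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    gene_bam = f"{prefix}.gene.bam"
    gtf = f"{prefix}.gene.gtf"

    # --dta asks HISAT2 to report alignments suited to transcript assemblers.
    align = ["hisat2", "--dta", "-p", str(threads), "-x", index]
    if reads2 is not None:
        align += ["-1", reads1, "-2", reads2]  # paired-end reads
    else:
        align += ["-U", reads1]  # single-end reads
    align += ["-S", sam]

    # Region queries need a coordinate-sorted, indexed BAM.
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, sam]
    index_bam = ["samtools", "index", bam]

    # Keep only reads overlapping the gene of interest.
    subset = ["samtools", "view", "-b", "-o", gene_bam, bam, region]

    # Genome-guided assembly on the small BAM; -G (reference GTF) is optional.
    assemble = ["stringtie", gene_bam, "-p", str(threads), "-o", gtf]
    if annotation is not None:
        assemble += ["-G", annotation]

    return [align, sort, index_bam, subset, assemble]


if __name__ == "__main__":
    # Placeholder inputs; the region is an approximate BRAF window (GRCh38).
    commands = build_pipeline(
        index="grch38_index",
        reads1="reads_1.fastq.gz",
        reads2="reads_2.fastq.gz",
        region="chr7:140700000-140950000",
    )
    for cmd in commands:
        print(shlex.join(cmd))
        # To actually execute: subprocess.run(cmd, check=True)
```

Note that StringTie writes the transcript structures as a GTF file, not as sequences. To get the actual cDNA sequences you would still need a further step, for example something like `gffread -w transcripts.fa -g genome.fa sample.gene.gtf`. The alignment itself still has to go through the whole FASTQ; only the steps after the samtools filter are restricted to the gene.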