Question

Transcript assemblers from RNA-seq reads - a request for tool recommendation

3.7 years ago

eilalandkof • 0

Hi all,

I would like to examine the complete transcript sequences of specific genes in RNA-seq data, using the RNA-seq FASTQ files as input. For example, I would like to look at the full sequence of every BRAF transcript (all cDNA exon combinations) present in a FASTQ file, which may be paired-end or single-end.

What is the recommended way to do this? It would be great if, at some point in the pipeline, I could avoid processing the whole dataset and focus only on the requested gene.

Many thanks, Eila

RNA-seq reads assembler • 1.5k views

ADD COMMENT • link 3.7 years ago by eilalandkof • 0

score 0 · Answer 1 · 2021-10-29

3.7 years ago

Carlo Yague 9.0k

As far as I know, the only way to find out which reads come from BRAF is to align or pseudo-align your reads. Of course, you could align them against only the BRAF genomic sequence, but I would advise against it. Why? Because if a read comes from a transcript with local homology to BRAF, it might get aligned to BRAF simply because BRAF is the only possible hit. In contrast, if you align your reads to the full genome or transcriptome, a read will be assigned to BRAF only if BRAF provides the best alignment among all possible genes. This kind of healthy competition between alignments is important to avoid spurious results.
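To make the point about competition concrete, here is a toy sketch in Python. It is not how a real aligner works: the two 12-base sequences are invented for illustration, "paralog" is a hypothetical homologous gene, and the helper names `mismatches` and `best_hit` are assumptions of this example. It only shows that the same read is placed differently depending on which references are allowed to compete for it.

```python
from typing import Dict, Optional


def mismatches(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def best_hit(read: str, references: Dict[str, str],
             max_mismatches: int = 2) -> Optional[str]:
    """Return the reference with the fewest mismatches, if any is acceptable."""
    scored = {name: mismatches(read, seq) for name, seq in references.items()}
    passing = {name: n for name, n in scored.items() if n <= max_mismatches}
    if not passing:
        return None
    return min(passing, key=passing.get)


# Invented toy sequences; "paralog" stands for any gene with local homology.
FULL_REFERENCE = {
    "BRAF": "ACGTACGTACGT",
    "paralog": "ACGTTCGTACGA",
}
BRAF_ONLY = {"BRAF": FULL_REFERENCE["BRAF"]}

# This read really comes from the paralog (it matches it exactly).
READ = "ACGTTCGTACGA"

if __name__ == "__main__":
    print("BRAF-only reference:", best_hit(READ, BRAF_ONLY))
    print("full reference:     ", best_hit(READ, FULL_REFERENCE))
```

Against the BRAF-only reference the read is accepted with two mismatches and wrongly counted as BRAF; against the full reference the exact match to the paralog wins.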

ADD COMMENT • link 3.7 years ago by Carlo Yague 9.0k

Hi Carlo, thank you for the quick answer. As far as I understand, pseudo-alignment is used to estimate gene and transcript expression levels and does not need to match reads along the full transcript to do so. In my experience, tools of this kind, such as Kallisto, do not report the full mRNA transcript sequence.

Are you familiar with an alignment tool that reports the full mRNA / cDNA sequence of each aligned transcript? I have added an image of what I am looking for.

Thanks, Eila

ADD REPLY • link 3.7 years ago by eilalandkof • 0

You could try aligning your reads to the genome (using HISAT2 or STAR) and then using a genome-guided assembler to retrieve the mRNA sequences (for instance StringTie2). I would suggest aligning against the complete genome in the first step. After that, and before the assembly, you could filter the reads by position using samtools, both to improve speed and to keep only lightweight BAM files for future use. If no genome is available, I recommend assembling the whole transcriptome using Trinity, as suggested in the other answer.
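The steps above could be sketched roughly as follows. This is a minimal sketch under several assumptions, not a tested pipeline: it assumes `hisat2`, `samtools`, `stringtie` and `gffread` are installed, that a HISAT2 index and a genome FASTA already exist, and it only builds the command lines so they can be inspected before running. The `gffread` step (to turn the assembled GTF into FASTA sequences) is an addition not mentioned in the thread, the function name `build_commands` and all file names are hypothetical, and the default region is only an approximate padded window around BRAF on hg38 chr7 that should be checked against your annotation.

```python
import shlex
from typing import List, Optional


def build_commands(index_prefix: str, genome_fasta: str, reads1: str,
                   reads2: Optional[str] = None,
                   region: str = "chr7:140700000-140950000",  # approximate, verify
                   threads: int = 4,
                   prefix: str = "sample") -> List[List[str]]:
    """Build align -> sort/index -> region filter -> assemble -> extract steps."""
    sam, bam = f"{prefix}.sam", f"{prefix}.sorted.bam"
    region_bam = f"{prefix}.region.bam"
    gtf, fasta = f"{prefix}.region.gtf", f"{prefix}.region.transcripts.fa"

    # 1. Align against the complete genome; --dta tailors output for assemblers.
    align = ["hisat2", "-p", str(threads), "--dta", "-x", index_prefix]
    align += ["-1", reads1, "-2", reads2] if reads2 else ["-U", reads1]
    align += ["-S", sam]

    return [
        align,
        # 2. Coordinate-sort and index so reads can be fetched by position.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        # 3. Keep only reads overlapping the gene of interest (small BAM file).
        ["samtools", "view", "-b", "-o", region_bam, bam, region],
        # 4. Genome-guided assembly on the filtered reads.
        ["stringtie", region_bam, "-p", str(threads), "-o", gtf],
        # 5. Write the spliced transcript sequences as FASTA.
        ["gffread", "-w", fasta, "-g", genome_fasta, gtf],
    ]


if __name__ == "__main__":
    for cmd in build_commands("hg38_index", "hg38.fa",
                              "reads_1.fastq.gz", "reads_2.fastq.gz"):
        print(shlex.join(cmd))  # pass each list to subprocess.run to execute
```

The filtering happens only after the whole-genome alignment, so reads still compete across all genes before the BRAF region is cut out.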

ADD REPLY • link 3.7 years ago by Rogerio Ribeiro ▴ 110

score 0 · Answer 2 · 2021-10-29

3.7 years ago

colindaven 7.7k

Along with the important caveats from Carlo Yague, you can check for BRAF transcripts de novo by assembling the whole transcriptome with Trinity (https://github.com/trinityrnaseq/trinityrnaseq). I think it is also available on Galaxy, in case you don't have local compute with a lot of RAM (more than 64 to 128 GB is likely necessary, depending on your data).
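For reference, a de novo Trinity run might be launched along these lines. This is only a sketch: it assumes the `Trinity` executable is on the PATH, the helper name `trinity_command`, the file names, and the memory and CPU values are placeholders to adapt to your data, and it just builds the command for inspection instead of running it.

```python
import shlex
from typing import List, Optional


def trinity_command(reads1: str, reads2: Optional[str] = None,
                    max_memory: str = "64G", cpus: int = 8,
                    output_dir: str = "trinity_out") -> List[str]:
    """Build a Trinity command line for FASTQ input (paired or single-end)."""
    cmd = ["Trinity", "--seqType", "fq",
           "--max_memory", max_memory, "--CPU", str(cpus)]
    if reads2:
        cmd += ["--left", reads1, "--right", reads2]
    else:
        cmd += ["--single", reads1]
    # Trinity expects the output directory name to contain "trinity".
    cmd += ["--output", output_dir]
    return cmd


if __name__ == "__main__":
    print(shlex.join(trinity_command("reads_1.fastq.gz", "reads_2.fastq.gz")))
```

The assembled contigs would still need to be searched for BRAF afterwards, for example by aligning them against a known BRAF sequence.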

ADD COMMENT • link 3.7 years ago by colindaven 7.7k

Thanks for the quick responses. I will try these recommendations. I have cloud computing available and am using the human hg38 reference genome, so the genome-guided (non-de novo) method should be fine.

Eila

ADD REPLY • link 3.7 years ago by eilalandkof • 0