Question

Aligning RNA-seq data for gene prediction

4.2 years ago

karthic ▴ 130

Hi,

I have RNA-seq data for five tissue samples and plan to use them for gene prediction. Should I align them all to the genome together in one step, or align each tissue separately and merge the BAMs later?

Thanks, Karthic

gene prediction RNA-Seq • 1.3k views

ADD COMMENT • link updated 4.2 years ago by liorglic ★ 1.5k • written 4.2 years ago by karthic ▴ 130

Alternative plan:

make a genome-guided transcriptome assembly (e.g. with Trinity)
use the generated transcripts as evidence for gene prediction
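As a rough sketch of the first step, a genome-guided Trinity run takes a coordinate-sorted BAM of the RNA-seq reads aligned to the genome. The helper below only builds the command line and does not run it. The function name, file names, intron limit, and resource settings are placeholders chosen for illustration and should be adjusted to the species and machine.

```python
import shlex


def trinity_genome_guided_cmd(sorted_bam, max_intron=10000, cpus=8,
                              max_memory="50G", out_dir="trinity_gg"):
    """Build (not run) a genome-guided Trinity command line.

    sorted_bam must be coordinate-sorted. All default values here are
    illustrative assumptions, not recommendations from the thread.
    """
    return [
        "Trinity",
        "--genome_guided_bam", sorted_bam,
        "--genome_guided_max_intron", str(max_intron),
        "--max_memory", max_memory,
        "--CPU", str(cpus),
        "--output", out_dir,
    ]


# Hypothetical input file name, used only to show the resulting command.
cmd = trinity_genome_guided_cmd("all_tissues.sorted.bam")
print(shlex.join(cmd))
```

The assembled transcripts can then be passed to the gene predictor as evidence, as described in the second step.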

ADD REPLY • link 4.2 years ago by Michael 55k

I have always known Trinity as a de novo transcriptome assembler.

ADD REPLY • link 4.2 years ago by karthic ▴ 130

Trinotate, part of the broader Trinity workflow, performs functional annotation: https://github.com/griffithlab/rnaseq_tutorial/wiki/Trinotate-Functional-Annotation

ADD REPLY • link 4.2 years ago by Kevin Blighe 88k

Thanks. I will go through that.

ADD REPLY • link 4.2 years ago by karthic ▴ 130

"Should I align them all together in one step"

If you have a reference available, then you are likely not doing gene prediction. And if you don't, then you should be assembling the data as suggested by @Michael.

ADD REPLY • link 4.2 years ago by GenoMax 147k

We have assembled the genome, and no other annotation is available for this species. We have RNA-seq and Iso-Seq data for some tissues. I am currently figuring out how I should prepare the files.

ADD REPLY • link 4.2 years ago by karthic ▴ 130

Was the genome assembly done independently of the RNA-seq data? What do you mean by "prepare the files"?

ADD REPLY • link 4.2 years ago by GenoMax 147k

Yes, independently of the RNA-seq data. By "prepare the files" I mean gathering evidence for the gene prediction from the RNA-seq and Iso-Seq data.

Should the RNA-seq data be assembled with Trinity and the transcripts given as input to tools like AUGUSTUS or GeneMark, or should the reads be mapped to the genome with tools like HISAT2/TopHat, with models generated by StringTie/Cufflinks and then given as input to AUGUSTUS, GeneMark, etc.?
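For concreteness, the mapping route for a single sample might look like the sketch below: align with HISAT2, sort with samtools, then build transcript models with StringTie. It assumes paired-end reads and an existing HISAT2 index. The function name, sample name, and file names are hypothetical, and the commands are only assembled, not executed.

```python
import shlex


def mapping_route_cmds(index_prefix, reads_1, reads_2, sample, threads=8):
    """Return the three command lines for one paired-end sample.

    Output file names are derived from the sample name; this naming
    scheme is an assumption made for the example.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    gtf = f"{sample}.gtf"
    return [
        # --dta reports alignments tailored for transcript assemblers
        ["hisat2", "--dta", "-p", str(threads), "-x", index_prefix,
         "-1", reads_1, "-2", reads_2, "-S", sam],
        # StringTie expects coordinate-sorted input
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["stringtie", bam, "-p", str(threads), "-o", gtf],
    ]


# Hypothetical sample, printed as shell-ready command lines.
for step in mapping_route_cmds("genome_idx", "leaf_1.fq.gz", "leaf_2.fq.gz", "leaf"):
    print(shlex.join(step))
```

The resulting GTF (or the sorted BAM itself) is what would be handed to the gene predictor as evidence in this route.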

ADD REPLY • link 4.2 years ago by karthic ▴ 130

score 0 · Answer 1 · 2020-09-28

4.2 years ago

liorglic ★ 1.5k

I think you can map everything at once. If you are worried that you might lose isoform information, you can also map each tissue separately and merge only at the end. I had a pretty good experience with GAWN for this type of analysis, so check it out.
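The two options could be sketched as follows, assuming HISAT2 as the aligner and samtools for merging (neither is prescribed here, and GAWN is not shown). HISAT2 accepts comma-separated read files, so pooling is a single call. The per-tissue route aligns each tissue on its own and merges the sorted BAMs afterwards. Function names and file names are hypothetical, and the commands are only built, not run.

```python
import shlex


def pooled_alignment_cmd(index_prefix, samples, threads=8, out_sam="pooled.sam"):
    """One HISAT2 call over all tissues.

    samples maps tissue name -> (mate1_file, mate2_file).
    """
    mates_1 = ",".join(pair[0] for pair in samples.values())
    mates_2 = ",".join(pair[1] for pair in samples.values())
    return ["hisat2", "--dta", "-p", str(threads), "-x", index_prefix,
            "-1", mates_1, "-2", mates_2, "-S", out_sam]


def merge_per_tissue_cmd(tissue_bams, merged="all_tissues.bam", threads=8):
    """Merge per-tissue BAMs that were already aligned and coordinate-sorted."""
    return ["samtools", "merge", "-@", str(threads), merged, *tissue_bams]


# Hypothetical tissues and file names.
samples = {
    "leaf": ("leaf_1.fq.gz", "leaf_2.fq.gz"),
    "root": ("root_1.fq.gz", "root_2.fq.gz"),
}
print(shlex.join(pooled_alignment_cmd("genome_idx", samples)))
print(shlex.join(merge_per_tissue_cmd(["leaf.sorted.bam", "root.sorted.bam"])))
```

Keeping the per-tissue BAMs around has the side benefit that tissue-specific evidence can still be inspected separately after the merge.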