Question

Recommendations for performing bulk RNA sequencing on a non-model organism with poor genome annotation?

3 months ago

Mark ▴ 20

Our group wants to perform an RNA sequencing project on a non-model eukaryotic microalga to study its glucose metabolism under different environmental conditions. I have very limited RNA sequencing experience, and I don't know what I should look out for and be aware of before I start. I'm wondering whether a differential gene expression pipeline should look different when working with an organism that has a poorly annotated genome. I've been looking at the nf-core RNAseq pipeline, which pretty much automates getting transcript abundances, but I'm wondering if it is appropriate for the job. Also, is there a good, modern gene prediction tool I can use to combine all the RNA and DNA sequencing data for this organism and attempt to create a more robust predicted genome annotation than the one that already exists?

RNA-seq

The devil is always in the details: depending on how complex the genome of the organism you are working with is (size, number of chromosomes, repetitive content), your level of success will vary. By now there should (or may) be a related genome in GenBank. If so, it gives you a starting point to work from. The more sequencing you do (from different life stages of the organism, if possible), the better represented your transcriptome will be.

Creating a good (enough) transcriptome will itself be a significant undertaking, especially if little or no data is available in public databases. You have already received good suggestions below about how to do that. Do use a mix of short- and long-read data if you can.

Unless you do whole-genome DNA sequencing, you will not have any idea about gene structure. That is a completely different kind of sequencing, and you have not indicated whether you are planning to do it.

Reply by GenoMax 148k, 3 months ago

Answer 1 · 2024-08-27

3 months ago

Trivas ★ 1.8k

You could use something like StringTie to generate a transcriptome reference, then follow the standard nf-core RNAseq pipeline. In my opinion the pipeline shouldn't look any different for a poorly annotated species, but you should definitely re-run the analysis as the reference files (genome and transcriptome assemblies) improve in accuracy.
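
A rough sketch of that workflow, under several assumptions: the short reads have already been aligned to a draft genome with a splice-aware aligner, giving one sorted BAM per sample, and all file names and sample names below are hypothetical. StringTie is run per sample, the per-sample GTFs are merged with `stringtie --merge`, and the merged GTF is handed to nf-core/rnaseq through `--gtf`, together with a samplesheet in its `sample,fastq_1,fastq_2,strandedness` layout. The script only builds the command lines and the CSV and does not run anything. The `docker` profile and `auto` strandedness (supported in recent pipeline releases) are assumptions, so check the StringTie and nf-core/rnaseq docs for the versions you use.

```python
import csv
import io
import shlex

# Hypothetical sample layout: name -> (R1 FASTQ, R2 FASTQ, sorted BAM on the draft genome)
SAMPLES = {
    "glucose_rep1": ("reads/glc1_R1.fastq.gz", "reads/glc1_R2.fastq.gz", "bam/glc1.bam"),
    "control_rep1": ("reads/ctl1_R1.fastq.gz", "reads/ctl1_R2.fastq.gz", "bam/ctl1.bam"),
}


def stringtie_commands(samples, threads=8, guide_gtf=None):
    """One StringTie assembly per sample, followed by a single merge step."""
    cmds, gtfs = [], []
    for name, (_, _, bam) in samples.items():
        out = f"stringtie/{name}.gtf"
        cmd = ["stringtie", bam, "-p", str(threads), "-o", out]
        if guide_gtf:  # optionally guide assembly with the existing (poor) annotation
            cmd += ["-G", guide_gtf]
        cmds.append(cmd)
        gtfs.append(out)
    merge = ["stringtie", "--merge", "-o", "stringtie/merged.gtf"]
    if guide_gtf:
        merge += ["-G", guide_gtf]
    cmds.append(merge + gtfs)  # --merge takes the per-sample GTFs as positional inputs
    return cmds


def nfcore_samplesheet(samples, strandedness="auto"):
    """Samplesheet CSV for nf-core/rnaseq (sample,fastq_1,fastq_2,strandedness)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    for name, (r1, r2, _) in samples.items():
        writer.writerow([name, r1, r2, strandedness])
    return buf.getvalue()


def nfcore_command(genome_fasta, gtf, samplesheet="samplesheet.csv", outdir="results"):
    """nf-core/rnaseq run against the draft genome plus the StringTie-merged GTF."""
    return ["nextflow", "run", "nf-core/rnaseq", "-profile", "docker",
            "--input", samplesheet, "--outdir", outdir,
            "--fasta", genome_fasta, "--gtf", gtf]


if __name__ == "__main__":
    for cmd in stringtie_commands(SAMPLES):
        print(shlex.join(cmd))
    print(nfcore_samplesheet(SAMPLES), end="")
    print(shlex.join(nfcore_command("genome/draft.fa", "stringtie/merged.gtf")))
```

Keeping the reference files as inputs to the final command means that when the genome or transcriptome assembly improves, you regenerate `merged.gtf` and rerun the same pipeline call, which is the re-run step suggested above.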

If it's an option, you could do long-read sequencing on one aliquot of RNA for the assembly, and then short-read sequencing for the differential expression. Assembly is not my field, but from what I have read and gathered, people prefer long reads for it. There are service providers such as Novogene where you simply send RNA and get back the data, be it short or long reads. These days I would not recommend doing library prep and sequencing yourself unless you have very custom requirements.
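
One possible way to wire up that long-read/short-read split, sketched under assumptions: a transcript FASTA has already been produced from the long reads by whatever assembly approach you settle on (`assembly/transcripts_from_long_reads.fa` is a hypothetical name), and the short reads are then quantified against it with Salmon. Sample names and paths are hypothetical, and the script only builds the command lines.

```python
import shlex

# Hypothetical transcript FASTA assembled from the long-read data
TRANSCRIPTOME = "assembly/transcripts_from_long_reads.fa"
INDEX_DIR = "salmon_index"

# Hypothetical short-read samples: name -> (R1, R2)
SHORT_READS = {
    "glucose_rep1": ("reads/glc1_R1.fastq.gz", "reads/glc1_R2.fastq.gz"),
    "control_rep1": ("reads/ctl1_R1.fastq.gz", "reads/ctl1_R2.fastq.gz"),
}


def salmon_commands(transcriptome, index_dir, samples, threads=8):
    """Index the long-read transcriptome once, then quantify each short-read sample."""
    cmds = [["salmon", "index", "-t", transcriptome, "-i", index_dir, "-p", str(threads)]]
    for name, (r1, r2) in samples.items():
        cmds.append([
            "salmon", "quant",
            "-i", index_dir,
            "-l", "A",  # let Salmon infer the library type
            "-1", r1, "-2", r2,
            "-p", str(threads),
            "-o", f"quant/{name}",
        ])
    return cmds


if __name__ == "__main__":
    for cmd in salmon_commands(TRANSCRIPTOME, INDEX_DIR, SHORT_READS):
        print(shlex.join(cmd))
```

Each `quant/<sample>/quant.sf` file can then be imported (for example with tximport in R) into a standard differential expression analysis.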

Reply by ATpoint 86k, 3 months ago

Absolutely agree on outsourcing for one-off experiments like this. IIRC, Novogene has a fairly long turnaround time, but that might not matter for the OP.

Reply by Trivas ★ 1.8k, 3 months ago