Question

RNA-seq for non-model species

21 months ago

weather • 0

Hi,

I need to run RNA-seq analysis on three non-model fish species for which I can't find a reference genome. I only have experience running RNA-seq for species that have a reference genome, where my workflow is FastQC, Trimmomatic, HISAT2, featureCounts, edgeR/DESeq2, and then the DAVID functional annotation tools.

I would also like to run differential gene expression and pathway analysis for these non-model species. Are there any resources I can look at, or a recommended workflow for analysing non-model species?

Thanks.

non-model-species RNA-seq • 1.9k views

ADD COMMENT • link updated 21 months ago by cfos4698 ★ 1.2k • written 21 months ago by weather • 0

21 months ago

dthorbur ★ 3.1k

There are a couple of options I've used in similar situations.

1. Use an existing transcriptome from a well-annotated species. Zebrafish, for example, has a lot of resources available. However, the larger the evolutionary distance, the more spurious your results can become.
2. If you have good enough RNA-seq data, you can try running StringTie in de novo mode, i.e. without a reference annotation (some docs here), and then use the resulting assembly as your transcriptome. Bear in mind that StringTie assembles from reads aligned to a genome, so you would still need a genome from a closely related species. The annotations will likely need strict quality thresholds, as I tend to find de novo annotation pipelines quite noisy.

Otherwise, the analysis would be the same as with other species. Downstream analyses like GO and KEGG are a little harder, but if you have the time you can annotate the transcriptome yourself. I've also seen the use of orthologs of a better annotated species for these kinds of downstream analyses in the literature.

ADD COMMENT • link 21 months ago by dthorbur ★ 3.1k

Thanks for answering. I will probably try de novo annotation.

ADD REPLY • link 21 months ago by weather • 0

score 2 · Accepted Answer · 2024-01-31

21 months ago

cfos4698 ★ 1.2k

In this situation I've used a workflow like the following:

1. De novo assembly with multiple assemblers and kmer sizes: e.g. Trinity, Trans-ABySS, rnaSPAdes, SPAdes single-cell
2. Reduce the redundancy of the assemblies to retain the best transcripts from each assembly: EvidentialGene pipeline
3. Annotate the transcriptome using the Trinotate pipeline, with any modifications you see fit
4. Abundance estimation using salmon (can also use other methods of course)
5. Differential expression using DESeq2
6. Gene Ontology and KEGG enrichment leveraging the annotations from step 3
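
As a rough illustration of the salmon abundance-estimation step, here is a minimal sketch that prints one `salmon quant` command per sample so the commands can be reviewed before running. The file naming pattern (`*_R1.fq.gz` / `*_R2.fq.gz`), the helper name `salmon_quant_cmd`, and the names `transcriptome.fa`, `salmon_index`, `reads/` and `quant/` are assumptions for illustration, not part of the workflow above.

```shell
#!/usr/bin/env bash
# Sketch: print a salmon quant command for each paired-end sample.
# All file and directory names are placeholders; adjust to your data.

salmon_quant_cmd() {
    # $1 = salmon index directory, $2 = R1 FASTQ, $3 = output directory
    local index=$1 r1=$2 outdir=$3 sample
    sample=$(basename "$r1" _R1.fq.gz)
    # -l A asks salmon to infer the library type automatically
    printf 'salmon quant -i %s -l A -1 %s -2 %s -o %s/%s\n' \
        "$index" "$r1" "${r1%_R1.fq.gz}_R2.fq.gz" "$outdir" "$sample"
}

# Usage: build the index once from the final transcriptome, then
# inspect the printed commands and pipe them to a shell to run them.
# salmon index -t transcriptome.fa -i salmon_index
# for r1 in reads/species1_*_R1.fq.gz; do
#     salmon_quant_cmd salmon_index "$r1" quant
# done | bash
```

Printing the commands rather than executing them directly keeps the sketch safe to try on a machine without salmon installed; it does assume file names without spaces. The resulting `quant.sf` files are typically imported into DESeq2 via the tximport R package.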

ADD COMMENT • link 21 months ago by cfos4698 ★ 1.2k

Thanks for answering! It is a really clear workflow and I will try to follow it.

ADD REPLY • link 21 months ago by weather • 0

One quick question: since I have multiple treatment groups and replicates for the same non-model species, do I need to run Trinity for each of them, or is there a way to pool them together given that they come from the same species?

ADD REPLY • link 21 months ago by weather • 0

Have a look at the --samples_file option for Trinity: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Running-Trinity#typical-trinity-command-line
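
A minimal sketch of how such a samples file could be generated, assuming files follow the naming pattern `<species>_<condition>_<sample>_R1.fq.gz` / `_R2.fq.gz` (the pattern, the helper name `make_samples_file`, and the paths are assumptions for illustration). Trinity expects four tab-separated columns: condition, replicate name, left reads, right reads.

```shell
#!/usr/bin/env bash
# Sketch: build a Trinity --samples_file from paired FASTQ file names.
# Assumes names like species1_condition1_sample1_R1.fq.gz (hypothetical).

make_samples_file() {
    local r1 base condition replicate
    for r1 in "$@"; do
        base=$(basename "$r1" _R1.fq.gz)   # species1_condition1_sample1
        replicate=${base#*_}               # condition1_sample1
        condition=${replicate%%_*}         # condition1
        # Columns: condition, replicate, left reads, right reads
        printf '%s\t%s\t%s\t%s\n' \
            "$condition" "$replicate" "$r1" "${r1%_R1.fq.gz}_R2.fq.gz"
    done
}

# Usage (paths and resource settings are placeholders):
# make_samples_file reads/species1_*_R1.fq.gz > species1_samples.txt
# Trinity --seqType fq --samples_file species1_samples.txt \
#     --CPU 8 --max_memory 50G --output trinity_species1
```

With a single samples file per species, all conditions and replicates are pooled into one assembly, and the same file can be reused later for downstream per-sample steps.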

For the other assemblers (if you choose to use them) you can typically just specify all reads for all conditions/replicates for the same species on the command line, e.g.:

rnaspades.py \
  --pe1-1 "species1_condition1_sample1_R1.fq.gz" \
  --pe1-1 "species1_condition1_sample2_R1.fq.gz" \
  --pe1-1 "species1_condition2_sample1_R1.fq.gz" \
  --pe1-1 "species1_condition2_sample2_R1.fq.gz" \
  --pe1-2 "species1_condition1_sample1_R2.fq.gz" \
  --pe1-2 "species1_condition1_sample2_R2.fq.gz" \
  --pe1-2 "species1_condition2_sample1_R2.fq.gz" \
  --pe1-2 "species1_condition2_sample2_R2.fq.gz" \
  -t "$THREADS" -m "$MEMORY" -o "$OUTDIR"
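
With many samples, listing every file by hand gets error-prone. A possible sketch for generating the read arguments from a glob instead, assuming every `*_R1.fq.gz` file has a matching `*_R2.fq.gz` next to it (the helper name `build_rnaspades_args` and the `reads/` directory are hypothetical):

```shell
#!/usr/bin/env bash
# Sketch: emit --pe1-1/--pe1-2 arguments for every R1/R2 pair, one per line,
# so all samples of a species go into a single paired-end library.

build_rnaspades_args() {
    local r1
    for r1 in "$@"; do
        printf '%s\n' --pe1-1 "$r1" --pe1-2 "${r1%_R1.fq.gz}_R2.fq.gz"
    done
}

# Usage (needs bash >= 4 for mapfile; file names must not contain newlines):
# mapfile -t READ_ARGS < <(build_rnaspades_args reads/species1_*_R1.fq.gz)
# rnaspades.py "${READ_ARGS[@]}" -t "$THREADS" -m "$MEMORY" -o "$OUTDIR"
```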

ADD REPLY • link 21 months ago by cfos4698 ★ 1.2k

Thank you for the information!

ADD REPLY • link 21 months ago by weather • 0

If I or anyone else has given you helpful advice, please upvote. If we've answered your question, please mark the answer as accepted.

ADD REPLY • link 21 months ago by cfos4698 ★ 1.2k