RNA-seq for non-model species
9 months ago
weather • 0

Hi,

I have three non-model fish species that I need to analyse with RNA-seq, and I can't find a reference genome for any of them. I only have experience running RNA-seq for species that have a reference genome, where my workflow is FastQC, Trimmomatic, HISAT2, featureCounts, edgeR/DESeq2, and then the DAVID functional annotation tools.

I would also like to run differential gene expression and pathway analysis for these non-model species. Are there any resources I can look at, or a recommended workflow for analysing non-model species?

Thanks.

non-model-species RNA-seq • 902 views
9 months ago
cfos4698 ★ 1.1k

In this situation I've used a workflow like the following:

  1. De novo assembly with multiple assemblers and k-mer sizes: e.g. Trinity, Trans-ABySS, rnaSPAdes, SPAdes single-cell
  2. Reduce the redundancy of the assemblies to retain the best transcripts from each assembly: EvidentialGene pipeline
  3. Annotate the transcriptome using the Trinotate pipeline, with any modifications you see fit
  4. Abundance estimation using salmon (can also use other methods of course)
  5. Differential expression using DESeq2
  6. Gene Ontology and KEGG enrichment leveraging the annotations from step 3
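
As a rough illustration of step 4, here is a minimal sketch that prints (rather than runs) the salmon commands for each paired-end sample, so you can review them first. The function name `print_salmon_cmds`, the index directory `salmon_index`, the `quant/` output layout and the `*_R1.fq.gz` / `*_R2.fq.gz` naming pattern are all assumptions for illustration; adjust them to your own files.

```shell
# Sketch only: prints salmon commands, it does not execute them.
# Hypothetical names: print_salmon_cmds, salmon_index, quant/, *_R1.fq.gz pattern.

print_salmon_cmds() {
  local transcripts="$1"    # non-redundant transcript FASTA (e.g. from step 2)
  local reads_dir="$2"      # directory holding the paired FASTQ files
  local threads="${3:-8}"   # thread count, defaults to 8
  local r1 r2 sample

  # The index is built once from the final transcript set.
  echo "salmon index -t $transcripts -i salmon_index"

  for r1 in "$reads_dir"/*_R1.fq.gz; do
    [ -e "$r1" ] || continue              # skip if the glob matched nothing
    r2="${r1%_R1.fq.gz}_R2.fq.gz"         # mate file shares the same prefix
    sample="$(basename "${r1%_R1.fq.gz}")"
    # -l A asks salmon to infer the library type automatically.
    echo "salmon quant -i salmon_index -l A -1 $r1 -2 $r2 -p $threads -o quant/$sample"
  done
}
```

Once the printed commands look right, something like `print_salmon_cmds transcripts.fa reads/ 8 | sh` would run them. salmon reports transcript-level estimates; a common route into DESeq2 (step 5) is the tximport R package.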

Thanks for answering! It is a really clear workflow and I will try to follow it.


One quick question: since I have multiple treatment groups and replicates for the same non-model species, do I need to run Trinity for each of them, or is there a way to pool them together since they come from the same species?


Have a look at the --samples_file option for Trinity: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Running-Trinity#typical-trinity-command-line
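
A minimal sketch of how such a samples file could be generated, assuming your files are named like `species1_condition1_sample1_R1.fq.gz` (the function name `make_samples_file` and the naming pattern are hypothetical). The file is tab-separated with four columns: condition, replicate name, left reads, right reads.

```shell
# Sketch only: writes Trinity samples-file rows to stdout.
# Assumes <condition>_<replicate>_R1.fq.gz / _R2.fq.gz file names (hypothetical pattern).

make_samples_file() {
  local reads_dir="$1"
  local r1 r2 base condition

  for r1 in "$reads_dir"/*_R1.fq.gz; do
    [ -e "$r1" ] || continue                 # skip if the glob matched nothing
    r2="${r1%_R1.fq.gz}_R2.fq.gz"            # mate file shares the same prefix
    base="$(basename "${r1%_R1.fq.gz}")"     # e.g. species1_condition1_sample1
    condition="${base%_*}"                   # e.g. species1_condition1
    # Columns: condition, replicate name, left reads, right reads
    printf '%s\t%s\t%s\t%s\n' "$condition" "$base" "$r1" "$r2"
  done
}

# Example (not run here; memory and CPU values are placeholders):
#   make_samples_file reads/species1 > samples.txt
#   Trinity --seqType fq --samples_file samples.txt --max_memory 50G --CPU 8 --output trinity_out
```

This gives one pooled assembly per species, with all conditions and replicates contributing reads.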

For the other assemblers (if you choose to use them) you can typically just specify all reads for all conditions/replicates for the same species on the command line, e.g.:

rnaspades.py \
--pe1-1 "species1_condition1_sample1_R1.fq.gz" \
--pe1-1 "species1_condition1_sample2_R1.fq.gz" \
--pe1-1 "species1_condition2_sample1_R1.fq.gz" \
--pe1-1 "species1_condition2_sample2_R1.fq.gz" \
--pe1-2 "species1_condition1_sample1_R2.fq.gz" \
--pe1-2 "species1_condition1_sample2_R2.fq.gz" \
--pe1-2 "species1_condition2_sample1_R2.fq.gz" \
--pe1-2 "species1_condition2_sample2_R2.fq.gz" \
-t "$THREADS" -m "$MEMORY" -o "$OUTDIR"

Thank you for the information!


If I or anyone else has given you helpful advice, please upvote. If we've answered your question, please mark the answer as accepted.

9 months ago
dthorbur ★ 2.5k

There are a couple of options I've used in similar situations.

  1. Use an existing transcriptome of a well-annotated species. Zebrafish, for example, has a lot of resources available. However, the larger the evolutionary distance, the more spurious your results can become.
  2. If you have good enough RNA-seq data, you can try using StringTie in de novo mode (some docs here), that is, without a reference annotation, and then use the assembled transcripts as your transcriptome. Note that StringTie works from reads aligned to a genome, so you still need a genome to align against. There will likely need to be some strict quality thresholds for the annotations, as I tend to find de novo annotation pipelines quite noisy.

Otherwise, the analysis would be the same as with other species. Downstream analyses like GO and KEGG are a little harder, but if you have the time you can annotate the transcriptome yourself. I've also seen the use of orthologs of a better annotated species for these kinds of downstream analyses in the literature.
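
To illustrate the ortholog idea, here is a minimal sketch that keeps the best-scoring hit per transcript from BLAST tabular output (`-outfmt 6`), so that GO/KEGG terms from the better-annotated species can be transferred. The function name `best_hits` and the file names in the example are hypothetical, and a single best hit is only a rough proxy for orthology; reciprocal best hits or a dedicated orthology tool would be more rigorous.

```shell
# Sketch only: best hit per query from BLAST -outfmt 6 output.
# Column 1 = query (your transcript), 2 = subject (reference protein), 12 = bit score.

best_hits() {
  # Sort by query ID, then by bit score (highest first); keep the first row per query.
  sort -t "$(printf '\t')" -k1,1 -k12,12nr "$1" |
    awk -F'\t' '!seen[$1]++ { print $1 "\t" $2 }'
}

# Example (not run here; database and file names are placeholders):
#   blastx -query transcripts.fa -db zebrafish_proteins -outfmt 6 -evalue 1e-5 -out hits.tsv
#   best_hits hits.tsv > transcript_to_reference.tsv
```

The resulting two-column table maps each transcript to one reference protein, which you can then join to that species' existing annotations.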


Thanks for answering. I will probably try de novo annotation.
