Question

combine trinity outputs

4.6 years ago

jrenart47 • 0

Hi all, I had to split my fastq files (downloaded with fasterq-dump) because of RAM limitations on my system in order to run Trinity for RNA-seq analysis of a non-model organism (elephant shark). After running Trinity I have four outputs. To continue with differential gene expression (DGE) analysis, I am not sure whether to 1) run DGE on each output separately, or 2) concatenate the outputs first. In either case it is not clear to me whether the final results would be correct. I would appreciate any advice on this issue.

RNA-Seq • 1.8k views

Updated 4.6 years ago by wangmingcheng1992 • 0 • written 4.6 years ago by jrenart47 • 0

Did you try running Trinity on all reads with in silico normalization to 50x coverage? I think this would be the best solution if you don't have enough RAM.

shelkmike ★ 1.6k • 4.6 years ago

Thank you for the advice, Shelkmike! I'll try to run it this way. Jaime

jrenart47 • 0 • 4.6 years ago

Ram · Answer 1 · 2020-12-27

All samples should be assembled together into a single transcriptome. My workflow:

1: Genome-guided assembly with Trinity

Trinity --seqType fq \
    --max_memory 20G \
    --samples_file sample_file.config \
    --genome_guided_bam ref_sorted.bam \
    --genome_guided_max_intron 10000 \
    --CPU 60

The assembled transcripts are written to: trinity_out_dir/Trinity-GG.fasta

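sample_file.config is a tab-delimited text file with one line per replicate (condition, replicate name, left reads, right reads). Suppose you have 4 samples with 3 replicates each; sample_file.config then looks like this: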

sample1    sample1-rep1    sample1-rep1_1.fq    sample1-rep1_2.fq
sample1    sample1-rep2    sample1-rep2_1.fq    sample1-rep2_2.fq
sample1    sample1-rep3    sample1-rep3_1.fq    sample1-rep3_2.fq
sample2    sample2-rep1    sample2-rep1_1.fq    sample2-rep1_2.fq
sample2    sample2-rep2    sample2-rep2_1.fq    sample2-rep2_2.fq
sample2    sample2-rep3    sample2-rep3_1.fq    sample2-rep3_2.fq
sample3    sample3-rep1    sample3-rep1_1.fq    sample3-rep1_2.fq
sample3    sample3-rep2    sample3-rep2_1.fq    sample3-rep2_2.fq
sample3    sample3-rep3    sample3-rep3_1.fq    sample3-rep3_2.fq
sample4    sample4-rep1    sample4-rep1_1.fq    sample4-rep1_2.fq
sample4    sample4-rep2    sample4-rep2_1.fq    sample4-rep2_2.fq
sample4    sample4-rep3    sample4-rep3_1.fq    sample4-rep3_2.fq
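
If typing the file by hand is error-prone, it could also be generated with a short loop. This is only a sketch: it assumes the same sample names, replicate counts, and fastq file names as the example above, and it writes real tab characters between the four columns.

```shell
#!/usr/bin/env bash
# Sketch: write a tab-delimited samples file for Trinity.
# Columns: condition, replicate name, left reads, right reads.
# Sample names, replicate counts and file names mirror the example above
# and are assumptions; adjust them to your own data.
set -euo pipefail

out=sample_file.config
: > "$out"                       # create or truncate the output file

for s in 1 2 3 4; do             # four conditions
  for r in 1 2 3; do             # three replicates per condition
    printf 'sample%s\tsample%s-rep%s\tsample%s-rep%s_1.fq\tsample%s-rep%s_2.fq\n' \
      "$s" "$s" "$r" "$s" "$r" "$s" "$r" >> "$out"
  done
done
```

A quick check such as `awk -F'\t' 'NF != 4' sample_file.config` should print nothing if every line has exactly four tab-separated fields.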

Before running the Trinity assembly, align all the fq files to the elephant shark genome (HISAT), then sort and merge all the resulting alignments (samtools) into a single coordinate-sorted "ref_sorted.bam".
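
The alignment and merge could look roughly like the commands printed by the sketch below. It only prints the commands (a dry run) so they can be reviewed before running anything. The function name, the index prefix `shark_index` (built beforehand with `hisat2-build`), the thread count, and the per-replicate `*.sorted.bam` names are all hypothetical; the samples file is assumed to be tab-delimited as described above.

```shell
#!/usr/bin/env bash
# Sketch: print HISAT2 + samtools commands that would produce ref_sorted.bam.
# Nothing is executed; review the printed commands, then run them yourself.
set -euo pipefail

# print_alignment_commands SAMPLES_FILE INDEX_PREFIX THREADS   (hypothetical helper)
print_alignment_commands() {
  local samples=$1 index=$2 threads=$3
  local bams=()
  local cond rep left right
  while IFS=$'\t' read -r cond rep left right; do
    [ -n "$rep" ] || continue    # skip blank lines
    # Align one replicate and coordinate-sort the output on the fly.
    echo "hisat2 -p $threads -x $index -1 $left -2 $right | samtools sort -@ $threads -o $rep.sorted.bam -"
    bams+=("$rep.sorted.bam")
  done < "$samples"
  # Merge the per-replicate BAMs; merging sorted inputs keeps the coordinate order.
  echo "samtools merge -@ $threads ref_sorted.bam ${bams[*]}"
  echo "samtools index ref_sorted.bam"
}

# Only print when the samples file is present in the working directory.
if [ -f sample_file.config ]; then
  print_alignment_commands sample_file.config shark_index 8
fi
```

Redirecting the output to a file (for example `> align_commands.sh`) gives a script that can be inspected and then run with `bash`.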

2: Transcript Quantification with salmon

trinityrnaseq-v2.11.0/util/align_and_estimate_abundance.pl \
    --transcripts ./trinity_out_dir/Trinity-GG.fasta \
    --seqType fq \
    --samples_file sample_file.config \
    --output_dir salmon_transcript_quantification \
    --aln_method bowtie2 \
    --thread_count 60 \
    --est_method salmon \
    --trinity_mode --prep_reference
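
Step 3 reads salmon_transcript_quantification/salmon.gene.counts.matrix, so the per-replicate salmon results probably need to be combined into a count matrix first, for example with Trinity's abundance_estimates_to_matrix.pl. The sketch below only prints such a command. It rests on several assumptions that should be checked against your own run: that each replicate's result is in `<replicate name>/quant.sf`, that the gene-to-transcript map is at trinity_out_dir/Trinity-GG.fasta.gene_trans_map, and that the output directory already exists. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print an abundance_estimates_to_matrix.pl command built from the samples file.
# Assumptions (verify on your system): quant.sf sits in a directory named after
# each replicate, and the gene_trans_map path passed in is correct.
set -euo pipefail

# print_matrix_command SAMPLES_FILE GENE_TRANS_MAP OUT_PREFIX   (hypothetical helper)
print_matrix_command() {
  local samples=$1 map=$2 prefix=$3
  local quant_files=()
  local cond rep left right
  while IFS=$'\t' read -r cond rep left right; do
    [ -n "$rep" ] || continue    # skip blank lines
    quant_files+=("$rep/quant.sf")
  done < "$samples"
  echo "trinityrnaseq-v2.11.0/util/abundance_estimates_to_matrix.pl --est_method salmon --gene_trans_map $map --name_sample_by_basedir --out_prefix $prefix ${quant_files[*]}"
}

# Only print when the samples file is present in the working directory.
if [ -f sample_file.config ]; then
  print_matrix_command sample_file.config \
    trinity_out_dir/Trinity-GG.fasta.gene_trans_map \
    salmon_transcript_quantification/salmon
fi
```

With the out prefix shown, the gene-level counts would end up in salmon_transcript_quantification/salmon.gene.counts.matrix, which is the file the DE step expects.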

3: DE analysis with DESeq2

trinityrnaseq-v2.11.0/Analysis/DifferentialExpression/run_DE_analysis.pl \
    --matrix salmon_transcript_quantification/salmon.gene.counts.matrix \
    --method DESeq2 \
    --samples_file sample_file.config \
    --contrasts contrasts.file \
    --output Differential_Expression_Analysis

contrasts.file looks like this (assuming the comparisons are sample1 vs sample2 and sample3 vs sample4):

sample1    sample2
sample3    sample4

reference: https://github.com/trinityrnaseq/trinityrnaseq/wiki