Question

Assembling a Transcriptome using Trinity with multiple Illumina Fastq files

7.4 years ago

Bioinformatics_finn • 0

Hey y'all,

I am an undergraduate biology major and have just started exploring the field of bioinformatics. I have a few questions about how I should go about running Trinity when I have multiple FASTQ files from each sample. Should I concatenate all of the forward read files together and all of the reverse read files together, so that I'm left with one forward and one reverse file? Or is it better to pass them all to Trinity at once, as in the following command:

Trinity --seqType fq --max_memory 50G --left 2016BZ017_S11_L001.cleaned.1.fastq,2016BZ017_S11_L002.cleaned.1.fastq,2016BZ017_S2_L001.cleaned.1.fastq,2016BZ017_S2_L002.cleaned.1.fastq --right 2016BZ017_S11_L001.cleaned.2.fastq,2016BZ017_S11_L002.cleaned.2.fastq,2016BZ017_S2_L001.cleaned.2.fastq,2016BZ017_S2_L002.cleaned.2.fastq --CPU 6

or to run it on the concatenated files:

Trinity --seqType fq --max_memory 50G --left 2016BZ017_1_concatenated.fastq --right 2016BZ017_2_concatenated.fastq --CPU 6

My sample folder contains the following FASTQ files: 2016BZ017_S11_L001.cleaned.1.fastq, 2016BZ017_S11_L001.cleaned.2.fastq, 2016BZ017_S11_L002.cleaned.1.fastq, 2016BZ017_S11_L002.cleaned.2.fastq, 2016BZ017_S2_L001.cleaned.1.fastq, 2016BZ017_S2_L001.cleaned.2.fastq, 2016BZ017_S2_L002.cleaned.1.fastq, 2016BZ017_S2_L002.cleaned.2.fastq

Also: would y'all suggest trimming the Illumina adapters using the Trimmomatic option within Trinity, or doing it standalone?

Thank you all so much

RNA-Seq Trinity Illumina Transcriptome • 4.0k views

updated 7.4 years ago by Physalia-courses ★ 2.6k • written 7.4 years ago by Bioinformatics_finn • 0

Should I concatenate all of the forward files together and all of the reverse read files together, so that I'm left with one forward and reverse?

I would recommend doing this. You want to create a single assembly across all samples. If you're doing differential gene expression (DGE) analysis downstream, you can specify replicates later.
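For reference, the concatenation itself can be sketched in plain shell. This is only a minimal sketch, not something posted in the thread: `concat_fastq` and `count_reads` are hypothetical helper names, and the read count assumes uncompressed FASTQ with the usual four lines per record.

```shell
# Hypothetical helper: concatenate FASTQ files into one output file.
# Usage: concat_fastq OUTPUT INPUT...
concat_fastq() {
    out="$1"
    shift
    cat "$@" > "$out"
}

# Hypothetical helper: print the number of reads in an uncompressed FASTQ
# file, assuming four lines per record.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Example with the file names from the question (list the forward and
# reverse files in the same order):
#   concat_fastq 2016BZ017_1_concatenated.fastq \
#       2016BZ017_S11_L001.cleaned.1.fastq 2016BZ017_S11_L002.cleaned.1.fastq \
#       2016BZ017_S2_L001.cleaned.1.fastq 2016BZ017_S2_L002.cleaned.1.fastq
#   count_reads 2016BZ017_1_concatenated.fastq
```

If the forward and reverse concatenated files report different read counts, something went wrong before the assembly step and is worth checking first.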

7.4 years ago by st.ph.n ★ 2.7k

Would you suggest removing the headers from the FASTQ files that are being concatenated?

7.4 years ago by Bioinformatics_finn • 0

No, this information is needed to pair and assemble the sequences.
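Since the headers carry the pairing information, one quick sanity check is to confirm that the forward and reverse files list the same read IDs in the same order. The sketch below is an illustration only: `read_ids` and `same_read_order` are hypothetical helpers, and it assumes uncompressed four-line FASTQ records whose mate IDs differ only by a trailing /1 or /2 or by text after the first whitespace.

```shell
# Hypothetical helper: print one read ID per FASTQ record, without the
# leading "@", without anything after the first whitespace, and without a
# trailing /1 or /2 mate suffix.
read_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/^@/, "", id); sub(/\/[12]$/, "", id); print id }' "$1"
}

# Hypothetical helper: exit status 0 only if both files contain the same
# read IDs in the same order.
same_read_order() {
    tmp_left=$(mktemp)
    tmp_right=$(mktemp)
    read_ids "$1" > "$tmp_left"
    read_ids "$2" > "$tmp_right"
    cmp -s "$tmp_left" "$tmp_right"
    status=$?
    rm -f "$tmp_left" "$tmp_right"
    return "$status"
}

# Example:
#   same_read_order 2016BZ017_1_concatenated.fastq 2016BZ017_2_concatenated.fastq \
#       && echo "mates line up"
```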

7.4 years ago by st.ph.n ★ 2.7k

score 2 · Accepted Answer · 2017-07-11

7.4 years ago

h.mon 35k

Concatenating the files and passing them as --left 1.fq --right 2.fq, or passing comma-separated lists of several files, should not make a difference for the assembly. When you pass several files, as in your first command line, Trinity will convert them to FASTA and concatenate them prior to assembly. Not concatenating will save disk space, at least temporarily.
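If you go with the file-list form, typing the comma-separated lists by hand is error-prone. A minimal sketch of building them in the shell follows; `comma_list` is a hypothetical helper, not part of Trinity, and it assumes file names without newlines. Sorting both lists the same way keeps the forward and reverse files in matching order, which is a cautious habit rather than a requirement stated in this thread.

```shell
# Hypothetical helper: print the given file names in sorted order, joined by
# commas, suitable for use as a --left or --right argument.
comma_list() {
    printf '%s\n' "$@" | LC_ALL=C sort | paste -s -d, -
}

# Example with the naming scheme from the question:
#   left=$(comma_list ./*.cleaned.1.fastq)
#   right=$(comma_list ./*.cleaned.2.fastq)
#   Trinity --seqType fq --max_memory 50G --left "$left" --right "$right" --CPU 6
```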


Not concatenating will save disk space, at least temporarily

If the OP has high coverage, one can use the in-silico read normalization parameter as well.

7.4 years ago by st.ph.n ★ 2.7k

The latest versions of Trinity, starting from 2.3.2, have digital normalization on by default.

7.4 years ago by h.mon 35k

Thanks, I was unaware.

7.4 years ago by st.ph.n ★ 2.7k

score 2 · Accepted Answer · 2017-07-12

7.4 years ago

Physalia-courses ★ 2.6k

Hi, I would suggest posting this question here: https://groups.google.com/forum/#!forum/trinityrnaseq-users The Trinity Google forum is very active, and you will always find @BrianHaas there.

Hope this helps.
