Hey y'all,
I am an undergraduate biology major and have just started exploring the field of bioinformatics. I have a few questions about how to run Trinity when I have multiple FASTQ files for each sample. Should I concatenate all of the forward read files together and all of the reverse read files together, so that I'm left with one forward file and one reverse file? Or is it better to pass them all to Trinity at once, as in the following command:
Trinity --seqType fq --max_memory 50G --left 2016BZ017_S11_L001.cleaned.1.fastq,2016BZ017_S11_L002.cleaned.1.fastq,2016BZ017_S2_L001.cleaned.1.fastq,2016BZ017_S2_L002.cleaned.1.fastq --right 2016BZ017_S11_L001.cleaned.2.fastq,2016BZ017_S11_L002.cleaned.2.fastq,2016BZ017_S2_L001.cleaned.2.fastq,2016BZ017_S2_L002.cleaned.2.fastq --CPU 6
or
Trinity --seqType fq --max_memory 50G --left 2016BZ017_1_concatenated.fastq --right 2016BZ017_2_concatenated.fastq --CPU 6
My sample folder contains the following FASTQ files: 2016BZ017_S11_L001.cleaned.1.fastq, 2016BZ017_S11_L001.cleaned.2.fastq, 2016BZ017_S11_L002.cleaned.1.fastq, 2016BZ017_S11_L002.cleaned.2.fastq, 2016BZ017_S2_L001.cleaned.1.fastq, 2016BZ017_S2_L001.cleaned.2.fastq, 2016BZ017_S2_L002.cleaned.1.fastq, 2016BZ017_S2_L002.cleaned.2.fastq
Also: Would y'all suggest trimming the Illumina adapters using the Trimmomatic option within Trinity, or running Trimmomatic standalone?
Thank you all so much
I would recommend doing this. You want to create a single assembly across all samples. If you're doing DGE downstream, you can specify replicates later.
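If you go the concatenation route, here is a minimal sketch of how it could be done. It assumes the file naming pattern from the question (`<sample>_*.cleaned.1.fastq` and `<sample>_*.cleaned.2.fastq`) and writes to the output names used in the second Trinity command. The function name `concat_pairs` is made up for illustration. The key point is that the forward and reverse files must be concatenated in the same order, so that read N in one file is still the mate of read N in the other.

```shell
# Concatenate the per-lane FASTQ files of one sample into a single forward
# file and a single reverse file.
# Assumption: files are named <sample>_*.cleaned.1.fastq (forward) and
# <sample>_*.cleaned.2.fastq (reverse), as in the question.
concat_pairs() {
    sample="$1"   # e.g. 2016BZ017

    # The shell expands globs in sorted order, and the forward/reverse names
    # differ only in the trailing .1/.2, so both lists line up file by file.
    cat "${sample}"_*.cleaned.1.fastq > "${sample}_1_concatenated.fastq"
    cat "${sample}"_*.cleaned.2.fastq > "${sample}_2_concatenated.fastq"
}

# Usage (run inside the sample folder):
#   concat_pairs 2016BZ017
```

The output names do not match the `*.cleaned.N.fastq` pattern, so rerunning the function will not feed an earlier concatenated file back into itself.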
Would you suggest removing the header of the fastq files that are being concatenated?
No, this information is needed to pair and assemble the sequences.
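Since the headers are what tie mates together, it can be worth checking that the forward and reverse files still list the same reads in the same order after concatenation. The sketch below is one way to do that, under the assumptions that each record is exactly four lines and that mates share a read ID up to a trailing `/1` or `/2` or up to the first space. The function names `read_ids` and `check_pairs` are hypothetical.

```shell
# Print the read ID of every record in a FASTQ file.
# Assumption: 4 lines per record, header on the first line of each record.
read_ids() {
    # $1 inside awk is the header up to the first whitespace; a trailing
    # /1 or /2 mate suffix is removed so that mates compare equal.
    awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Compare the ordered ID lists of a forward and a reverse file.
check_pairs() {
    left="$1"
    right="$2"
    if [ "$(read_ids "$left" | cksum)" = "$(read_ids "$right" | cksum)" ]; then
        echo "paired"
    else
        echo "mismatch"
        return 1
    fi
}

# Usage:
#   check_pairs 2016BZ017_1_concatenated.fastq 2016BZ017_2_concatenated.fastq
```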