I am working on a project and I have two paired-end read files ending in .fq. I want to assemble the transcripts, and as a first step I tried to use Trinity. But I don't feel like I am on the right track. Any suggestions would be very much appreciated!
If you are working on a cluster, it would be best to alert the sysadmins that the install of trinity is missing a required program. They should be able to fix this easily.
While the conda option mentioned by @Michael will generally work, it may fail if your home directory is not available on cluster nodes.
Hi @genomax,
As I mentioned previously, the reads in my FASTQ files are paired-end reads from an animal species whose transcriptome has not been sequenced before, with read length 80 and insert size 300.
Would these two files require any prior modification before running them through Trinity?
I think you are on the right track. Running long, multi-parameter commands directly from the command line can be cumbersome, and after a while you no longer know which command actually succeeded. A hint to make your life easier: write a simple shell script wrapper for the Trinity call. That way, you also document the parameters used. Before assembling, though, do some quality control to check whether you need to trim your reads. I should stress that QC is the point to start from, not the assembly itself: run FastQC on your files first to see whether the quality is high and whether there is adapter contamination.
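As a rough sketch of that QC step, assuming FastQC is installed and callable as fastqc, something like the following checks that both files hold the same number of reads and then runs FastQC on them. The function names (count_reads, check_pairs, run_qc) are made up for illustration, and the read count assumes plain, uncompressed four-line FASTQ records.

```shell
#!/bin/sh
# Sketch of a pre-assembly QC step; the function names are illustrative only.
FASTQC=${FASTQC:-fastqc}   # adapt this if fastqc is not in your path

# Print the number of reads in an uncompressed FASTQ file (assumes 4 lines per read).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Succeed only if the two files of a pair contain the same number of reads.
check_pairs() {
    left=$(count_reads "$1")
    right=$(count_reads "$2")
    if [ "$left" != "$right" ]; then
        echo "read counts differ: $1=$left $2=$right" >&2
        return 1
    fi
    echo "both files contain $left reads"
}

# Check pairing, then write the FastQC reports to the directory given as $3.
run_qc() {
    check_pairs "$1" "$2" || return 1
    mkdir -p "$3"
    "$FASTQC" -o "$3" "$1" "$2"
}
```

After loading these functions into your shell, you could call run_qc file1.fq file2.fq fastqc_out and open the HTML reports in fastqc_out. If the read counts differ, the pair is out of sync and should be fixed before assembly.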
Here is a simple script to start running Trinity from, without trimming:
#!/bin/sh
# stop on the first error or if an input file argument is missing
set -eu
TRINITYCMD=Trinity # adapt this if Trinity is not in your path
# you need to adapt max_memory and CPU to your server
"$TRINITYCMD" --seqType fq --max_memory 500G --left "$1" --right "$2" --CPU 60 --output Trinity-1
# change the output directory each time you change something fundamental, such as adding the --trimmomatic option
Save the script as run_trinity.sh, make it executable and run it as:
nohup ./run_trinity.sh file1.fq file2.fq &
That way, the script will not be interrupted when you log out. Debug output is in nohup.out.
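Once the run has finished, a quick sanity check is to count the assembled transcripts. The helper below is only a sketch: the name count_transcripts is made up, and the location of the assembly FASTA is an assumption, since depending on the Trinity version it ends up inside the output directory or next to it.

```shell
#!/bin/sh
# Print the number of sequences in a FASTA file by counting header lines.
# Prints 0 for a file without any header line.
count_transcripts() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}
```

For example, tail -n 20 nohup.out shows the last messages from the run, and count_transcripts Trinity-1/Trinity.fasta (path assumed, adjust it to where your version wrote the assembly) prints the number of transcripts.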
Caveat: In principle, I agree with genomax that a competent sysadmin should install the software, and that is the best way for you to get it installed. However, the habit of having sysadmins install required packages has seemingly gone out of fashion, and everyone now maintains their own mess in their home directory.