Hello everyone, I am running structural annotation of my species with Ginger. I ran into an error because my RNA-seq data set is very large, which causes problems for Oases and Trinity. It is large because it combines around 30 paired-end libraries: some from our collaborators and most downloaded from NCBI. The original combined Read 1 file is around 200 GB. Ginger seems to take only one combined pair of FASTQ files, so I cannot run the samples separately. Should I perform duplicate removal? Are there any modules or programs you would recommend?
Ginger https://github.com/i10labtitech/GINGER
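If the per-sample files are concatenated by hand into the single combined pair that Ginger reads (combined_1 / combined_2 in the log), R1 and R2 must be concatenated in exactly the same sample order, or the mates go out of sync. The Python sketch below illustrates this check; it is not part of Ginger. `combine_pairs` and its helpers are hypothetical names, and the sketch assumes mate IDs are identical apart from an optional /1 or /2 suffix, so adjust `pair_id` if your SRA downloads use another naming convention.

```python
import gzip
from itertools import zip_longest


def open_text(path):
    """Open a FASTQ file for reading as text, handling .gz transparently."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_records(handle):
    """Yield FASTQ records as 4-line tuples, each line ending in a newline."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return
        if not lines[3]:
            raise ValueError("truncated FASTQ record")
        # A missing final newline would glue the next file's first header on.
        yield tuple(line if line.endswith("\n") else line + "\n" for line in lines)


def pair_id(header):
    """Read ID without '@', any comment, or a trailing /1 or /2."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def combine_pairs(sample_pairs, out_r1, out_r2):
    """Concatenate per-sample (R1, R2) files in one fixed sample order.

    Raises ValueError if a sample's R1 and R2 differ in read count or
    mate IDs. Returns the total number of read pairs written.
    """
    total = 0
    with open(out_r1, "w") as o1, open(out_r2, "w") as o2:
        for r1_path, r2_path in sample_pairs:
            with open_text(r1_path) as h1, open_text(r2_path) as h2:
                for rec1, rec2 in zip_longest(read_records(h1), read_records(h2)):
                    if rec1 is None or rec2 is None:
                        raise ValueError(f"{r1_path}/{r2_path}: read counts differ")
                    if pair_id(rec1[0]) != pair_id(rec2[0]):
                        raise ValueError(f"{r1_path}/{r2_path}: mates out of sync")
                    o1.writelines(rec1)
                    o2.writelines(rec2)
                    total += 1
    return total
```

Usage would look like `combine_pairs([("gill_1.trim.fastq", "gill_2.trim.fastq"), ("brain_1.trim.fastq", "brain_2.trim.fastq")], "combined_1.trim.fastq", "combined_2.trim.fastq")`, where the per-tissue file names are hypothetical. Passing one list of tuples, rather than two separate file lists, is what keeps the R1 and R2 order identical.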
Error log retrieved from Nextflow (Ginger's pipeline is built on Nextflow):
Nov-04 14:06:33.072 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 3; name: oases; status: COMPLETED; exit: 1; error: -; workDir: /output/marine_tilapia/work/cf/f61a216236dba2edda73721bb5c089]
Nov-04 14:06:33.348 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'oases'

Caused by:
  Process oases terminated with an error exit status (1)

Command executed:

  /root/anaconda3/bin/velveth ginger 31 -fastq -short -separate combined_1.trim.fastq combined_2.trim.fastq
  /root/anaconda3/bin/velvetg ginger -read_trkg yes
  /root/anaconda3/bin/oases ginger

Command exit status: 1

Command output:
  [14469.334582] === Sequences loaded in 2814.502856 s
  [14469.334654] Done inputting sequences
  [14469.334659] Destroying splay table
  [14518.165976] Splay table destroyed
  [0.000000] Reading roadmap file ginger/Roadmaps

Command error:
  velvetg: Can't calloc 18446744072378670449 Annotations totalling 18446744047091928276 bytes: Cannot allocate memory
  [0.000000] Reading roadmap file ginger/Roadmaps
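The numbers in the calloc message are far larger than any real allocation, which suggests negative values printed as unsigned 64-bit integers. The plain-Python arithmetic below (no Velvet code involved) reinterprets them: the count comes out as -1,330,881,167 and the size as -26,617,623,340 bytes, exactly 20 bytes per annotation. That pattern is consistent with a counter overflow; if it were a signed 32-bit counter that wrapped once, the real count would be about 2.96 billion. This is an inference from the printed numbers only, not confirmed against Velvet's source, but if it holds, more RAM alone would not help, while reducing the number of input reads would.

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    return value - 2**64 if value >= 2**63 else value


count_u = 18446744072378670449  # "Annotations" count printed by velvetg
bytes_u = 18446744047091928276  # "totalling ... bytes" printed by velvetg

count = as_signed64(count_u)
total = as_signed64(bytes_u)
elem_size = total // count  # bytes per Annotation implied by the message

# Hypothesis: a signed 32-bit counter that wrapped around exactly once.
# Adding 2**32 back recovers the value it was meant to hold.
unwrapped = count + 2**32

print(count, total, elem_size, unwrapped)
# prints: -1330881167 -26617623340 20 2964086129
```

The recovered count exceeds 2**31 - 1 = 2,147,483,647, the largest value a signed 32-bit integer can hold, which is what makes the overflow reading plausible.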
Can you try randomly sampling about 20% of the read pairs from the 200 GB R1 and R2 files? You could repeat this a few times with different random subsets to see how much the sampling affects the results. Either way, that is a lot of reads!
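The key point when subsampling paired-end data is that R1 and R2 must keep or drop the same pairs. A minimal Python sketch, assuming the two files list mates in the same order: one seeded random draw per pair decides for both mates. `subsample_pairs` is a hypothetical name; existing tools such as seqtk do the same job (its `sample` subcommand, run on R1 and R2 with the same `-s` seed).

```python
import gzip
import random


def open_text(path):
    """Open a FASTQ file for reading as text, handling .gz transparently."""
    path = str(path)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed):
    """Keep each read pair independently with probability `fraction`.

    One random draw per pair decides for both mates, so R1 and R2 stay in
    sync. Returns (pairs_seen, pairs_kept).
    """
    rng = random.Random(seed)
    seen = kept = 0
    with open_text(r1_in) as h1, open_text(r2_in) as h2, \
            open(r1_out, "w") as o1, open(r2_out, "w") as o2:
        while True:
            rec1 = [h1.readline() for _ in range(4)]
            rec2 = [h2.readline() for _ in range(4)]
            if not rec1[0] and not rec2[0]:
                break  # both files exhausted together
            if not rec1[3] or not rec2[3]:
                raise ValueError("R1/R2 read counts differ or a record is truncated")
            seen += 1
            if rng.random() < fraction:
                o1.writelines(rec1)
                o2.writelines(rec2)
                kept += 1
    return seen, kept
```

Running it several times with `fraction=0.2` and different `seed` values gives the independent replicate subsets; reusing a seed reproduces the same subset exactly.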
Also, how are you treating each set of paired-end reads? Are they all from the same sample, or are you handling them as individual samples?
No, they come from different tissues and developmental stages (gills, brain, etc.): 27 samples in total (54 files, since they are paired-end), and I handle them as separate samples. After receiving 6 samples from colleagues and downloading the others from NCBI SRA, I ran cutadapt on all of them to remove adapters, discard reads shorter than 5 bp, and trim the first 10 bp. I ask about duplicates because I suspect they do not matter for genome annotation, since I don't care whether one gene is expressed more highly than another. However, I am not 100% sure, and some of the samples are very deeply sequenced (over 30x).
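If you do try duplicate removal, the simplest form drops read pairs whose R1 and R2 sequences both exactly match an earlier pair. The Python sketch below shows the idea on uncompressed FASTQ; `dedup_pairs` is a hypothetical name, not a tool Ginger uses. Two caveats: in RNA-seq many exact duplicates come from highly expressed genes rather than PCR, and each entry in a Python set costs far more memory than its 16-byte digest, so on a 200 GB input this is only practical per sample. For assembly, k-mer based in silico read normalization (Trinity includes such a step) is the more common way to cut redundant depth.

```python
import hashlib


def dedup_pairs(r1_in, r2_in, r1_out, r2_out):
    """Write each distinct (R1 sequence, R2 sequence) pair once.

    Exact matching only: a single sequencing error makes two pairs distinct.
    Headers and qualities of the first occurrence are kept.
    Returns (pairs_seen, pairs_kept).
    """
    digests = set()  # one 16-byte BLAKE2b digest per distinct pair
    seen = kept = 0
    with open(r1_in) as h1, open(r2_in) as h2, \
            open(r1_out, "w") as o1, open(r2_out, "w") as o2:
        while True:
            rec1 = [h1.readline() for _ in range(4)]
            rec2 = [h2.readline() for _ in range(4)]
            if not rec1[0] and not rec2[0]:
                break  # both files exhausted together
            if not rec1[3] or not rec2[3]:
                raise ValueError("R1/R2 read counts differ or a record is truncated")
            seen += 1
            key = hashlib.blake2b(
                f"{rec1[1].strip()}\t{rec2[1].strip()}".encode(), digest_size=16
            ).digest()
            if key in digests:
                continue  # exact duplicate of an earlier pair
            digests.add(key)
            o1.writelines(rec1)
            o2.writelines(rec2)
            kept += 1
    return seen, kept
```

Keying on both mates together means a pair is dropped only when R1 and R2 both repeat; matching on R1 alone would discard more reads.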
Besides duplicate removal, another idea is to split the original file, process each part with another RNA-seq program, and then incorporate the results into Ginger, but it would take time to learn how to use those programs.
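For the split-and-merge idea, the combined pair can be cut into smaller R1/R2 pairs without breaking the mate pairing. This Python sketch deals pairs round-robin, so each part gets a mix of every tissue. `split_pairs` and the `.partK_1.fastq` naming are hypothetical, and whether the resulting assemblies can then be fed back into Ginger depends on what inputs Ginger's configuration accepts, which this sketch does not address.

```python
from contextlib import ExitStack


def split_pairs(r1_in, r2_in, prefix, n_parts):
    """Deal read pairs round-robin into n_parts R1/R2 file pairs.

    Writes {prefix}.part{k}_1.fastq and {prefix}.part{k}_2.fastq
    (hypothetical naming). Returns the number of pairs in each part.
    """
    counts = [0] * n_parts
    with ExitStack() as stack:
        h1 = stack.enter_context(open(r1_in))
        h2 = stack.enter_context(open(r2_in))
        outs = [
            (stack.enter_context(open(f"{prefix}.part{k}_1.fastq", "w")),
             stack.enter_context(open(f"{prefix}.part{k}_2.fastq", "w")))
            for k in range(n_parts)
        ]
        n = 0
        while True:
            rec1 = [h1.readline() for _ in range(4)]
            rec2 = [h2.readline() for _ in range(4)]
            if not rec1[0] and not rec2[0]:
                break  # both files exhausted together
            if not rec1[3] or not rec2[3]:
                raise ValueError("R1/R2 read counts differ or a record is truncated")
            k = n % n_parts  # consecutive pairs go to different parts
            outs[k][0].writelines(rec1)
            outs[k][1].writelines(rec2)
            counts[k] += 1
            n += 1
    return counts
```

Round-robin dealing is a design choice: contiguous chunks of a file concatenated sample by sample would instead group reads by tissue, and if per-tissue assemblies are what you want, the original per-sample files already provide that split.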
Should I share the settings of the problematic programs that the authors use in Ginger? They come from the configuration file:
/** RNA-Seq denovo based **/