I am trying to perform Differential expression using transcripts from Trinity and the edgeR package. I am essentially using the steps on http://trinityrnaseq.sourceforge.net/analysis/diff_expression_analysis.html this page. Everything goes fine until I try edgeR itself.
23:22 linda SO_564$ ~/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/run_EdgeR.pl --matrix all.counts.matrix --transcripts AP_treated/Run33/trinity_out_dir/Trinity.fasta
CMD: R --vanilla -q < __tmp_runTMM.R
> source("/home/linda/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/R/edgeR_funcs.R")
> myDGEList = target_list_file_to_DGEList("all_data_files.list")
Error in quantile.default(x, p = q) :
missing values and NaN's not allowed if 'na.rm' is FALSE
Calls: target_list_file_to_DGEList ... .calcFactorQuantile -> apply -> FUN -> quantile -> quantile.default
Execution halted
Error, cmd: R --vanilla -q < __tmp_runTMM.R died with ret (256) at /home/linda/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/run_EdgeR.pl line 250.
I am not sure what is the issue here. I followed the previous steps as they were listed. Any help would be much appreciated.
ETA: It appears that the issue may be missing values in the matrix file (NA), so I converted all NA
to 0.00 and that produced another error.
CMD: R --vanilla -q < __tmp_runTMM.R
> source("/home/linda/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/R/edgeR_funcs.R")
> myDGEList = target_list_file_to_DGEList("all_data_files.list")
> myDGEList$samples$eff.lib.size = myDGEList$samples$lib.size * myDGEList$samples$norm.factors
> write.table(myDGEList$samples, file="TMM_info.txt", quote=F, sep="\t", row.names=F)
>
Error, no seq length for comp4523_c0_seq2 at /home/linda/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/run_EdgeR.pl line 286, <$fh> line 9.
Now, I wonder if this is possibly because some transcripts are missing from one file.
Why would the source (i.e. trinity) of the transcripts matter at all? Transcript descriptions are transcript descriptions regardless of source, and edgeR evaluates counts on transcript descriptions. Is there something special about Trinity based transcripts in relation to edgeR?
I am not sure about that. I suppose the length of the transcripts comes into play since looking at the code it seems to use the Trinity.fasta file to calculate the transcript length only.
Can you post the perl script? Now, it looks like something is still missing, and what is comp4523_c0_seq2? Open the file TMM_info.txt and search for the sequence name, now look for what is different in that line from others.
I mean, a link to the perl script ofc.
We really cant help you like that, you need to provide more information, or nag the person who is responsible for the script, I mean how should I debug a software if you don't give me the code?
Michael, thanks for taking the time to solve this. The script is from the Analysis directory in http://sourceforge.net/projects/trinityrnaseq/ Apologies for taking long to respond.
Hi Linda,
I am now having the same error when generating my gene/transcript-by-expression matrices - did you ever come to a solution? Is it just the case that some transcripts are missing in different samples and they are not processed? The matrices still seem to be generating a lot of output.
Thanks