Question

Differential Expression Using Trinity Transcripts

4

Entering edit mode

12.8 years ago

Linda ▴ 160

I am trying to perform Differential expression using transcripts from Trinity and the edgeR package. I am essentially using the steps on http://trinityrnaseq.sourceforge.net/analysis/diff_expression_analysis.html this page. Everything goes fine until I try edgeR itself.

23:22 linda SO_564$ ~/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/run_EdgeR.pl --matrix all.counts.matrix --transcripts AP_treated/Run33/trinity_out_dir/Trinity.fasta
CMD: R --vanilla -q < __tmp_runTMM.R
> source("/home/linda/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/R/edgeR_funcs.R")
> myDGEList = target_list_file_to_DGEList("all_data_files.list")
Error in quantile.default(x, p = q) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE
Calls: target_list_file_to_DGEList ... .calcFactorQuantile -> apply -> FUN -> quantile -> quantile.default
Execution halted
Error, cmd: R --vanilla -q < __tmp_runTMM.R died with ret (256)  at /home/linda/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/run_EdgeR.pl line 250.

I am not sure what is the issue here. I followed the previous steps as they were listed. Any help would be much appreciated.

ETA: It appears that the issue may be missing values in the matrix file (NA), so I converted all NA to 0.00 and that produced another error.

CMD: R --vanilla -q < __tmp_runTMM.R
> source("/home/linda/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/R/edgeR_funcs.R")
> myDGEList = target_list_file_to_DGEList("all_data_files.list")
> myDGEList$samples$eff.lib.size = myDGEList$samples$lib.size * myDGEList$samples$norm.factors
> write.table(myDGEList$samples, file="TMM_info.txt", quote=F, sep="\t", row.names=F)
>
Error, no seq length for comp4523_c0_seq2 at /home/linda/software/trinityrnaseq_r2012-01-25/Analysis/DifferentialExpression/run_EdgeR.pl line 286, <$fh> line 9.

Now, I wonder if this is possibly because some transcripts are missing from one file.

trinity differential edger rna • 6.3k views

ADD COMMENT • link updated 12.8 years ago by Michael 55k • written 12.8 years ago by Linda ▴ 160

0

Entering edit mode

Why would the source (i.e. trinity) of the transcripts matter at all? Transcript descriptions are transcript descriptions regardless of source, and edgeR evaluates counts on transcript descriptions. Is there something special about Trinity based transcripts in relation to edgeR?

ADD REPLY • link 12.8 years ago by seidel 11k

0

Entering edit mode

I am not sure about that. I suppose the length of the transcripts comes into play since looking at the code it seems to use the Trinity.fasta file to calculate the transcript length only.

ADD REPLY • link 12.8 years ago by Linda ▴ 160

0

Entering edit mode

Can you post the perl script? Now, it looks like something is still missing, and what is comp4523_c0_seq2? Open the file TMM_info.txt and search for the sequence name, now look for what is different in that line from others.

ADD REPLY • link 12.8 years ago by Michael 55k

0

Entering edit mode

I mean, a link to the perl script ofc.

ADD REPLY • link 12.8 years ago by Michael 55k

0

Entering edit mode

We really cant help you like that, you need to provide more information, or nag the person who is responsible for the script, I mean how should I debug a software if you don't give me the code?

ADD REPLY • link 12.8 years ago by Michael 55k

0

Entering edit mode

Michael, thanks for taking the time to solve this. The script is from the Analysis directory in http://sourceforge.net/projects/trinityrnaseq/ Apologies for taking long to respond.

ADD REPLY • link 12.8 years ago by Linda ▴ 160

0

Entering edit mode

Hi Linda,

I am now having the same error when generating my gene/transcript-by-expression matrices - did you ever come to a solution? Is it just the case that some transcripts are missing in different samples and they are not processed? The matrices still seem to be generating a lot of output.

Thanks

ADD REPLY • link 9.1 years ago by AForza • 0

score 2 · Answer 1 · 2012-02-04

2

Entering edit mode

12.8 years ago

Michael 55k

Looks like there are missing values in your data and cannot be handled by the package,because it tries to call quantiles without na.rm=TRUE. Can you check for missing values in your data? If these are counts based data you might be able to replace missing values with 0 counts. but this can be only a guess, because I don't know in detail what your analysis does and how your data look like. If you would send me your data, I could have a look for you though :D

ADD COMMENT • link 12.8 years ago by Michael 55k

0

Entering edit mode

I tried a putting missing NA values to 0.00 and that produced another error. Please check the edited question. Thanks for your help.

ADD REPLY • link 12.8 years ago by Linda ▴ 160