Question

Warning message: In y/gene.length.kb : longer object length is not a multiple of shorter object length? Any ideas?


9.3 years ago

tud55122 ▴ 20

Hi,

I'm new to RNA-seq analysis. I'm using edgeR to generate RPKMs. Everything works fine except the last step, where I get this warning:

Warning message:

In y/gene.length.kb :
  longer object length is not a multiple of shorter object length

When I checked the RPKMs generated, the values look skewed. There is high variation even between samples of the same condition, although the raw counts look fine.

Any idea? Thanks, Hang

RNA-Seq RPKM EdgeR • 4.3k views

ADD COMMENT • link updated 9.3 years ago by Michael 56k • written 9.3 years ago by tud55122 ▴ 20


Thanks for your reply, guys. Do you know how to fix the problem?

Here are the scripts I used

d = DGEList(counts=counts, group=samples$condition)
d = calcNormFactors(d)
length.genes=read.table("gene_length_mouse.txt",sep="\t",header=T)
rpkm.gene=rpkm(d, length.genes[length.genes$Gene %in% rownames(d),2],normalized.lib.sizes=TRUE, log=F)

Thanks, Hang

ADD REPLY • link updated 9.3 years ago by Michael 56k • written 9.3 years ago by tud55122 ▴ 20


Looks like d contains some rows that are not in your text file. Where you subset length.genes[] with the %in% operator, you also need to subset d with the converse. You can only get RPKM for genes whose length is in the text file. And for that matter, be extra careful that the two lists are sorted the same way! Maybe align them in a separate step first, and don't use %in% inside the rpkm() call.

ADD REPLY • link 9.3 years ago by karl.stamm 4.1k
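One way to do the alignment suggested above is sketched below in base R. The toy count matrix and gene names are made up for illustration, and the column names Gene and Length are assumptions about the layout of the length file. The edgeR calls are left as comments because they need the package and the poster's real data.

```r
# Toy stand-ins for the poster's objects (hypothetical values).
counts <- matrix(c(10, 20, 30, 40, 50, 60, 70, 80), nrow = 4,
                 dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"),
                                 c("s1", "s2")))
length.genes <- data.frame(Gene = c("GeneC", "GeneA", "GeneE", "GeneB"),
                           Length = c(3000, 1000, 5000, 2000))

# Keep only genes present in both, in the row order of the count matrix.
common <- intersect(rownames(counts), length.genes$Gene)
counts.matched <- counts[common, , drop = FALSE]

# match() returns, for each gene in 'common', its row in the length table,
# so the lengths come out in the same order as the count rows.
gene.length <- length.genes$Length[match(common, length.genes$Gene)]
names(gene.length) <- common

# Sanity checks: one length per row, same gene order.
stopifnot(length(gene.length) == nrow(counts.matched),
          identical(names(gene.length), rownames(counts.matched)))

# With edgeR loaded, the matched objects could then be used like this:
# d <- DGEList(counts = counts.matched, group = samples$condition)
# d <- calcNormFactors(d)
# rpkm.gene <- rpkm(d, gene.length = gene.length,
#                   normalized.lib.sizes = TRUE, log = FALSE)
```

Genes without a known length (GeneD in this toy example) are dropped from the counts before RPKMs are computed, so the two objects always have the same number of entries.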

score 2 · Answer 1 · 2016-04-28


9.3 years ago

Shab86 ▴ 310

Could it be that you are providing a different number of gene lengths than there are genes in your matrix?

ADD COMMENT • link 9.3 years ago by Shab86 ▴ 310


That's just what the warning says. The "y" vector has a different length than "gene.length.kb". It talks about multiples because in R you're allowed to divide vectors of different lengths: the shorter one is recycled, like (1,2,3,4,5,6) / (1,2) = (1/1,2/2,3/1,4/2,5/1,6/2). When the longer length is not a multiple of the shorter one, R still recycles but emits this warning.

ADD REPLY • link 9.3 years ago by karl.stamm 4.1k
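The recycling behaviour described above can be reproduced in a plain R session. This is a small sketch with made-up numbers and does not involve edgeR.

```r
# Lengths 6 and 2: the shorter vector is recycled exactly three times,
# so R raises no warning.
x <- c(1, 2, 3, 4, 5, 6) / c(1, 2)
print(x)  # 1 1 3 2 5 3

# Lengths 5 and 2: 5 is not a multiple of 2. R would still recycle and
# return a result, but it warns. Here the warning is caught so that its
# text can be inspected.
msg <- tryCatch(c(1, 2, 3, 4, 5) / c(1, 2),
                warning = function(w) conditionMessage(w))
print(msg)
```

In the rpkm() case the same thing happens silently for most of the matrix: lengths are reused for the wrong genes, which would explain why the resulting values look off.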


score 0 · Answer 2 · 2016-04-29

The problem is this:

length.genes$Gene %in% rownames(d)

and the file "gene_length_mouse.txt".

If some genes cannot be matched between the count matrix and the gene length file, you end up with an unusable length vector. In addition, I would like to stress that it might be better to have no gene lengths than bad ones. Gene length is rather a confounding factor; the combined exon lengths might be better to use here.
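To see why the %in% subsetting gives an unusable vector, here is a small diagnostic sketch with hypothetical gene names. The column names Gene and Length are assumptions about the file layout, and count.genes stands in for rownames(d).

```r
# Hypothetical toy data: 4 genes in the count matrix, 3 of them in the file.
count.genes <- c("GeneA", "GeneB", "GeneC", "GeneD")
length.genes <- data.frame(Gene = c("GeneC", "GeneA", "GeneE", "GeneB"),
                           Length = c(3000, 1000, 5000, 2000))

# The subsetting used in the question.
keep <- length.genes$Gene %in% count.genes
len <- length.genes[keep, 2]

# Problem 1: fewer lengths than genes in the count matrix.
n.len <- length(len)
print(n.len)  # 3, but there are 4 genes in the counts

# Problem 2: the lengths stay in file order, not in count-matrix order.
file.order <- length.genes$Gene[keep]
print(file.order)  # "GeneC" "GeneA" "GeneB"

# Genes in the counts that have no length at all.
missing.genes <- setdiff(count.genes, length.genes$Gene)
print(missing.genes)  # "GeneD"
```

Running the same three checks on the real rownames(d) and gene_length_mouse.txt should show how many genes are affected before any RPKMs are computed.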